Writing a Code Generator in Java
Design and implement a flexible Java code-generation system that is easy to maintain, intuitive to work with, and based on an Internal Object Model (IOM)
by Giuseppe Naccarato
Posted March 3, 2004
Editor's Note: This is the first of two installments on using Java to write a code generator. This first installment defines a code generator, lists its benefits, and begins providing guidelines for designing and implementing a generic code generator. The article also looks at the code generator's architecture and Internal Object Model. The second installment concludes with a discussion of the importer and exporter interfaces, manual implementations, and using templates.
A code generator is a program that takes a model as input and produces source code that implements that model. The model consists of a set of metadata containing information about the code that will be generated. A model can be a Unified Modeling Language (UML) design, a proprietary descriptor file, or even other source code. The model format can be XML, plain text, CSV, or other kinds of sources such as directories, databases, or repositories. Starting from the information contained in the model, a code generator creates source code in a programming language (C, C++, Java, C#, VB, and so on) or another type of output: documentation, descriptors, configuration files, SQL code, and so on.
Generating code brings a number of benefits. When applied in the right context, code generation increases productivity, because generating code should be faster than writing it by hand. Generated code is usually consistent: a program that writes code, unlike an inattentive programmer, always follows the coding standard, so class, method, attribute, and parameter names stay consistent. A further benefit of code generation is related to abstraction. The code is the implementation of an abstract model, and therefore the system is not bound to a specific target-technology platform. Finally, you may have concerns about the quality of the generated code. Don't be afraid. Since programmers write generators, the quality of the generated code should be at least as good as what the same programmers can write manually. In addition, using templates, as you will see, helps keep the generated code of good quality and easy to maintain.
The focus here is to provide guidelines about how to design and implement a generic, flexible, and effective code generator in Java. Although the concepts will be generic and usable for different kinds of input and output, the examples show how to translate a simplified UML class diagram into Java code. For space reasons, some listings will be partial. However, you can follow this link to download the complete implementation of the code generator.
Approaches
We can classify a code generator according to the type of input. The two major approaches are code driven and model driven.
A code-driven generator takes as input a file containing source code and special tags, which drive the code generation process. JavaDoc is an example of such a generator. In fact, it takes as input Java source code containing special comments and generates HTML code representing the documentation of that code. Another example of a code-driven generator is XDoclet (see Resources). It is a very popular code generator for Java with the primary goal of generating EJB code, starting from particular JavaDoc tags within the entity beans code.
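To make the code-driven idea concrete, here is a minimal sketch of a generator that scans source text for a special tag and emits getter methods. The `@gen.property` tag and the `TagDrivenGenerator` class are invented for this illustration; they are not JavaDoc or XDoclet features.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical code-driven generator: reads special tags embedded in source
// comments and emits one getter per tag. The "@gen.property" tag is an
// invented convention for this sketch.
class TagDrivenGenerator {

    // Matches tags such as: "@gen.property String name"
    private static final Pattern TAG =
        Pattern.compile("@gen\\.property\\s+(\\w+)\\s+(\\w+)");

    // Returns one generated getter for each tag found in the input source.
    static List<String> generateGetters(String source) {
        List<String> getters = new ArrayList<String>();
        Matcher m = TAG.matcher(source);
        while (m.find()) {
            String type = m.group(1);
            String name = m.group(2);
            String capitalized =
                Character.toUpperCase(name.charAt(0)) + name.substring(1);
            getters.add("public " + type + " get" + capitalized
                + "() { return " + name + "; }");
        }
        return getters;
    }

    public static void main(String[] args) {
        String source =
            "/**\n" +
            " * @gen.property String name\n" +
            " * @gen.property int age\n" +
            " */\n" +
            "class Person { }\n";
        for (String getter : generateGetters(source)) {
            System.out.println(getter);
        }
    }
}
```

The input here is source code plus tags, and the tags alone drive what gets generated, which is the defining trait of the code-driven approach.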
Model-driven generators can be further subclassified into two types: custom and MDA. A custom generator takes as input a proprietary model representing the information that must be converted into source code. Using XSLT or template engines, such as Apache Velocity, it is quite easy to generate code according to the metadata coming from the model.
When the input model is a representation of UML, the code generator follows the Model-Driven Architecture (MDA). MDA is an Object Management Group (OMG) initiative to create a standard for code generation. An MDA code generator takes as input a platform-independent model (usually XMI, an XML representation of UML) and turns it into a platform-specific model that, by means of templates, can be converted easily into source code (see Resources).
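As a rough illustration of a custom model-driven generator, the sketch below fills a text template with metadata taken from a small proprietary model. It uses plain placeholder replacement as a stand-in for a real template engine; it is not Velocity or XSLT, and the `${...}` placeholder syntax, the model keys, and the `SimpleTemplateGenerator` class are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for a template engine: plain placeholder replacement only.
// This is not the Velocity API; names and syntax are hypothetical.
class SimpleTemplateGenerator {

    // Replaces every ${key} placeholder in the template with the model value.
    static String render(String template, Map<String, String> model) {
        String result = template;
        for (Map.Entry<String, String> entry : model.entrySet()) {
            result = result.replace("${" + entry.getKey() + "}", entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // The "proprietary model": a flat set of metadata entries.
        Map<String, String> model = new LinkedHashMap<String, String>();
        model.put("className", "Customer");
        model.put("fieldType", "String");
        model.put("fieldName", "name");

        String template =
            "public class ${className} {\n" +
            "    private ${fieldType} ${fieldName};\n" +
            "}\n";
        System.out.print(render(template, model));
    }
}
```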
Architecture
The code generator designed here partially follows MDA. I say "partially" because, being very flexible, it could also be used for code-driven and custom model-driven approaches. The code generator consists of three different modules:
- Importer – reads the model as input (not necessarily UML) and translates it into a platform-independent internal format based on an object model. An importer could be seen as a sort of DOM parser. But, as you will see, it could be implemented for working with any kind of input you want.
- Internal Object Model (IOM) – is the platform-independent internal format and could be considered the core of the code-generator architecture. IOM contains a set of classes that make it easy to manipulate the information coming from the model to generate outputs. The structure of the object model is a very important issue: when well designed, it can be a powerful, target-technology-independent representation that can be converted easily into source code.
- Exporter – accesses the IOM and takes the relevant information to generate code. It could use templates, which drive the generation process.
Figure 1 diagrams the architecture of the proposed code generator. I have designed this architecture with these benefits in mind:
- The input is completely independent of the output in terms of technologies. For instance, you can create an importer that reads a UML diagram and then an exporter writing C#-ADO code and another one writing Java-JDBC code, but both work with the same UML model as input.
- If the IOM is implemented according to UML rules, you can represent everything you can do with UML, which means that every kind of input can be represented and converted into a real object-oriented design. It is a big advantage in writing importers and exporters manipulating the model.
- By implementing multiple exporters and applying a simple pattern, you can create different layers of code—source code, descriptors, documentation, scripts, and so on—with a single pass that assures you consistency and synchronization among the different outputs.
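The three modules and the multiple-exporter benefit can be sketched as follows. This is only an outline under stated assumptions: the `Importer` and `Exporter` interfaces, their method signatures, the comma-separated input format, and the `GeneratorPipeline` class are hypothetical, and `IOMClass` and `IOMController` are trimmed-down stand-ins that carry nothing but class names.

```java
import java.util.ArrayList;
import java.util.List;

// Trimmed-down IOM stand-ins: only the class name is modeled here.
class IOMClass {
    private final String name;
    IOMClass(String name) { this.name = name; }
    String getName() { return name; }
}

class IOMController {
    private final List<IOMClass> classes = new ArrayList<IOMClass>();
    void addClass(IOMClass c) { classes.add(c); }
    List<IOMClass> getClasses() { return classes; }
}

// Hypothetical module interfaces; the signatures are assumptions.
interface Importer { void importModel(String input, IOMController model); }
interface Exporter { String export(IOMController model); }

// Importer for an invented input format: comma-separated class names.
class CsvImporter implements Importer {
    public void importModel(String input, IOMController model) {
        for (String name : input.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                model.addClass(new IOMClass(trimmed));
            }
        }
    }
}

// Exporter that writes an empty Java class per IOM class.
class JavaExporter implements Exporter {
    public String export(IOMController model) {
        StringBuilder out = new StringBuilder();
        for (IOMClass c : model.getClasses()) {
            out.append("public class ").append(c.getName()).append(" { }\n");
        }
        return out.toString();
    }
}

// Exporter that writes a table definition per IOM class.
class SqlExporter implements Exporter {
    public String export(IOMController model) {
        StringBuilder out = new StringBuilder();
        for (IOMClass c : model.getClasses()) {
            out.append("CREATE TABLE ").append(c.getName())
               .append(" (id INT PRIMARY KEY);\n");
        }
        return out.toString();
    }
}

class GeneratorPipeline {
    // Imports once, then lets every exporter read the same in-memory model.
    static List<String> run(Importer importer, String input, List<Exporter> exporters) {
        IOMController model = new IOMController();
        importer.importModel(input, model);
        List<String> outputs = new ArrayList<String>();
        for (Exporter exporter : exporters) {
            outputs.add(exporter.export(model));
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<Exporter> exporters = new ArrayList<Exporter>();
        exporters.add(new JavaExporter());
        exporters.add(new SqlExporter());
        for (String output : run(new CsvImporter(), "Customer, Order", exporters)) {
            System.out.print(output);
        }
    }
}
```

Because both exporters read the same model instance, the Java and SQL outputs cannot drift apart, and swapping the importer does not touch either exporter.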
I believe that such an ambitious code generator can be implemented effectively using a modern imperative programming language like Java, C#, C++, Perl, or Python. Writing it with other technologies, such as transformations or template languages, would be too complex. However, it is also true that using XSLT or Velocity Template Engine for specific duties can be faster and more effective, especially when you just want to generate code covering simple aspects of your project.
I'm proposing an architecture for a code generator on which you can base your whole project or a major part of it. It is very flexible, easy to maintain, intuitive to work with, and based firmly on a powerful internal object model. The last point is the major shortcoming of transformation and template-only approaches: the internal model is often poor and rigid or, in some cases, even missing.
Moreover, using an imperative language you can use powerful features and libraries when implementing importers, an internal model, and exporters. For example, by using Java, you can take advantage of aspects such as interfaces, reflection, and exception handling that will make the code generator very extensible, reliable, and robust.
Internal Object Model
According to the flow of the proposed architecture, we should start with the importer. However, it would be very tough to explain and show a real example of an importer implementation when the IOM is still unknown because the importer creates and uses the IOM classes. Therefore, let's have a look first at the IOM.
The internal representation of the code generator is the most important part, and the one that should never change. What can be imported and exported depends on a proper design of this module. A good strategy for designing an effective IOM is borrowing UML concepts. Basically, the internal model can be an object-oriented design implementation of the structure of a UML model. Therefore, you can model a set of classes and associations representing UML classes, attributes, operations, parameters, and associations. It might sound a little bit weird, but what we are going to do is design UML features and rules by using UML itself.
As a convention, IOM classes start with the IOM prefix. Figure 2 shows a class diagram of a very simplified IOM. The IOMClass class models a UML class. It has an aggregation with the IOMAttribute and IOMOperation classes because classes have attributes and operations. Since operations have parameters, an aggregation is modeled between the IOMOperation and the IOMParameter classes. The IOMAssociation class represents a UML association. It has two roles (start and end) that are modeled by the IOMRole class. Then, that last class has an association with the IOMClass class (involved), which represents the start or the end class of the UML association.
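One of these aggregations, the one between IOMOperation and IOMParameter, might look like the sketch below. The `name`, `type`, and `returnType` fields and the `toSignature()` helper are assumptions added for illustration; the simplified diagram does not specify the attributes of these classes.

```java
import java.util.ArrayList;
import java.util.List;

// Models a UML parameter; the name and type fields are assumed for this sketch.
class IOMParameter {
    private final String name;
    private final String type;
    IOMParameter(String name, String type) {
        this.name = name;
        this.type = type;
    }
    String getName() { return name; }
    String getType() { return type; }
}

// Models a UML operation that aggregates its parameters.
class IOMOperation {
    private final String name;
    private final String returnType;
    private final List<IOMParameter> parameters = new ArrayList<IOMParameter>();

    IOMOperation(String name, String returnType) {
        this.name = name;
        this.returnType = returnType;
    }
    String getName() { return name; }
    String getReturnType() { return returnType; }
    List<IOMParameter> getParameters() { return parameters; }
    void addParameter(IOMParameter parameter) { parameters.add(parameter); }
}

class OperationDemo {
    // Hypothetical helper: renders a Java-style signature from the metadata.
    static String toSignature(IOMOperation op) {
        StringBuilder sb = new StringBuilder();
        sb.append(op.getReturnType()).append(' ').append(op.getName()).append('(');
        List<IOMParameter> params = op.getParameters();
        for (int i = 0; i < params.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(params.get(i).getType()).append(' ').append(params.get(i).getName());
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        IOMOperation op = new IOMOperation("computeTotal", "double");
        op.addParameter(new IOMParameter("quantity", "int"));
        op.addParameter(new IOMParameter("unitPrice", "double"));
        // Prints: double computeTotal(int quantity, double unitPrice)
        System.out.println(toSignature(op));
    }
}
```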
Although this design is very simple, it will give you an idea about how to implement a fully functional IOM. In a real context, each class should provide specific attributes that somehow will drive the code generation. For example, for the IOMClass class you specify these attributes: name; stereotype; visibility; type (abstract, transient, persistent); and documentation. In addition, important aspects such as base classes, derived classes, exception classes, and packages must be supported by adding new classes and associations to the model.
Listing 1 provides a basic implementation of the IOMClass class. It has just two attributes: name (the name of the class) and stereotype (the associated UML stereotype). The aggregations with the IOMAttribute and IOMOperation classes are implemented by two ArrayList instances (attributes and operations) and related get/add methods: getAttributes(), addAttribute(), getOperations(), and addOperation(). The class also contains the getMyAssociations() method that returns all the IOMAssociations in which the class is involved as a start or end role. Using a similar approach, you can also implement all the other IOM classes.
The root class of the model is IOMController, which is implemented in Listing 2. It holds the instances of IOMClass and IOMAssociation involved in the model and provides access to them. In addition, the IOMController class implements the queryClass() method, which is useful for searching for a class by name. Importers start from that class to create the IOM, and exporters start from it to read the metadata they need to write the output.
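For readers without the download at hand, a class along these lines could be sketched as follows. This is not the article's Listing 1: the constructor signatures, the fields of the helper classes, and the way associations are tracked (each IOMAssociation registers itself with the classes it involves through an invented `registerAssociation()` method) are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

class IOMAttribute {
    private final String name;
    IOMAttribute(String name) { this.name = name; }
    String getName() { return name; }
}

class IOMOperation {
    private final String name;
    IOMOperation(String name) { this.name = name; }
    String getName() { return name; }
}

// One end of an association; "involved" is the class playing the role.
class IOMRole {
    private final IOMClass involved;
    IOMRole(IOMClass involved) { this.involved = involved; }
    IOMClass getInvolved() { return involved; }
}

class IOMAssociation {
    private final IOMRole start;
    private final IOMRole end;

    IOMAssociation(IOMClass startClass, IOMClass endClass) {
        this.start = new IOMRole(startClass);
        this.end = new IOMRole(endClass);
        // Assumption: the association registers itself with both classes.
        startClass.registerAssociation(this);
        if (endClass != startClass) {
            endClass.registerAssociation(this);
        }
    }
    IOMRole getStart() { return start; }
    IOMRole getEnd() { return end; }
}

class IOMClass {
    private final String name;
    private final String stereotype;
    private final List<IOMAttribute> attributes = new ArrayList<IOMAttribute>();
    private final List<IOMOperation> operations = new ArrayList<IOMOperation>();
    private final List<IOMAssociation> myAssociations = new ArrayList<IOMAssociation>();

    IOMClass(String name, String stereotype) {
        this.name = name;
        this.stereotype = stereotype;
    }
    String getName() { return name; }
    String getStereotype() { return stereotype; }

    List<IOMAttribute> getAttributes() { return attributes; }
    void addAttribute(IOMAttribute attribute) { attributes.add(attribute); }

    List<IOMOperation> getOperations() { return operations; }
    void addOperation(IOMOperation operation) { operations.add(operation); }

    // Hypothetical hook called by IOMAssociation's constructor.
    void registerAssociation(IOMAssociation association) {
        myAssociations.add(association);
    }

    // Associations in which this class takes part as start or end role.
    List<IOMAssociation> getMyAssociations() { return myAssociations; }
}
```

A short usage example: after `new IOMAssociation(customer, order)`, both `customer.getMyAssociations()` and `order.getMyAssociations()` contain that association. An alternative design would compute the result on demand by scanning all associations held by the model root.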
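A root class of this kind might be sketched as below. Again, this is not the article's Listing 2: the add/get method names other than queryClass(), the simplified IOMAssociation (roles omitted), and the choice to return null when no class matches are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-ins so the sketch compiles on its own.
class IOMClass {
    private final String name;
    IOMClass(String name) { this.name = name; }
    String getName() { return name; }
}

class IOMAssociation {
    private final IOMClass start;
    private final IOMClass end;
    IOMAssociation(IOMClass start, IOMClass end) {
        this.start = start;
        this.end = end;
    }
    IOMClass getStart() { return start; }
    IOMClass getEnd() { return end; }
}

// Root of the model: importers fill it, exporters read from it.
class IOMController {
    private final List<IOMClass> classes = new ArrayList<IOMClass>();
    private final List<IOMAssociation> associations = new ArrayList<IOMAssociation>();

    void addClass(IOMClass c) { classes.add(c); }
    List<IOMClass> getClasses() { return classes; }

    void addAssociation(IOMAssociation a) { associations.add(a); }
    List<IOMAssociation> getAssociations() { return associations; }

    // Returns the class with the given name, or null if none matches
    // (the null return is an assumption of this sketch).
    IOMClass queryClass(String name) {
        for (IOMClass c : classes) {
            if (c.getName().equals(name)) {
                return c;
            }
        }
        return null;
    }
}

class ControllerDemo {
    public static void main(String[] args) {
        // Importer side: build the model.
        IOMController model = new IOMController();
        model.addClass(new IOMClass("Customer"));
        model.addClass(new IOMClass("Order"));
        model.addAssociation(new IOMAssociation(
            model.queryClass("Customer"), model.queryClass("Order")));

        // Exporter side: read metadata back from the model.
        for (IOMAssociation a : model.getAssociations()) {
            System.out.println(a.getStart().getName() + " -> " + a.getEnd().getName());
        }
    }
}
```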
The final installment will look specifically at the importer and exporter interfaces, and then conclude with the use of multiple exporters, manual implementation, and templates.
About the Author
Giuseppe Naccarato has a degree in computer science and works as a software developer for an IT company based in Glasgow (UK). His main interests are J2EE- and .Net-related technologies. Contact Giuseppe at .