Gem #157: Gprbuild and Code Generation

This series follows on from Gems 152 and 155, which we recommend reading first as an introduction.

Let's get started...

Gprbuild was introduced by AdaCore as a replacement for gnatmake. The goal is still to make it easy to build a whole application. However, in contrast to gnatmake, which is limited to Ada, gprbuild is a multilanguage tool, and can happily launch compilers for the various languages you use in your application, and then link all resulting object files together to create a final executable.

Gprbuild's work is described via a project file (which uses the .gpr extension). In contrast to Unix 'make' (a traditional tool used to drive the program build process), a project file gives a static description of a project. You do not have to describe the exact commands to spawn to recompile the files, nor do you need to describe the dependencies of your sources, nor specify when an object file has to be recreated because some of the sources it depends on have changed.

Instead, gprbuild itself has knowledge of various compilable languages. For instance, it knows that for Ada an object file (extension .o) is generated by GNAT from a set of source files having the same base name but a different extension (usually .ads or .adb). It also knows that a source file might depend on other source files, and when any of those change, the object file also needs to be regenerated.

Thanks to this built-in knowledge, the project file for a typical pure Ada application is much simpler than the equivalent Makefile would be (unless of course you are calling gnatmake or gprbuild from that Makefile). You only need to point it to the source files, and the rest is automatic. Similar support is available for C and Fortran.
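
As a minimal sketch of how little is needed, a project file for a small pure Ada application might look like the following. The project name Hello, the directory names, and the main unit hello.adb are assumptions chosen for illustration only.

```gpr
--  hello.gpr: hypothetical project for a pure Ada application
project Hello is
   --  Where gprbuild should look for the Ada sources
   for Source_Dirs use ("src");
   --  Where object files and other build artifacts are placed
   for Object_Dir use "obj";
   --  The main unit from which the executable is built
   for Main use ("hello.adb");
end Hello;
```

Nothing here says which commands to run or which files depend on which; gprbuild works that out from the sources it finds.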

However, nowadays many applications need to first generate part of their sources from higher-level languages, such as UML or Simulink. Fortunately, gprbuild is relatively easy to extend to other languages, and this Gem describes the various steps required to accomplish that.

A custom code generator

Let's start with a description of the problem.

Let's assume we have an in-house code generator that reads information from one or more XML files and then generates one or more Ada files from those. These generated Ada files should then be compiled together with hand-coded Ada files before the final executable can be linked.

Of course, we would like to keep recompilation to a minimum: the Ada files should not be regenerated when no XML file has been modified, and nothing should be recompiled when no Ada file has been modified (although that part is already handled automatically by gprbuild).

This example would apply similarly when using code generators such as Lex or Yacc, for instance.

Describing the code generator to gprbuild

Of course, since this is a custom code generator, gprbuild knows nothing about it by default, and we need to set things up. This can be done directly in a project file, but gprbuild provides flexibility beyond that.

Gprbuild itself has no hard-coded knowledge about compilation languages. Instead, it reads all the information it needs from a configuration file (usually with the extension .cgpr). The configuration file is not written by users, but is generated from a set of XML files (the "knowledge base") via a second tool called gprconfig. The high-level behavior is as follows:

     XML files (knowledge base)
                |
            gprconfig
                |
                V
            auto.cgpr       user project (default.gpr)
                |                       |
                \_______________________/
                            |
                        gprbuild
                            |      
                            V
            commands to execute for the build

What we need to do is create a new XML file for the knowledge base.

Let's keep the XML language for pure XML files, in case the application contains some that are unrelated to code generation. Instead, we will "invent" a new language, tentatively named "xml_for_ada". Gprbuild needs to find the sources for this language automatically (they have a standard .xml extension).

The XML file would look like the following:


<gprconfig>
   <compiler_description>
      <name>XML_For_Ada</name>
      <executable>codegen</executable>
      <languages>xml_for_ada</languages>
      <version>1.0</version>
   </compiler_description>
   <configuration>
      <compilers>
         <compiler language="xml_for_ada" />
      </compilers>
      <config>
    package Naming is
       -- How to recognize XML files
       for Body_Suffix ("xml_for_ada") use ".xml";
    end Naming;
    package Compiler is
       -- describes our code generation, from XML to Ada
       for Driver ("xml_for_ada") use "codegen";
       for Object_File_Suffix ("xml_for_ada") use ".ads";
       for Object_File_Switches ("xml_for_ada") use ("-o", "");
       for Required_Switches ("xml_for_ada") use ("-g");
         -- always use this switch
       for Dependency_Switches ("xml_for_ada") use ("-M");
         -- -M file.d (indicates the dependency file)
    end Compiler;
      </config>
   </configuration>
</gprconfig>

The first part ("compiler_description") describes how to find the executable for the code generator. Its name is "codegen", and it only needs to be located if the user's project indicates that it uses the "xml_for_ada" language. The version number is hard-coded for now, but it would be possible to ask codegen itself for its current version number, which could be used later to change the gprbuild support for it depending on the version.

Assuming a code generator is found, the second part of the XML file ("configuration") indicates what code needs to be added to the gprbuild configuration file. This is where the magic of gprbuild happens.

First, we're letting gprbuild know how to recognize our XML files (".xml" extension).

Second, we're telling it that to process those files, it needs to call an executable named "codegen" that will generate Ada files with the extension ".ads".

For proper handling of dependencies (which minimizes recompilation), gprbuild needs to know the name of the generated file. For us this is a file with a ".ads" extension (although we could of course generate additional files). When multiple XML files are needed for a single Ada file, the code generator should generate a dependency file (similar to an extract from a Makefile) that lists those files. In our case, we decided that the name of this dependency file is passed to the code generator via the -M switch.

Finally, we can decide that some switches are mandatory for the code generator, and we show an example with the -g switch.
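
As noted earlier, the same description can also be given directly in a project file instead of the knowledge base. The sketch below simply moves the Naming and Compiler attributes from the XML file into a project. The project name Generate is hypothetical, and the sketch assumes that the "codegen" executable can be found on the PATH.

```gpr
--  generate.gpr: hypothetical project that describes the code
--  generator itself, without a knowledge base entry
project Generate is
   for Languages use ("xml_for_ada");
   for Source_Dirs use ("src");
   for Object_Dir use "generated";

   package Naming is
      --  How to recognize XML files
      for Body_Suffix ("xml_for_ada") use ".xml";
   end Naming;

   package Compiler is
      --  Same attributes as in the knowledge base XML file
      for Driver ("xml_for_ada") use "codegen";
      for Object_File_Suffix ("xml_for_ada") use ".ads";
      for Object_File_Switches ("xml_for_ada") use ("-o", "");
      for Required_Switches ("xml_for_ada") use ("-g");
      for Dependency_Switches ("xml_for_ada") use ("-M");
   end Compiler;
end Generate;
```

The drawback of this variant is that every project using the code generator has to repeat these two packages, which is why the knowledge base approach is used in the rest of this Gem.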

Sample code for a very simple code generator is shown below:

with Ada.Text_IO; use Ada.Text_IO;
with GNAT.Command_Line; use GNAT.Command_Line;
procedure Codegen is
   F : File_Type;
begin
   loop
      case Getopt ("g M: o:") is
         when 'g' => -- some random switch passed to the compiler
            null;
         when 'M' => -- We need to generate the dependency file.
            Create (F, Out_File, Parameter);
            Put_Line (F, "b.ads: ../src/b.xml");
            Close (F);
         when 'o' => -- Name of the object file
            -- We need to parse the XML file and generate code.
            -- Let's simulate it.
            Create (F, Out_File, Parameter);
            Put_Line (F, "package B is");
            Put_Line (F, "   procedure Foo is null;");
            Put_Line (F, "end B;");
            Close (F);
         when others => -- ASCII.NUL: no more switches to process
            exit;
      end case;
   end loop;
end Codegen;

The user project

The hard part is now done, and we can move on to writing a project that uses this code generator. Since we provided that information in a general manner to gprbuild, we can have multiple such projects without duplicating the work above.

The setup is the following:

gprconfig_db/
   Contains the XML file we created above for gprbuild.

src/
   This directory should contain the .xml files used for code generation, as well as hand-coded Ada files. Let's assume it contains b.xml and a.adb.

generated/
   This directory will contain the Ada files generated from the XML files.

obj/
   This directory will contain the object files resulting from the compilation of the Ada files.

Gprbuild needs to know all of its sources when it starts, so in practice we will need to make two runs of gprbuild: one to generate the Ada from XML and the second to compile all the Ada files and link the executable. That means that the set of source files is different in the two steps: in the first step, the sources are XML files, and the "object files" are the resulting Ada files; in the second step, the sources are the Ada files, and the object files are the usual .o files.

We could implement this setup with two different project files, one for each step. However, my own preferred approach is to use a single project file with a scenario variable that indicates the current step.

Here is the project file:

project Default is
   type Compilation_Step is ("Step_1", "Step_2");
   Step : Compilation_Step := External ("STEP", "Step_1");
   case Step is
      when "Step_1" =>
         for Languages use ("xml_for_ada");
         for Source_Dirs use ("src");
         for Object_Dir use "generated";
      when "Step_2" =>
         for Languages use ("Ada");
         for Main use ("a.adb");
         for Source_Dirs use ("src", "generated");
         for Object_Dir use "obj";
   end case;
end Default;
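
For comparison, the two-project alternative mentioned above could be sketched as follows. The file and project names (generate.gpr, build.gpr) are hypothetical, and each project would live in its own file; they are shown together here only for brevity.

```gpr
--  generate.gpr (hypothetical name): first step, from XML to Ada
project Generate is
   for Languages use ("xml_for_ada");
   for Source_Dirs use ("src");
   for Object_Dir use "generated";
end Generate;

--  build.gpr (hypothetical name): second step, from Ada to executable
project Build is
   for Languages use ("Ada");
   for Main use ("a.adb");
   for Source_Dirs use ("src", "generated");
   for Object_Dir use "obj";
end Build;
```

With this layout the scenario variable disappears and each step names its own project, along the lines of "gprbuild --db gprconfig_db -Pgenerate" followed by "gprbuild -Pbuild". The price is that the directory names are repeated across the two files.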

After all this setup, the compilation itself is done with these two simple commands:

   gprbuild --db gprconfig_db -Pdefault -XSTEP=Step_1
   gprbuild -Pdefault -XSTEP=Step_2

The first time they are run, the output is:

   > gprbuild --db gprconfig_db -Pdefault -XSTEP=Step_1
   codegen -g -Mb.d b.xml -o b.ads
   > gprbuild -Pdefault -XSTEP=Step_2
   gcc -c a.adb
   gcc -c b.ads
   gprbind a.bexch
   gnatbind a.ali
   gcc -c b__a.adb
   gcc a.o -o a

The second time (if no file was modified), we get the expected:

   > gprbuild -Pdefault -XSTEP=Step_2
   gprbuild: "a" up to date

If we now modify b.xml and rerun both commands, b.ads is regenerated and then recompiled, as in the first run.

One more trick: we could in fact describe a third step (Step_3), that would be the default value for the external variable STEP. In that third step, the project would contain the languages for both xml_for_ada and Ada. The use of that artificial third step would be to load that project directly in GPS, conveniently allowing editing of both XML and Ada files.
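
A possible shape for that extended project is sketched below. Only the Step_3 branch and the new default value are additions to the project shown earlier. The choice of directories for Step_3 is an assumption, and the sketch has not been checked against GPS.

```gpr
project Default is
   type Compilation_Step is ("Step_1", "Step_2", "Step_3");
   --  Step_3 is now the default, used when the project is simply loaded
   Step : Compilation_Step := External ("STEP", "Step_3");
   case Step is
      when "Step_1" =>
         for Languages use ("xml_for_ada");
         for Source_Dirs use ("src");
         for Object_Dir use "generated";
      when "Step_2" =>
         for Languages use ("Ada");
         for Main use ("a.adb");
         for Source_Dirs use ("src", "generated");
         for Object_Dir use "obj";
      when "Step_3" =>
         --  Editing only: both languages are visible at the same time
         for Languages use ("xml_for_ada", "Ada");
         for Source_Dirs use ("src", "generated");
         for Object_Dir use "obj";
   end case;
end Default;
```

The two build commands are unaffected, since they always pass -XSTEP explicitly.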