Gem #52: Scripting Capabilities in GNAT (Part 1)

Let's get started…


The first thing a program generally has to do is parse its command line to find out which features the user wants to activate. Standard Ada provides the Ada.Command_Line package, which gives you access to each of the arguments on the command line. GNAT provides a much more advanced package, GNAT.Command_Line, which helps to manipulate those arguments, so that you can easily extract switches, their (possibly optional) parameters, and any remaining arguments. Here is a short example of using that package:

with GNAT.Command_Line;  use GNAT.Command_Line;
procedure Main is
   --  Display_Help, Set_File, Set_Directory and Do_Something stand for
   --  subprograms provided by the application.
begin
   loop
      case Getopt ("h f: d? e g") is
         when 'h' => Display_Help;
         when 'f' => Set_File (Parameter);
         when 'd' => Set_Directory (Parameter);
         when 'e' | 'g' => Do_Something (Full_Switch);
         when others => exit;  --  no more switches
      end case;
   end loop;
exception
   when Invalid_Switch | Invalid_Parameter =>
      Display_Help;
end Main;

This application accepts several switches. -f requires an extra argument, whereas -d has an optional argument. The application can be called in several ways: for example, "-ffile -e -g", "-f file -e -g", or "-f file -eg". GNAT.Command_Line is extremely flexible in what it accepts from the user, which can help to make the application easier to use.
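
The text above also mentions retrieving "any remaining arguments", which the short example does not show. The following is a minimal sketch of how that might look, assuming the same switch string; the procedure name Show_Args and the printed messages are illustrative only, and the helper subprograms are replaced by plain Put_Line calls so that the program compiles on its own:

```ada
with Ada.Text_IO;        use Ada.Text_IO;
with GNAT.Command_Line;  use GNAT.Command_Line;

--  Show_Args is a hypothetical name used for this sketch
procedure Show_Args is
begin
   --  First loop: consume the switches
   loop
      case Getopt ("h f: d? e g") is
         when 'h' =>
            Put_Line ("usage: show_args [-h] [-f file] [-d[dir]] [-e] [-g]");
            return;
         when 'f' =>
            Put_Line ("file: " & Parameter);
         when 'd' =>
            --  Parameter is the empty string when -d is given alone
            Put_Line ("directory: " & Parameter);
         when 'e' | 'g' =>
            Put_Line ("flag: -" & Full_Switch);
         when others =>
            exit;  --  ASCII.NUL: no more switches
      end case;
   end loop;

   --  Second loop: everything that is not a switch
   loop
      declare
         Arg : constant String := Get_Argument;
      begin
         exit when Arg = "";
         Put_Line ("argument: " & Arg);
      end;
   end loop;
exception
   when Invalid_Switch =>
      Put_Line ("unknown switch: " & Full_Switch);
   when Invalid_Parameter =>
      Put_Line ("missing parameter for: -" & Full_Switch);
end Show_Args;
```

Note that a switch declared with '?' only accepts a parameter attached directly to it (as in "-ddir"), whereas one declared with ':' also accepts a space before its parameter.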

One other convenient package is GNAT.Regpat, which provides the power of regular expression processing in Ada programs. This package was already discussed in a previous Gem related to searching, so we will not detail it again here (see Gem #25).

Scripts often need to parse text files. Although the standard package Ada.Text_IO gives access to such files, it has several drawbacks. First, it is rather slow: the Ada standard requires this package to track a number of additional items of information internally (such as the current line and column), which can add considerable overhead. Also, the file is read chunk by chunk, which on most systems is slow (and requires lots of blocking system calls). Where possible, it is better to use more efficient standard packages such as Ada.Streams.Stream_IO. Furthermore, Ada.Text_IO doesn't provide any parsing capability.
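
As an illustration of the Stream_IO alternative mentioned above, here is a minimal sketch that reads a whole file with a single call. The procedure name Read_Whole_File is made up for this example, "filename" is a placeholder, and the buffer is allocated on the stack, so this form is only suitable for files of modest size:

```ada
with Ada.Streams;            use Ada.Streams;
with Ada.Streams.Stream_IO;
with Ada.Text_IO;

--  Read_Whole_File is a hypothetical name used for this sketch
procedure Read_Whole_File is
   package SIO renames Ada.Streams.Stream_IO;

   File : SIO.File_Type;
begin
   --  "filename" is a placeholder
   SIO.Open (File, SIO.In_File, "filename");

   declare
      --  Stack-allocated buffer sized to the file: modest files only
      Buffer : Stream_Element_Array
                 (1 .. Stream_Element_Offset (SIO.Size (File)));
      Last   : Stream_Element_Offset;
   begin
      --  A single Read fills the buffer; Last is the index of the last
      --  element actually read
      SIO.Read (File, Buffer, Last);
      Ada.Text_IO.Put_Line
        (Stream_Element_Offset'Image (Last) & " stream elements read");
   end;

   SIO.Close (File);
end Read_Whole_File;
```

The data comes back as raw stream elements rather than as a String, so any text processing still has to be written by hand.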

The GNAT Reusable Components (initially released to customers in July 2008) provide the package GNATCOLL.Mmap. This package typically uses more efficient system calls to read the file, and results in much greater speed if your application is reading lots of text files.

with GNATCOLL.Mmap;  use GNATCOLL.Mmap;
procedure Main is
   File : Mapped_File := Open_Read ("filename");
   Str  : Str_Access;
begin
   Read (File);  --  read whole file at once
   Str := Data (File);
   --  you are now manipulating an in-memory version of the file
   Close (File);
end Main;

This package also supports reading only part of a file into memory at a time, which is important if you are manipulating huge files. Most often, though, it is more efficient simply to read the whole file at once.
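
To show what working on the in-memory data might look like, here is a small sketch that counts the lines of a file. It assumes that the Last function of GNATCOLL.Mmap returns the index of the last valid character in the data returned by Data (only that range of the string should be accessed); the procedure name Count_Lines is made up for the example, and the exact API may differ between GNATCOLL versions:

```ada
with Ada.Text_IO;    use Ada.Text_IO;
with GNATCOLL.Mmap;  use GNATCOLL.Mmap;

--  Count_Lines is a hypothetical name used for this sketch
procedure Count_Lines is
   File  : Mapped_File := Open_Read ("filename");
   Str   : Str_Access;
   Count : Natural := 0;
begin
   Read (File);  --  read whole file at once
   Str := Data (File);

   --  Assumption: only Str (1 .. Last (File)) holds valid file contents
   for J in 1 .. Last (File) loop
      if Str (J) = ASCII.LF then
         Count := Count + 1;
      end if;
   end loop;

   Put_Line ("lines:" & Natural'Image (Count));
   Close (File);
end Count_Lines;
```

Since the loop works directly on the mapped data, no intermediate copy of the file contents is made.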

Regarding parsing facilities, another package comes to the programmer's aid: GNAT.AWK. This package provides an interface similar to the standard awk utility on Unix systems. Specifically, one of its modes is an implicit loop over the whole file. Pattern matching is applied to each line in turn, and when the pattern matches, a callback action is taken. This means you no longer have to do all the parsing yourself. Several types of iterators are provided, and this Gem illustrates only one. See the interface of g-awk.ads for more examples.

with Ada.Text_IO;  use Ada.Text_IO;
with GNAT.AWK;     use GNAT.AWK;
procedure Main is
   procedure Action1 is
   begin
      Put_Line (Field (2));
   end Action1;

   procedure Default_Action is
   begin
      null;
   end Default_Action;
begin
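   --  Action1 and Default_Action are nested in Main, so the standard
   --  'Access attribute is rejected by the accessibility rules; GNAT's
   --  'Unrestricted_Access attribute bypasses that check.
   Register (1, "saturn|earth", Action1'Unrestricted_Access);
   Register (1, ".*", Default_Action'Unrestricted_Access);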
   Parse (";", "filename");
end Main;

Each line in the file denoted by "filename" consists of fields separated by semicolons. Actions are registered for different values of the first field on a line. When the first field matches "saturn" or "earth", the appropriate action is called, and the second field is printed. For all other planets nothing is done. This code is certainly bigger than the equivalent Unix shell command using awk, but is still very reasonably sized. Of course, if you make a mistake in your program, it is very likely that the compiler will help you find it (ever tried to debug a 20-line awk program, not to mention even longer programs, where quotes rapidly become an issue?).

An upcoming Gem will discuss additional scripting features, focusing on manipulation of external processes.