Gem #129 : Type-Safe Database API - Part 1

Let's get started...

SQL is the traditional language used to query a database management system (DBMS). This language is mostly standardized, even though each vendor provides their own extensions and restrictions. It is convenient when the data could be organized into tables and fields following the relational model. More recently, we have seen the emergence of so-called "noSQL" databases, which use a different paradigm altogether to provide enhanced performance in exchange for additional restrictions. We will not discuss noSQL databases in this Gem.

Despite all its advantages, there are lots of limitations with SQL as it is traditionally used.

* The query is written as a string.

With a string it isn't possible to check at compilation time whether the syntax is correct. It's also not possible to check whether the tables and fields referenced are valid. Furthermore, it isn't possible to have type checking (ensuring comparison of integers with integers, etc.).

* Sending the query to the DBMS is vendor specific.

Each vendor provides a C API, but they all differ from one another. The concepts are in general similar: preparing the query; sending it to the server; passing the parameters; fetching the results; and so on. A standard API exists (ODBC) which is implemented by most, but not all, vendors. However, there is a performance penalty compared to using the native API in general. The setup is also more complex.

Some vendors provide preprocessor-based solutions: you write the SQL query directly in your source code, which is then preprocessed before being passed to the compiler. However, this solution means that the compiler is not seeing exactly the code you have written, which for instance can result in error messages pointing to the wrong line in the file.

The GNAT Components Collection proposes two packages, GNATCOLL.SQL and GNATCOLL.SQL.Exec, which provide solutions to the problems highlighted above. One of the requirements is that these packages should be usable with existing code to provide an easier transition.

We'll begin by addressing the first limitation (SQL as strings). The package GNATCOLL.SQL provides an Ada API to write SQL queries. However, for it to provide type safety, we first need to describe the schema of our database. For this, GNATColl relies on a command-line tool called gnatcoll_db2ada. If you already have an existing and running database, you can simply run the following command:

gnatcoll_db2ada -dbname <name> -dbhost <host> -dbuser <user>

This command will generate two Ada packages (by default, they are called Database and Database_Names). Only the former is meant to be used directly in your applications. It contains Ada packages that match the database schema, for instance:

package Database is
   type T_Table1 (...) is record
      Field1 : SQL_Field_Integer (...);
      Field2 : SQL_Field_Text (...);
   end record;

   type T_Table2 (...) is record
      Field3 : SQL_Field_Integer (...);
   end T_Table2;

   Table1 : T_Table1 (null);
   Table2 : T_Table2 (null);

   ...
end Database;

The actual discriminants are irrelevant in our discussion of the basics of GNATCOLL.SQL. They are used when you need to have multiple references to the same table (through different aliases) in the same query.

The names T_Table1, Field1, Field2, etc. are the actual names of the tables and fields in the schema, so should be familiar to the developers.

If you don't have an existing database, it's possible to describe the schema in a text file. This file can be created initially from an existing database. Here is an example of such a file. Full documentation for the format is available in the GNATColl documentation.

| TABLE  | Table1        |          | |
| Field1 | AUTOINCREMENT | PK       | | Documentation for this field
| Field2 | Text          | NOT NULL | | Documentation for field2

| TABLE  | Table2     | | |
| Field3 | FK(Table1) | | | This is a foreign key

Once the Ada packages have been generated, the user can write SQL queries by using the Ada API provided in GNATCOLL.SQL. Here is a small example:

with GNATCOLL.SQL;   use GNATCOLL.SQL;
with Database;       use Database;   --  the generated package

...

declare
   Q : SQL_Query;
begin
   Q := SQL_Select
      (Fields => Table1.Field2 & Table2.Field3,   --  line 8
       From   => Table1 & Table2,                 --  line 9
       Where  => Table1.Field1 = Table2.Field3);  --  line 10

   --  Converting Q to a string via To_String will now display:
   --     "SELECT table1.field2, table2.field3 FROM table1, table2
   --      WHERE table1.field1 = table2.field3);"
end;

Although it might appear to be longer to type than when using the string, this code already has several advantages:

* It is pure Ada.

There is no need for preprocessing, so any error message the compiler reports will point to the correct line of code.

* It guarantees that the SQL is syntactically correct.

By design, the API prevents you from entering a syntactically incorrect query (for instance, a typo like "SELCET", or missing spaces, or any other kind of error). The API makes sure you can only specify a table or a list of tables in the FROM clause, and there's a similar limitation for the FIELDS clause.

* It guarantees that only valid tables and fields are referenced.

Since the query is written using the generated API, this ensures that if you change the schema of your database, the generated Database package will also change, and the compiler will then flag queries that reference fields that no longer exist. When the query is written as a string, changing the schema requires searching in the source for possible impacts, rather than relying on the compiler to do this work on our behalf.

* It checks types at compilation time.

If we modify line 10 to compare Table1.Field2 and Table2.Field3, the compiler will complain that we are comparing an integer and a text field.

Because the queries are built as a tree which is then flattened into a string, GNATCOLL.SQL is also able to provide automatic completion on queries. For instance, we could omit line 9 above, and add:

  Auto_Complete (Q);

In this case there is no gain, and we are actually losing in readability and efficiency. But this auto-completion capability is especially useful for completing the GROUP BY clauses. For instance, when one of the fields returned by the query is the result of an aggregation function, and you also return ten other fields, the auto-completion removes the need to manually maintain the list of nine fields in GROUP BY.

Of course, there is a performance penalty implied by the creation of the query in memory, and then its serialization as a string. To alleviate that, GNATCOLL.SQL provides support for prepared queries (client or server side), as well as parameterized queries. The second Gem in this series goes into more detail on this, and also describes the package GNATCOLL.SQL.Exec, which abstracts communication with the DBMS engine to a vendor-agnostic API.

To be continued in part 2 ...