
Formally Verified Hashed Sets in Ada SPARK
For a few years now, I have been working on verifying an implementation of bounded hashed sets in Ada SPARK as a side project. I have been working on it for a few weeks whenever I have the opportunity, and now it is advanced enough to present here.
Since GNATprove itself is not formally verified, we SPARK developers do not necessarily have the opportunity to use the tool to verify real-life software. We generally work on small, self-contained examples that exercise the specific features of GNATprove we are working on. I am always looking for bigger examples to work on, to have the opportunity to see several features working together, and to identify what could be missing in the tool. Also, it is an opportunity to assess proof scalability, which we know can be a problem. Finally, while I like working on GNATprove, my real interest is formal verification, so getting an occasion to use the tool is fun!
Since I wanted to spend some time verifying a real-life piece of software, I thought it would be even better if it were something for which formal verification would be useful. A part of SPARKlib that we ship with SPARK seems like a good idea. It is code that will be used in a critical context and has non-trivial contracts. Unfortunately, most of the SPARKlib uses non-SPARK constructs in its implementation, in particular pointers that do not adhere to SPARK's ownership policy.
Bounded formal containers are an interesting topic, and they are implemented using indexes within an array rather than dynamically allocated memory, which is compatible with SPARK. There are various container types in the SPARKlib: lists, vectors, and hashed and ordered sets and maps. I didn’t want to take something too simple, as I like a challenge, but ordered containers, which are based on red-black trees, looked like too much work for a side project, so I decided to go with hashed sets. A nice plus is that they are part of the SPARKlib fragment we certify, so formal proofs can give us additional confidence in their correctness.
I have not chosen to prove the actual implementation of the formal hashed sets. First, it is a generic unit, and we can only verify instances, and second, it is far easier to prove programs when you can modify the implementation. So instead, I have started from scratch and reimplemented hashed sets of integers, specifically, though the type can easily be replaced. The equivalence relation and hash functions are kept abstract, using lemmas for their properties, so the proof is valid for any possible implementation of these functions. I have kept everything as close as possible to the actual implementation in the SPARKlib.
The type used for the bounded hashed sets in the SPARKlib is as follows:
type Node_Type is record
Element : aliased Element_Type;
Next : Count_Type;
Has_Element : Boolean := False;
end record;
type Nodes_Type is array (Count_Type range <>) of Node_Type;
type Buckets_Type is array (Hash_Type range <>) of Count_Type;
type Hash_Table_Type
(Capacity : Count_Type;
Modulus : Hash_Type)
is record
Length : Count_Type := 0;
Free : Count_Type'Base := -1;
Nodes : Nodes_Type (1 .. Capacity);
Buckets : Buckets_Type (1 .. Modulus) := [others => 0];
end record;The elements are stored in an array Nodes. A hash function is used to choose a bucket for each element of the set. A Buckets array stores, for each value of a hash, the head of a list containing all elements in the set with the same hash. These lists are implemented through a Next field in the memory array Nodes. The indexes that are not currently used for the set are linked together in a free list starting at Free.

In addition, the boolean flag Has_Element is set to True if and only if the node is currently used and the Length field maintains the number of allocated elements in the set. There is a small optimization for the Free list: it can have a negative value to represent a contiguous free list starting at a given index and extending until the end of the array. It avoids initializing the free list if we never remove elements from the set.
The implementation that we have verified is very similar:
type Relaxed_Element is record
V : Element_Type;
end record
with Relaxed_Initialization;
type Node_Type is record
Element : Relaxed_Element;
Next : Count_Type := 0;
Has_Element : Boolean := False;
end record;
type Nodes_Type_Base is
array (Positive_Count_Type range <>) of Node_Type;
subtype Nodes_Type is Nodes_Type_Base (1 .. <>)
with
Predicate =>
Nodes_Type'Last >= 0
and then (for all C of Nodes_Type => C.Next <= Nodes_Type'Last);
type Buckets_Type is array (Hash_Type range 1 .. <>) of Count_Type;
type Set
(Capacity : Count_Type;
Modulus : Positive_Hash_Type)
is record
Length : Count_Type := 0;
Free : Count_Type'Base := -1;
Nodes : Nodes_Type (1 .. Capacity);
Buckets : Buckets_Type (1 .. Modulus) := (others => 0);
end record
with
Predicate =>
Length in 0 .. Capacity
and then
(if Capacity = 0 then Free = -1 else Free in -Capacity .. Capacity)
and then (for all B of Set.Buckets => B <= Capacity),
Type_Invariant => Invariant (Set);There are mainly two differences. We introduce a record type, Relaxed_Element, as a wrapper over elements in the memory. It has no effect on the program but instructs GNATprove that elements of the set might not be initialized. The Relaxed_Initialization aspect has been explained in a previous blog post if you want to know more. We have also added contracts for the types (predicates and invariants) to enable verification. The predicates, which need to hold all the time for all initialized values in SPARK, are used to give bounds for the various components. The validity of the data structure is fairly complex, as it requires reasoning about the underlying list structures, buckets, etc. It is given as an invariant, so it only needs to be maintained on the boundary of the enclosing package. We don’t give the definition of the invariant here. Similarly, the implementation of the various features is kept as close as possible to that used in the SPARKlib, apart from annotations and ghost code necessary for proof.
The contracts for these subprograms are also directly taken from the SPARKlib and are therefore highly non-trivial. They use several model functions for their specification. A ghost function called Model returns a functional set as a high-level view of the data structure. Unfortunately, this model is not sufficient to verify subprograms that use cursors to iterate over a formal hashed set, as it does not represent cursors or the order of element occurrences during an iteration over a set. Therefore, two additional model functions are defined: the Elements function returns a sequence of elements representing their order in the container, and the Positions function returns a functional map that associates the valid cursors in a container with an integer representing their position in this sequence.
This layered approach allows users of the formal containers library to choose the level of granularity they need.
function Model (Container : Set) return M.Set;
-- The high-level model of a set is a set of elements. Neither cursors
-- nor order of elements are represented in this model. Elements are
-- modeled up to equivalence.
function Elements (Container : Set) return E.Sequence;
-- The Elements sequence represents the underlying list structure of
-- sets that is used for iteration. It stores the actual values of
-- elements in the set. It does not model cursors.
function Positions (Container : Set) return P.Map;
-- The Positions map is used to model cursors. It only contains valid
-- cursors and maps them to their position in the container.Set operations are specified in terms of these models. For example, here is a slightly simplified version of the declaration of the Find function that searches for an element in the set:
function Find (Container : Set; Item : Element_Type) return Cursor
with
Global => null,
Contract_Cases =>
(SPARKlib_Full =>
-- If Item is not contained in Container, Find returns No_Element
(not Contains (Model (Container), Item) => Find'Result = No_Element,
-- Otherwise, Find returns a valid cursor in Container
others =>
P.Has_Key (Positions (Container), Find'Result)
-- The element designated by the result of Find is Item
and Equivalent_Elements (Get (Elements (Container), P.Get (Positions (Container), Find'Result)), Item)));It uses the three model functions: Model to decide if the element is in the set, and then, if it is, Positions and Elements to state that the return cursor designates an element equivalent to Item in Container. You can look at the specification of the formal hashed sets in the SPARKlib if you are interested.
The proven reimplementation of the formal hashed sets has been included in the SPARK testsuite. It does not contain all the subprograms in SPARKlib (yet), but it features primitives for iteration, as well as for searching and modifying sets. We don’t have specific set operations like union or intersection, but they shouldn't be too hard to implement, since they rely on the insertion primitive. It is entirely verified by the SPARK tool at level 4 on my computer with the current wavefront of SPARK, except for a spurious check that states that sets are not fully initialized by default. This is due to a current (known) limitation in the handling of types with components annotated with the Relaxed_Initialization aspect.
Let’s come back to the beginning of this post and our reasons for writing this example. During verification, we found two issues in the SPARKlib that we have corrected there. The first one is a corner case: we should not allow a modulus of 0, or the set implementation won’t work. The second one is more interesting: when copying the contracts of the Replace_Element procedure, we noticed that it did not allow us to prove its implementation. And indeed, we had forgotten a part of the necessary precondition, and an exception could be propagated if replacing an element could lead to a duplication in the set! Unfortunately (or fortunately), this very error was also found by one of my colleagues working on the qualification material for the SPARKlib at approximately the same time. This has, of course, been fixed in the library.
Regarding testing the tool on a real project and assessing its usability, the results are positive: the library has been fully verified. However, getting the proofs to go through has required significant effort, necessitating manual guidance of the proof tool, abstraction and refinement, and some manual handling of the proof context to keep the verification process tractable. This is inherent to the complexity of the example, which in particular requires reasoning by induction, something the solvers are notoriously bad at. If you are interested, part of this work was used as an example in the article "Containers for Specification in SPARK," published in the International Journal on Software Tools for Technology Transfer last year. I will also write some more blog posts in the near future to present some of these, so stay tuned!
Author
Claire Dross

Claire Dross has a PhD in deductive verification of programs using a satisfiability modulo theory solver with the Universite Paris-Sud. She also has an engineering degree from the Ecole Polytechnique and an engineering degree from the Ecole Nationale Superieure des Telecommunications. At AdaCore, she works full-time on the formal verification SPARK 2014 toolset.





