
Reasoning about Linked Structures in an Array
In an embedded context, it is common to use bounded data structures implemented as arrays rather than dynamically allocating memory. This technique is used for bounded containers in the Ada Standard Library and in SPARKlib. This implementation respects the additional SPARK restrictions and can therefore be verified, as explained in a previous blog post. However, reasoning about linked data structures within an array is complicated, requiring high-level concepts such as reachability and inductive reasoning. In this article, I will explain how I have defined these concepts to verify the implementation of formal hashed sets. As this kind of reasoning is relatively common for bounded data structures, I have factored out these functionalities in a new package, SPARK.Higer_Order.Reachability in the SPARKlib. Don’t hesitate to reuse it for your own data structures!
Acyclicity, Reachability, and Model Sequence
For these explanations, consider that we have a data structure that uses an array to represent a chunk of memory. In this array, we represent one or more singly linked structures: for each cell, the Next function returns the index of the next cell. In most cases, next would return the value of a field stored in the Cell. It is what is done for the hashed sets implementation, for example. The memory is valid if, for every cell in the array, the next element is either an index in the array or a special value No_Index that represents the end of the list:
type Memory_Type is array (Index_Type range <>) of Cell_Type;
function Next (C : Cell_Type) return Index_Type'Base;
function Valid_Memory (M : Memory_Type) return Boolean
is (M'First = Index_Type'First
and then (for all C of M => Next (C) in M'Range | No_Index));For now, let’s concentrate on the properties we would like to express on these structures, without worrying about their implementation. First, we have decided to deal only with acyclic lists. It is not necessarily the case that the data structures in arrays are acyclic, but it greatly simplifies reasoning, as it means that traversing the structure using Next necessarily terminates. We declare a function Is_Acyclic that is defined recursively: if X is No_Index, it is acyclic; otherwise, it is acyclic if and only if the next index of the cell at index X in memory M is acyclic. We give its definition recursively as a separate lemma so it can be unfolded manually without risking a “matching loop” - i.e., the solver spending a lot of time considering deeper and deeper structures. When the proofs are not too complex, the tool can automatically instantiate this lemma, making it appear as if its definition were stated directly as a postcondition. We don’t use that in general for our hashed sets, though, as it is too complex and confuses the provers. Note that this definition is not an adequate implementation of Is_Acyclic, as it would loop on cyclic structures instead of returning False. It is, however, a good way to prove that a structure is acyclic, which is what we want to do here.
function Is_Acyclic (X : Extended_Index; M : Memory_Type) return Boolean
with
Pre => X in M'Range | No_Index and then Valid_Memory (M),
Post => (Static => (if X = No_Index then Is_Acyclic'Result));
procedure Lemma_Is_Acyclic_Def (X : Index_Type; M : Memory_Type)
with
Ghost => Static,
Pre => X in M'Range and then Valid_Memory (M),
Post => Is_Acyclic (X, M) = Is_Acyclic (Next (M (X)), M);We also want to express that an index is reachable from another index in the linked data structure using one or more calls to Next. To easily quantify over the indices reachable from a given index, we define a function that returns the set of all indices reachable from its parameter. Here again, we define it recursively: if X is No_Index and the set is empty, or the set contains at least X itself, plus all the indexes reachable from the index obtained by calling Next on the cell at index X in the memory. A function to decide if an index is reachable from another can be defined easily using this set:
function Reachable_Set
(X : Extended_Index; M : Memory_Type) return Memory_Index_Set
with
Pre => X in M'Range | No_Index and then Valid_Memory (M),
Post =>
(Static =>
(if X = No_Index
then
Is_Empty (Reachable_Set'Result)
and then Length (Reachable_Set'Result) = 0
else
Contains (Reachable_Set'Result, X)
and then
Length (Reachable_Set'Result)
<= To_Big (M'Last) - To_Big (M'First) + 1)
and then (for all I of Reachable_Set'Result => I in M'Range));
procedure Lemma_Reachable_Def (X : Index_Type; M : Memory_Type)
with
Ghost => Static,
Pre =>
X in M'Range and then Valid_Memory (M) and then Is_Acyclic (X, M),
Post =>
not Contains (Reachable_Set (Next (M (X)), M), X)
and then Reachable_Set (Next (M (X)), M) <= Reachable_Set (X, M)
and then Included_Except
(Reachable_Set (X, M), Reachable_Set (Next (M (X)), M), X)
and then
Length (Reachable_Set (X, M))
= 1 + Length (Reachable_Set (Next (M (X)), M));;
function Reachable
(X : Extended_Index; M : Memory_Type; Y : Index_Type) return Boolean
is (Contains (Reachable_Set (X, M), Y))
with
Pre =>
X in M'Range | No_Index and then Y in M'Range and then Valid_Memory (M);Finally, we provide a function that returns a model of the list of indexes starting at a given index in memory as a functional sequence. Theoretically, this sequence gives strictly more information than the set of reachable elements, as an element is in the set if and only if it is in the sequence. In practice, we found it is more efficient to reason about the set than to use the sequence when reasoning about reachability, so we have chosen to provide both:
function Model (X : Extended_Index; M : Memory_Type) return Sequence
with
Pre =>
X in M'Range | No_Index
and then Valid_Memory (M)
and then Is_Acyclic (X, M),
Post =>
(Static =>
Length (Model'Result) = Length (Reachable_Set (X, M))
and then
(for all I of Model'Result => Contains (Reachable_Set (X, M), I))
and then
(if X = No_Index
then Length (Model'Result) = 0
else
In_Range
(Length (Model'Result),
To_Big (1),
To_Big (M'Last) - To_Big (M'First) + 1)
and then Get (Model'Result, Last (Model'Result)) = X));
procedure Lemma_Model_Def (X : Index_Type; M : Memory_Type)
with
Ghost => Static,
Pre =>
X in M'Range and then Valid_Memory (M) and then Is_Acyclic (X, M),
Post =>
Length (Model (X, M)) - 1 = Length (Model (Next (M (X)), M))
and then Model (Next (M (X)), M) <= Model (X, M);You might have noticed that the indices in the sequence are in reverse order - i.e., the first index in the linked structure is the last element of the sequence. This is to make the recursive definition of the Model more efficient for provers, as it does not shift the existing indices in the sequence.
Usage for the Formal Hashed Sets
As stated at the beginning of this post, we introduced these concepts to verify the implementation of the formal hashed sets in the SPARKlib. To recall what we presented in a previous post, the hashed sets are implemented using an array of nodes, each containing a Next field that links the nodes into several linked lists: one for each bucket in the structure (and one for the free nodes).

The functions defined above are exactly what we need to express that this structure is well-formed. The first predicate below states that all buckets contain the heads of acyclic lists, while the second uses Reachable_Set to express that the buckets contain the correct elements - those whose hash values map them to this particular bucket. These two functions are used to encode (a part of) the invariant that the library maintains on the Set type.
function LL_Correct_Is_Acyclic_Buckets (S : Set) return Boolean
is (for all B1 in S.Buckets'Range =>
Is_Acyclic (S.Buckets (B1), S.Nodes));
-- All buckets are acyclic lists
function LL_Correct_Buckets_Element (S : Set) return Boolean
is ((for all B in S.Buckets'Range =>
(for all I of Reachable_Set (S.Buckets (B), S.Nodes) =>
S.Nodes (I).Element'Initialized))
-- Elements of reachable in buckets are initialized
and then
(for all B in S.Buckets'Range =>
(for all I of Reachable_Set (S.Buckets (B), S.Nodes) =>
Find_Bucket (S.Nodes (I).Element.V, S.Modulus) = B)));
-- And they are in the expected bucketThe Model function creates an intermediate model for the sets. Indeed, as the implementation is quite far from the library's specification, we have decided to verify it using successive refinements. For each refinement, we introduce a new model that abstracts away part of the complexity. If you want to look at the code, each intermediate model is stored in a separate (private) child package. The lowest-level model is in Data_Structure.Basic_Operations. It consists of two parts: a map from allocated indexes to initialized values, used to model the set's content, and several sequences of indexes to model the linked structure, one for each bucket and one for the free list. These sequences are constructed using the Model function presented previously:

Useful Lemmas
The three functions Is_Acyclic, Reachable_Set, and Model are defined recursively. As a result, most proofs concerning them are going to require reasoning by induction. Unfortunately, provers are bad with induction, and generally require guidance to find a proof - in the form of a ghost loop with an invariant or a ghost recursive lemma, as explained in the user guide. This is a bit cumbersome, and might not be so easy if you are not used to it. We have defined several useful lemmas for these functions that can be reused easily. We have lemmas of two kinds: preservation lemmas, which allow deducing how the functions are modified by an update to the structure, and classical lemmas, such as the reachability is asymmetric and transitive. Here is an example of what the preservation lemmas look like:
procedure Lemma_Reachable_Preserved
(X : Extended_Index; M1, M2 : Memory_Type)
with
Ghost => Static,
Pre =>
[...]
and then
(for all I of Reachable_Set (X, M1) => Next (M1 (I)) = Next (M2 (I))),
Post =>
Reachable_Set (X, M1) = Reachable_Set (X, M2)
and then
Length (Reachable_Set (X, M1)) = Length (Reachable_Set (X, M2));
-- Lemma for the preservation of the property if the Next elements of all
-- cells reachable from X are preserved.
procedure Lemma_Reachable_Set
(X, Y : Index_Type; Z : Extended_Index; M1, M2 : Memory_Type)
with
Ghost => Static,
Pre =>
[...]
and then Reachable (X, M1, Y)
and then Next (M2 (Y)) = Z
and then
(for all K in M1'Range =>
(if K /= Y then Next (M2 (K)) = Next (M1 (K)))),
Post =>
(for all I of Reachable_Set (X, M2) =>
Reachable (Z, M1, I)
or else
(Reachable (X, M1, I)
and then not Reachable (Next (M1 (Y)), M1, I)))
and then (for all I of Reachable_Set (Z, M1) => Reachable (X, M2, I))
and then
(for all I of Reachable_Set (X, M1) =>
Reachable (X, M2, I) or else Reachable (Next (M1 (Y)), M1, I));
-- Lemma for the preservation of the property if the Next element of a cell
-- Y reachable from X is set to a disjoint acyclic list Z.They take as input two memories, M1 and M2, and specify how the set of reachable indexes changes when moving from one to the other. Lemma_Reachable_Preserved is the simplest. It says that reachable indexes from an index X are preserved if the Next function returns the same thing on all indexes reachable from X in M1 and M2. The second is slightly more complex. It assumes that the next function has changed between M1 and M2 for a single index Y reachable from X. It then states that the indexes reachable from X in M2 are those reachable from X in M1 that are not reachable from the next index of Y, plus those reachable from Z (the value of the next index of Y in M2). There are similar lemmas for Is_Acyclic and Model. When proving the implementation of the hashed sets, we use these lemmas manually each time we update the node array:
-- The free list is preserved
if S.Free >= 0 then
Lemma_Is_Acyclic_Preserved (S.Free, Old_Nodes, S.Nodes);
Lemma_Reachable_Preserved (S.Free, Old_Nodes, S.Nodes);
Lemma_LL_Sequence_Preserved (S.Free, Old_Nodes, S.Nodes);
end if;Implementation and Proof
The implementation of Is_Acyclic, Reachable_Set, and Model uses a loop along with a set where we store all indexes that have been encountered already to avoid looping if the list has a cycle:
function Is_Acyclic (X : Extended_Index; M : Memory_Type) return Boolean
with
Refined_Post => (Static => Is_Acyclic'Result = Is_Acyclic_Internal (X, M))
is
Seen : Memory_Index_Set;
C : Extended_Index := X;
begin
while C /= No_Index loop
pragma Loop_Invariant (...);
if Contains (Seen, C) then
return False;
end if;
Seen := Add (Seen, C);
C := Next (M (C));
end loop;
return True;
end Is_Acyclic;To be able to prove its definition lemma, which is not implied by its postcondition, we have used a refined postcondition. It is an alternative contract that can be supplied on the body of a subprogram. It has to be more precise than the specification contract - namely, the refined postcondition should imply the normal postcondition. The refined postcondition is only available to prove the enclosing unit, and not other units that might use the library. So here it is available to prove the defining axiom, but not when verifying hashed sets, for example.
In the refined postconditions of Is_Acyclic, Reachable_Set, and Model, we use recursive versions of the functionalities so the prover can access their definitions. However, since we still need to terminate, we keep the set of seen indexes (or rather, here we keep unseen indexes for convenience) that is given to the functions as an additional parameter:
function Model_Internal
(X : Extended_Index; M : Memory_Type; S : Memory_Index_Set)
return Sequence
is (if X = No_Index or else not Contains (S, X)
then Empty_Sequence
else Add (Model_Internal (Next (M (X)), M, Remove (S, X)), X))
with
Ghost => Static,
Subprogram_Variant => (Decreases => Length (S));The correspondence between the recursive and imperative variants is maintained as a loop invariant in the implementation of the functionalities. I will not go into more details here, but don’t hesitate to check it out in the SPARKlib if you are interested!
Conclusion
In this post, I have presented a package in the SPARKlib that can be used to model and reason about linked data structures encoded inside an array. It introduces the notions of acyclicity and reachability in the underlying structure. It also provides a function that returns a sequence as a model of the structure, listing the indices reachable from a given index in the order in which they appear. If you want to implement your own linked structure in an array, it might come in handy. Please check it out.
Author
Claire Dross

Claire Dross has a PhD in deductive verification of programs using a satisfiability modulo theory solver with the Universite Paris-Sud. She also has an engineering degree from the Ecole Polytechnique and an engineering degree from the Ecole Nationale Superieure des Telecommunications. At AdaCore, she works full-time on the formal verification SPARK 2014 toolset.





