Information Hiding and Context Management in SPARK

By default, SPARK uses automated solvers as a backend. It means that, in general, verification is fully automated once the user has provided contracts on all their subprograms and possibly invariants on their loops. Unfortunately, when the proof becomes too complex, the tool may be unable to verify a correct, fully annotated program, or at least not in a reasonable amount of time. This is why it is generally a good idea to split complex subprograms into smaller parts, making them easier to verify. But sometimes, it is not enough. In this case, it might be worth trying other techniques to reduce the complexity of the proofs.

As explained in a previous blog post, I recently worked on verifying the formal hashed sets package in SPARKlib. Even if the package is not overly big, the encoded data structure is quite complex, and its verification was not straightforward. In this post, I will explain the techniques I have used to simplify the proof by hiding unnecessary information from the solvers. It is split into three parts, ranging from the tricks that are easier to understand and use, to more complex tricks. Don’t hesitate to skip ahead whenever you want and jump to the conclusion directly.

Abstraction at the Level of the Package

In Ada, abstraction is generally enforced at the package level: entities declared in the public part of the specification are visible from the rest of the program, whereas those declared in the private part or the body can only be used from the package itself. Proof works in a similar way, except that the boundary is not set between the public and the private part of the specification, but between the specification and the body. More precisely, by default, everything that is defined in the public and private parts of the specification is visible for proof from outside of the package, whereas the body is hidden. For example, in the following package, the definitions of F1 and F2 are visible when verifying users of P, but not the definition of F3:

package P is
   function F1 return Integer is (1);
   function F2 return Integer;
   function F3 return Integer;
private
   function F2 return Integer is (2);
end P;

package body P is
   function F3 return Integer is (3);
end P;

It might be considered a strange choice to make the private part of a package visible by default, but we have found that it is in general the best tradeoff. Indeed, it makes it possible for the proof tool to peek into the definition of private types. If the private part were hidden, then users would have to introduce visible models to describe their private types. Here, the full view of the type acts as a default model. As an example, the fact that the full view of Stack and the definitions of Is_Empty and Is_Full are visible in the following package makes it possible to verify that a stack cannot be at once full and empty without having to add any contract. Similarly, we don’t need any annotation to know that the stack is empty when initialized by default.

package P is
   type Stack is private;
   function Is_Empty (X : Stack) return Boolean;
   function Is_Full (X : Stack) return Boolean;
private
   type Stack is record
     Top : Natural := 0;
     …
   end record;
   function Is_Empty (X : Stack) return Boolean is (X.Top = 0);
   function Is_Full (X : Stack) return Boolean is (X.Top = Max);
end P;

Abstraction is key when verifying a complex program, and I have used it several times when working on the formal hashed sets. For example, consider the - rather complex - recursive definitions of reachability properties used to implement and verify the SPARK.Higher_Order.Reachability package of the SPARKlib presented in a previous blog post (ref tab2). Thanks to abstraction at the package boundary, these definitions are not visible when verifying formal hashed sets, even though the generic instance makes it so that it is a nested package. It is especially interesting for recursive definitions, or those involving universally quantified formulas, as they can lead to numerous instances being added in the context.

While in general the default behavior of SPARK with respect to abstraction at the package boundary is a good tradeoff, it is possible for the user to override it manually. It can be useful in particular to make the body of a nested package visible when its purpose is not abstraction. For example, in the container library of SPARK, we have chosen to use a (ghost) nested package called Formal_Model to group together the definitions of model functions (used in contracts) and lemmas to improve readability. The definitions of these functions cannot be given directly in the private part of Formal_Model as they depend on the full view of the container type, defined in the enclosing package. Therefore, they have to be deferred to the package body. Since these definitions are necessary for the proof, I had to make the body of Formal_Model visible from its enclosing package. This can be done using a pragma Annotate for GNATprove on the body of Formal_Model as follows:

   package body Formal_Model is
      pragma Annotate (GNATprove, Unhide_Info, "Package_Body");

A similar annotation can be used to hide the private part of a package. It is not often useful in my experience. In general, it is easy to defer the definition of functions that we do not want to expose to the body of the package, and the definition of the type itself is rarely big enough to really impact the proof. As an example, we could hide the definition of Set from user code, as it is too low-level and not necessary for proof, since we provide models for sets:

private
   pragma Annotate (GNATprove, Hide_Info, “Private_Part”);

Generally, introducing abstraction barriers is very useful in SPARK. It also makes readability, code, and proof maintenance easier as it enforces separation of concerns. It requires some effort, as, as a user, you need to design the abstraction, but it is very efficient when done correctly, as it reduces the complexity for the whole program. The mechanism can even be used to allow proof by refinement. This is interesting to prove high-level properties on data structures for example. I have used it years ago to formally verify insertion inside a red-black tree, as described in this article (ref Auto-Active Proof of Red-Black Trees in SPARK). I have used the same mechanism when verifying hashed sets, but I will leave the details for a future post.

Manual information hiding and context management

There are features in SPARK that allow users to manually control the proof context. They are advanced proof features and are part of the *auto-active* style that is proposed by the tool to guide the provers. You can look at this CACM article if you are interested in auto-active verification in general, as I will only focus on information hiding for context management here. We have also given a talk more globally on the subject of (manual and automated) context management at the Dafny26 workshop. You can read the paper here.

Information that we might want to hide or disclose can be stored inside a lemma. A lemma in mathematics is a piece of reasoning that allows deducing a conclusion from a set of premises in a reusable way. The proof is done once and for all, and the information can be reused as needed without re-verifying the lemma. In SPARK, a lemma is a ghost subprogram, generally a procedure, but we can use functions too, that has no effect but has a contract. The premises of the lemma are stated as preconditions whereas its conclusion is given as a postcondition. As an example, here is a lemma introduced in the SPARK.Higher_Order.Reachability package of the SPARKlib to give the recursive definition of the Is_Acyclic function:

  function Is_Acyclic (X : Extended_Index; M : Memory_Type) return Boolean
   with
     Pre  => X in M'Range | No_Index and then Valid_Memory (M),
     Post => (Static => (if X = No_Index then Is_Acyclic'Result));

   procedure Lemma_Is_Acyclic_Def (X : Index_Type; M : Memory_Type)
   with
     Ghost    => Static,
     Pre      => X in M'Range and then Valid_Memory (M),
     Post     => Is_Acyclic (X, M) = Is_Acyclic (Next (M (X)), M);

Lemmas in SPARK work because the verification performed by GNATprove is modular on a per-subprogram basis. The contract of the procedure (the lemma) is verified once and for all when its body is verified. When the procedure is called, the lemma is applied: the precondition (the premises) is checked, and the postcondition (the conclusion) is assumed.

In general, it is necessary to manually call the ghost subprogram to use a lemma in SPARK. Therefore, using a lemma for (a part of) the postcondition/definition of a subprogram can be a good way to allow a fine-grained manual handling of the context. If we need the information, we can call the lemma, and if we don’t, the information is not going to cause the context to grow unnecessarily, with a potentially heavy recursive definition in the case of Is_Acyclic.
Note that, to prove a lemma that contains part of the definition of a subprogram, the regular contract of the subprogram is generally not enough. This is by design, as we are on purpose removing a part of the necessary information (the recursive definition of Is_Acyclic here). If the subprogram has a regular body, then the information coming from the body won’t be available to prove the lemma either - as GNATprove is modular on a per subprogram basis. So that this information is visible for the verification of the lemma but not for the verification of user code, we rely on abstraction: a more precise contract is given in the body of the enclosing package, either as an expression function, or using a refined postcondition. It will only be available for the verification of the enclosing package itself, and not when proving user code. As an example, here is what we could have used for the body of Is_Acyclic so my lemma is provable:

  function Is_Acyclic (X : Extended_Index; M : Memory_Type) return Boolean is
     (X = No_Index or else Is_Acyclic (Next (M (X)), M));
   --  Variant using an expression function, whose body works as a contract

   function Is_Acyclic (X : Extended_Index; M : Memory_Type) return Boolean with
      Refined_Post =>
         (Static => (if X = No_Index then Is_Acyclic'Result
                           else Is_Acyclic (Next (M (X)), M)))
   is [...]
   --  Variant using a refined postcondition and a regular body

If we want the information in a lemma to be available without having to call it explicitly, it is possible to use a specific annotation called Automatic_Instantiation. If this annotation is supplied on a lemma, then it should be declared directly after a function. The information contained in the lemma will be available for proof as soon as the associated function is used. For example, I could have used this mechanism to make the definition axiom above available as soon as Is_Acyclic is called:

   function Is_Acyclic (X : Extended_Index; M : Memory_Type) return Boolean
   with
     Pre  => X in M'Range | No_Index and then Valid_Memory (M),
     Post => (Static => (if X = No_Index then Is_Acyclic'Result));

   procedure Lemma_Is_Acyclic_Def (X : Index_Type; M : Memory_Type)
   with
     Ghost    => Static,
     Pre      => X in M'Range and then Valid_Memory (M),
     Post     => Is_Acyclic (X, M) = Is_Acyclic (Next (M (X)), M),
     Annotate => (GNATprove, Automatic_Instantiation);
   --  The recursive definition of Is_Acyclic is always available when Is_Acyclic is called

The requirement that automatically instantiated lemmas should always be associated with a function is here so this axiom is not present in the proof context of unrelated subprograms. In this example, the automated instantiated lemma is effectively equivalent to grouping everything together in the postcondition of Is_Acyclic. However there are cases where lemmas give more flexibility than simple postconditions. For example, my hashed sets are parametrized by an equivalence relation on elements: the set can contain at most one element per equivalence class. I have decided to leave this function uninterpreted for my verification, and only assume that it is a valid equivalence relation: it is reflexive, symmetric, and transitive. These properties are easy to express as lemmas, that can have more - or less - parameters than the function itself:

   function Equivalent_Elements (X, Y : Element_Type) return Boolean;
   --  Whether X and Y are considered equal for set membership purposes

   --  Lemmas stating the Equivalent_Elements is an equivalent relation

   procedure Lemma_Equivalent_Elements_Refexive (X : Element_Type)
   with Ghost,
     Post     => Equivalent_Elements (X, X),
     Annotate => (GNATprove, Automatic_Instantiation);

   procedure Lemma_Equivalent_Elements_Symmetric (X, Y : Element_Type)
   with Ghost,
     Post     => Equivalent_Elements (X, Y) = Equivalent_Elements (Y, X),
     Annotate => (GNATprove, Automatic_Instantiation);

   procedure Lemma_Equivalent_Elements_Transitive (X, Y, Z : Element_Type)
   with Ghost,
     Post   =>
       (if Equivalent_Elements (X, Y) and Equivalent_Elements (Y, Z)
        then Equivalent_Elements (X, Z));

The choice to make information available by default - using the visible body of an expression function, a contract, or an automatically instantiated lemma - or only through manual application of lemmas is a trade-off. The first requires less manual effort in general, but might result in the context getting out-of-hand and making everything harder to prove. In general, while introducing abstraction is generally a good idea from the start, I generally only resort to manual context management if I encounter scalability issues. To know whether information is likely to cause a big growth in the proof context, there are a few rules of thumb: if it contains a universally quantified formula, or a recursive definition, then it is susceptible to produce a lot of instances - if the quantified formula is used for a lot of new values, or the recursive definition unfolded a lot of time - or even a *matching loop* where the provers cannot stop producing new instances. Note that there can be other reasons to avoid automatically instantiating an axiom - aside from the growth of the context. In particular, provers are bad at “guessing” a value that does not already exist in the context to instantiate a lemma. It is the reason why I have decided against automatically instantiating the transitivity lemma for my equivalence relation. Provers were bad at guessing the right intermediate value Y to prove Equivalent_Elements (X, Z).

Tricks and Tips for Information Hiding

In this last section of this post, I am going to explain some neat tricks that can be used for hiding and disclosing information by using (and sometimes abusing) several features of GNATprove. It is a bit of a grab bag, with small parts that can be read independently.

Disclosing bodies of expression functions on a case-by-case basis

The first tip I am going to present is really a – kind of experimental – feature of the tool. It is possible, using variants of the Hide_Info and Unhide_Info annotations presented at the beginning of this article, to hide the body of an expression function when verifying the body of a specific subprogram, or conversely, to hide it by default, even though it should be visible, and only disclose it when verifying the body of a subprogram. As an example, consider the following package:

package P is
   function F1 return Integer is (1);
   function F2 return Integer is (2) with
     Annotate => (GNATprove, Hide_Info, “Expression_Function_Body”);
end P;

As per the default visibility rules of GNATprove, the bodies of F1 and F2 should both be visible when verifying Main:

with P; use P;
procedure Main is
begin
   pragma Assert (F1 = 1);
   pragma Assert (F2 = 2);
end Main;

However, due to the annotation on F2, its body is not visible by default and the second assertion cannot be proved. This is similar to what can be done to the private part or body of packages. But we can go further and use an annotation to change the visibility specifically in the body of Main_2:

with P; use P;
procedure Main_2 is
   pragma Annotate (GNATprove, Hide_Info, “Expression_Function_Body”, F1);
   pragma Annotate (GNATprove, Unhide_Info, “Expression_Function_Body”, F2);
begin
   pragma Assert (F1 = 1);
   pragma Assert (F2 = 2);
end Main_2;

The first annotation hides the body of an expression function that is visible by default, like F1, while the second discloses the body of an expression that has been hidden by default using an annotation on its definition, like F2. It cannot be used however to disclose information that is hidden by abstraction. So here, in the body of Main_2, the first assertion becomes unprovable while the second is proved.

This feature provides a mechanism similar to abstraction at the package boundary, but is comparatively lighter, as it does not require introducing a package. In particular, it is possible to use it to hide the recursive definition of Is_Acyclic in its postcondition and only disclose it through a lemma as we have done earlier:

function Is_Acyclic_Def (X : Extended_Index; M : Memory_Type) return Boolean is
  (if X in M'Range and then Valid_Memory (M)
   then Is_Acyclic (X, M) = Is_Acyclic (Next (M (X)), M))
 with
   Ghost    => Static,
   Annotate => (GNATprove, Hide_Info, “Expression_Function_Body”);

   function Is_Acyclic (X : Extended_Index; M : Memory_Type) return Boolean
   with
     Pre  => X in M'Range | No_Index and then Valid_Memory (M),
     Post => (Static => (if X = No_Index then Is_Acyclic'Result));

   procedure Lemma_Is_Acyclic_Def (X : Index_Type; M : Memory_Type)
   with
     Ghost    => Static,
     Pre      => X in M'Range and then Valid_Memory (M),
     Post     => Is_Acyclic (X, M) = Is_Acyclic (Next (M (X)), M)

   procedure Lemma_Is_Acyclic_Def (X : Index_Type; M : Memory_Type) is
       pragma Annotate (GNATprove, Unhide_Info, “Expression_Function_Body”, Is_Acyclic_Def);
   [...]

Personally, I tend to prefer proper abstraction, as I find it cleaner, but this feature can come in handy, in particular to assess the effect of information hiding prior to introducing an abstraction layer. Note that there is a risk of losing a considerable amount of time trying to remove information with no visible effect on provability.

Automatically instantiating lemmas on a case-by-case basis

As explained in the previous section of the article, there is a trade-off between making information available automatically and restricting it to lemmas that need to be instantiated manually. The second is more labor intensive, because of the manual instances, while the first might cause the context to explode. As a middle ground, it is possible to get a way to automatically instantiate a lemma, but only in a particular context. This can be done by combining automated instantiation with the default (automated) context pruning of GNATprove. Consider the following pattern:

   function Disclose_Is_Acyclic return Boolean
   is (True)
   with Ghost => Static, Post => True;

   procedure Lemma_Is_Acyclic_Def (X : Index_Type; M : Memory_Type)
   with
     Ghost    => Static,
     Pre      => X in M'Range and then Valid_Memory (M),
     Post     => Is_Acyclic (X, M) = Is_Acyclic (Next (M (X)), M),
     Annotate => (GNATprove, Automatic_Instantiation);

   procedure Disclose_Is_Acyclic
   with Ghost => Static, Post => Disclose_Is_Acyclic;
   --  Disclose the recursive definitions of Is_Acyclic for the verification of
   --  the enclosing entity.

The first function is used as a placeholder to attach the automatically instantiated lemma Lemma_Is_Acyclic_Def that we have seen before. The context pruning of GNATprove will make it so that the definition of the lemma is available automatically, but only if the associated function is called in the context. Then, I can add a procedure that does nothing but add the function in the context of all the following checks, like Disclose_Is_Acyclic. The result is similar to the case-by-case hiding of expression function bodies, except that it is more fine-grained, as the axiom is not available in the whole body of the subprogram, but only in contexts in which the placeholder function is visible. For example, if I put the call to Disclose_Is_Acyclic in a block with a pragma Assert_And_Cut afterward, then the lemma won’t be available anymore after the pragma.

This pattern is actually what I have chosen to use in the SPARK.Higher_Order.Reachability package of the SPARKlib, but with a tweak. I wanted to allow the user to choose whether they want the recursive definitions of the reachability functions to be instantiated automatically by default using a generic parameter. To achieve this, I have added an additional automatically instantiated lemma that calls Disclose_Is_Acyclic, but with a guard that references the generic formal parameter:

   Automatically_Instantiate_Definitions : Boolean := True;
   --  Generic formal parameter.
   --  Set to True to instantiate lemmas giving the recursive definitions of
   --  Is_Acyclic, Reachable_Set, and Model automatically.

   function Is_Acyclic (X : Extended_Index; M : Memory_Type) return Boolean
   with
     Pre  => X in M'Range | No_Index and then Valid_Memory (M),
     Post => (Static => (if X = No_Index then Is_Acyclic'Result));

   procedure Lemma_Automatically_Instantiate_Is_Acyclic_Def
   with
     Ghost    => Static,
     Pre      => Automatically_Instantiate_Definitions,
     Post     => Disclose_Is_Acyclic,
     Annotate => (GNATprove, Automatic_Instantiation);

The procedure Lemma_Automatically_Instantiate_Is_Acyclic_Def is instantiated automatically whenever Is_Acyclic is in the context, pulling the placeholder function Disclose_Is_Acyclic and therefore the recursive definition. However, this will not happen if Automatically_Instantiate_Definitions is statically false, as GNATprove recognizes that the lemma is statically disabled and does not generate an axiom for it.

Instantiating lemmas in expressions using cut operations

Finally, the last tip I want to explain is related to the use of lemmas inside cut operations. Cut operations are functions that are defined in the SPARK.Cut_Operations package of the SPARKlib and recognized specifically by GNATprove. In particular, the one I find the most useful is By. It is defined as follows:

   function By (Consequence, Premise : Boolean) return Boolean
   with Ghost, Global => null;

It can be used in an assertion to introduce additional information necessary to prove the assertion without polluting the context. More precisely, if I write:

pragma Assert (By (P1, P2));

Then GNATprove will use P2 as an intermediate step in the proof of P1: it will first try to prove P2, then prove P1 assuming P2. After the assertion, the intermediate step will be invisible, it will be as if P1 was proved directly. This is really interesting in particular when combined with quantified expressions. Indeed, it is possible to nest calls to By inside universally - or existentially, but I won’t speak about that here - quantified formulas! For example, if I write:

pragma Assert (for all X in … => By (P1 (X), P2 (X)));

Then the tool will first prove (for all X in … => P2 (X)) and then (for all X in … => (if P2 (X) then P1 (X))). This might not seem like a huge gain, but now assume that P2 could be a lemma that I need to prove P1, then presumably (for all X in … => P2 (X)) would be trivial - the lemma is already proved - and I would only get the second quantified formula, in which I have the instantiated lemma available. It would save me from having to introduce a loop to call the lemma function once for each quantified element.

This is a pattern that I have used to instantiate the transitivity lemma of my equivalence relation when verifying the hashed sets library. To do this, I have first turned my lemma into a function (I cannot call a procedure inside a cut operation). As it does not need a result, I have used a function that always returns True:

 function Prove_Equivalent_Elements
        (E1, E2, E : Element_Type) return Boolean
      with
        Pre  => Equivalent_Elements (E1, E) and Equivalent_Elements (E2, E),
        Post =>
          Prove_Equivalent_Elements'Result and Equivalent_Elements (E1, E2);

It is easily proved using a manual call to my transitivity lemma procedure. Then, I can easily use it to prove that replacing the element at position P in my set by an equivalent value cannot create a duplication:

         pragma
           Assert
             (for all Q in Interval'(1, Last (Model)) =>
                (if Q /= P
                 then
                   By
                     (not Equivalent_Elements  (Get (Model, Q), E),
                      (if Equivalent_Elements (Get (Model, Q), E)
                       then
                         Prove_Equivalent_Elements (Get (Old_Model, P), Get (Model, Q), E)))));

Note that this pattern works well with the placeholder function that makes a lemma automatically available in a particular context. Calling Disclose_Is_Acyclic in the premise (second parameter) of the By operation makes the definition available to prove the assertion without pulling it into the context.

Conclusion

In this post, I have explained several techniques for simplifying proofs by hiding unnecessary information from solvers. Abstraction at the package boundary is a powerful tool that notably allows simplifying complex proofs by ensuring separation of concerns. More fine-grained handling of context can be achieved by making some information available only on demand. This is usually done by introducing lemma subprograms that can be instantiated manually. Going further, it is possible to make information available only in a specific scope, which can be a middle ground between making it available by default and requiring manual instances of lemmas. I have presented all these techniques in this post and explained how I have used them to verify my implementation of the formal bounded hashed sets. In my next post, I will go into more detail about how abstraction at the package boundary can be used to prove by successive refinements.

Tools

Services

Get an Overview

Ready To Go?

Languages

Ready To Go?

Industries

Learn how NVIDIA Adopted SPARK for Security-Critical Software Development

Company

Careers

Explore Resources

The AdaCore Blog

Learn Ada & SPARK

Community

Get Started with Ada

Support

Product Roadmap

Information Hiding and Context Management in SPARK

Abstraction at the Level of the Package

Manual information hiding and context management

Tricks and Tips for Information Hiding

Disclosing bodies of expression functions on a case-by-case basis

Automatically instantiating lemmas on a case-by-case basis

Instantiating lemmas in expressions using cut operations

Conclusion

Author

Claire Dross

Latest Blog Posts

Three Ways SPARK Catches Bugs Before Your Hardware Does

MARS Rover GNC

How Language Design Impacts Security