Gem #28: Changing Data Representation (Part 2)

by Robert Dewar —AdaCore

Last week, we discussed the use of derived types and representation clauses to achieve automatic change of representation. More accurately, this feature is not completely automatic, since it requires you to write an explicit conversion. In fact there is a principle behind the design here which says that a change of representation should never occur implicitly behind the back of the programmer without such an explicit request by means of a type conversion.

The reason is that a change of representation can be very expensive: in general it requires component-by-component copying, changing the representation of each component.

Let's have a look at the -gnatG expanded code to see what is hidden under the covers here. For example, the conversion Arr (Input_Data) from last week's example generates the following expanded code:

   B26b : declare
      [subtype p__TarrD1 is integer range 1 .. 16]
      R25b : p__TarrD1 := 1;
   begin
      for L24b in 1 .. 16 loop
         [subtype p__arr___XP3 is
           system__unsigned_types__long_long_unsigned range 0 ..
           16#FFFF_FFFF_FFFF#]
         work_data := p__arr___XP3!((work_data and not shift_left!(
           16#7#, 3 * (integer(L24b - 1)))) or shift_left!(p__arr___XP3!
           (input_data (R25b)), 3 * (integer(L24b - 1))));
         R25b := p__TarrD1'succ(R25b);
      end loop;
   end B26b;

That's pretty horrible! In fact one of the Ada experts here thought that it was too gruesome and suggested simplifying it for this gem, but we have left it in its original form, so that you can see why it is nice to let the compiler generate all this stuff so you don't have to worry about it yourself.
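
For readers who do not have last week's gem at hand, here is a minimal sketch of declarations that could lead to an expansion of this shape. It is a reconstruction from the expanded code (16 components of 3 bits each, 48 bits in total), not necessarily the declarations used last week: the names Small, Unpacked_Arr, and Change_Rep_Sketch are hypothetical, and only Arr, Input_Data, and Work_Data appear in the text above.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Change_Rep_Sketch is

   --  Hypothetical component type: values 0 .. 7 fit in 3 bits.
   type Small is range 0 .. 7;

   --  Hypothetical parent type with the default representation.
   type Unpacked_Arr is array (1 .. 16) of Small;

   --  Derived type with a different representation: 3 bits per
   --  component. This is legal because Unpacked_Arr has no
   --  user-defined primitive subprograms and is not a by-reference type.
   type Arr is new Unpacked_Arr;
   for Arr'Component_Size use 3;

   Input_Data : constant Unpacked_Arr := (others => 5);
   Work_Data  : Arr;

begin
   --  The explicit conversion is the only place where the
   --  component-by-component repacking is generated.
   Work_Data := Arr (Input_Data);

   Put_Line ("Arr'Component_Size =" & Integer'Image (Arr'Component_Size));
   Put_Line ("Work_Data (1)      =" & Small'Image (Work_Data (1)));
end Change_Rep_Sketch;
```

Compiling a unit like this with the -gnatG switch should show a loop similar to the one quoted above at the point of the conversion, though the generated names will differ.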

Given that the conversion can be pretty inefficient, you don't want to convert backwards and forwards more than you have to, and the whole approach is only worthwhile if you will be doing extensive computations involving the value.

The expense of the conversion explains two aspects of this feature that are not obvious. First, why do we require derived types instead of just allowing subtypes to have different representations, avoiding the need for an explicit conversion?

The answer is precisely that the conversions are expensive, and you don't want them happening behind your back. So if you write the explicit conversion, you get all the gobbledygook listed above, but you can be sure that this never happens unless you explicitly ask for it.

This also explains the restriction we mentioned in last week's gem from RM 13.1(10):

10 For an untagged derived type, no type-related representation items are allowed if the parent type is a by-reference type, or has any user-defined primitive subprograms.

It turns out this restriction is all about avoiding implicit changes of representation. Let's have a look at how type derivation works when there are primitive subprograms defined at the point of derivation. Consider this example:

  type My_Int_1 is range 1 .. 10;

  function Odd (Arg : My_Int_1) return Boolean;

  type My_Int_2 is new My_Int_1;

Now when we do the type derivation, we inherit the function Odd for My_Int_2. But where does this function come from? We haven't written it explicitly, so the compiler somehow materializes this new implicit function. How does it do that?

We might think that a complete new function is created including a body in which My_Int_2 replaces My_Int_1, but that would be impractical and expensive. The actual mechanism avoids the need to do this by use of implicit type conversions. Suppose after the above declarations, we write:

  Var : My_Int_2;
  ...
  if Odd (Var) then
     ...

The compiler translates this as:

  Var : My_Int_2;
  ...
  if Odd (My_Int_1 (Var)) then
     ...

This implicit conversion is a nice trick: it means that we can get the effect of inheriting a new operation without actually having to create it. Furthermore, in a case like this, the type conversion generates no code, since My_Int_1 and My_Int_2 have the same representation.
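
The fragments above can be assembled into a small complete program. This is only a sketch: the enclosing procedure Inherit_Sketch, the nested package P, and the body given for Odd are assumptions made to obtain something compilable. The declarations are placed in a package specification because that is what makes Odd a primitive subprogram of My_Int_1, and hence inherited by My_Int_2.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Inherit_Sketch is

   package P is
      type My_Int_1 is range 1 .. 10;

      --  Primitive subprogram of My_Int_1.
      function Odd (Arg : My_Int_1) return Boolean;

      --  My_Int_2 inherits Odd (Arg : My_Int_2); no new body is written.
      type My_Int_2 is new My_Int_1;
   end P;

   package body P is
      --  Assumed body, for illustration only.
      function Odd (Arg : My_Int_1) return Boolean is
      begin
         return Arg mod 2 = 1;
      end Odd;
   end P;

   use P;

   Var : constant My_Int_2 := 3;

begin
   --  Call to the inherited function.
   if Odd (Var) then
      Put_Line ("inherited call: odd");
   end if;

   --  The same call with the conversion written out by hand.
   if Odd (My_Int_1 (Var)) then
      Put_Line ("explicit conversion: odd");
   end if;
end Inherit_Sketch;
```

Both calls execute the single body written for My_Int_1, so they necessarily give the same answer.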

But the whole point is that they might not have the same representation if one of them had a rep clause that made the representations different, and in this case the implicit conversion inserted by the compiler could be expensive, perhaps generating the junk we quoted above for the Arr case. Since we never want that to happen implicitly, there is a rule to prevent it.
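
To make the rule concrete, here is a hedged sketch of the situation it forbids. All names (Rule_Sketch, P, Small, Unpacked_Arr, Packed_Arr, Total) are hypothetical. The offending representation clause is left as a comment so that the unit compiles; with the clause enabled, a conforming compiler is expected to reject it under RM 13.1(10), because the parent type has a user-defined primitive subprogram.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Rule_Sketch is

   package P is
      type Small is range 0 .. 7;
      type Unpacked_Arr is array (1 .. 16) of Small;

      --  A user-defined primitive subprogram of Unpacked_Arr.
      function Total (A : Unpacked_Arr) return Natural;

      --  Packed_Arr inherits Total. A call Total (X) with X of type
      --  Packed_Arr is treated as Total (Unpacked_Arr (X)).
      type Packed_Arr is new Unpacked_Arr;

      --  Enabling the next line is expected to be illegal by
      --  RM 13.1(10): it would turn the implicit conversion in
      --  every inherited call into a hidden repacking loop.
      --  for Packed_Arr'Component_Size use 3;
   end P;

   package body P is
      function Total (A : Unpacked_Arr) return Natural is
         Sum : Natural := 0;
      begin
         for I in A'Range loop
            Sum := Sum + Natural (A (I));
         end loop;
         return Sum;
      end Total;
   end P;

   use P;

   X : constant Packed_Arr := (others => 1);

begin
   --  Inherited call; the conversion costs nothing here because
   --  both types share the same representation.
   Put_Line ("Total (X) =" & Natural'Image (Total (X)));
end Rule_Sketch;
```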

The business of forbidding by-reference types (which includes all tagged types) is also driven by this consideration. If the representations are the same, it is fine to pass by reference, even in the presence of the conversion, but if there was a change of representation, it would force a copy, which would violate the by-reference requirement.

So to summarize these two gems, on the one hand Ada gives you a very convenient way to trigger these complex conversions between different representations. On the other hand, Ada guarantees that you never get these potentially expensive conversions happening unless you explicitly ask for them.

About the Author

Dr. Robert Dewar is co-founder, President and CEO of AdaCore and Emeritus Professor of Computer Science at New York University. With a focus on programming language design and implementation, Dr. Dewar has been a major contributor to Ada throughout its evolution and is a principal architect of AdaCore’s GNAT Ada technology. He has co-authored compilers for SPITBOL (SNOBOL), Realia COBOL for the PC (now marketed by Computer Associates), and Alsys Ada, and has also written several real-time operating systems for Honeywell Inc. Dr. Dewar has delivered papers and presentations on programming language issues and safety certification and, as an expert on computers and the law, he is frequently invited to conferences to speak on Open Source software, licensing issues, and related topics.