Google

"http://www.w3.org/TR/REC-html40/loose.dtd"> Tagged Unions Previous Contents Next

4   Tagged Unions

In addition to struct, enum, and union, Cyclone has tunion (for ``tagged union'') and xtunion (for ``extensible tagged union'') as ways to construct new aggregate types. Like a union type, each tunion and xtunion has a number of variants (or members). Unlike with union, an object of a tunion or xtunion type is exactly one variant, we can detect (or discriminate) that variant at run-time, and the language prevents using an object as though it had a different variant.

The difference between tunion and xtunion is that tunion is closed---a definition lists all possible variants. It is like the algebraic datatypes in ML. With xtunion, separately compiled files can add variants, so no code can be sure that it knows all the variants. There is a rough analogy with not knowing all the subclasses of a class in an object-oriented language.

For sake of specificity, we first explain how to create and use tunion types. We then explain xtunion by way of contrast with tunion. Because the only way to read parts of tunion and xtunion types is pattern-matching, it is hard to understand tunion without pattern-matching, but for sake of motivation and completeness, some of the examples in the explanation of pattern-matching use tunion! To resolve this circular dependency, we will informally explain pattern-matching as we use it here and we stick to its simplest uses.

4.1   tunion



Basic Type Declarations and Subtyping
[Warning: For expository purposes, this section contains a white lie that is exposed in the later section called ``regions for tunion''.]

A tunion type declaration lists all of its variants. At its simplest, it looks just like an enum declaration. For example, we could say:


  tunion Color { Red, Green, Blue };

As with enum, the declaration creates a type (called tunion Color) and three constants Red, Green, and Blue. Unlike enum, these constants do not have type tunion Color. Instead, each variant has its own type, namely tunion Color.Red, tunion Color.Green, and tunion Color.Blue. Fortunately these are all subtypes of tunion Color and no explicit cast is necessary. So you can write, as expected:


  tunion Color c = Red;

In this simple example, we are splitting hairs, but we will soon find all these distinctions useful. Unlike enum, tunion variants may carry any fixed number of values, as in this example:


  tunion Shape {

    Point,

    Circle(float),

    Ellipse(float,float),

    Polygon(int,float),

  };

A Point has no accompanying information, a Circle has a radius, an Ellipse has two axis lengths, and a (regular) Polygon has a number of sides and a radius. (The value fields do not have names, so it is often better style to have a variant carry one value of a struct type, which of course has named members.) This example creates five types: tunion Shape, tunion Shape.Point, tunion Shape.Circle, tunion Shape.Ellipse, and tunion Shape.Polygon. Like in our previous example, tunion Shape.Point is a subtype of tunion Shape and Point is a constant of type tunion Shape.Point.

Variants that carry one or more values are treated differently. Circle becomes a constructor; given a float it produces an object of type tunion Shape.Circle, for example Circle(3.0). Similarly, Ellipse(0,0) has type tunion Shape.Ellipse (thanks to implicit casts from int to float for 0) and Polygon(7,4.0) has type tunion Shape.Polygon. The arguments to a constructor can be arbitrary expressions of the correct type, for example, Ellipse(rand(), sqrt(rand())).

The second difference is that value-carrying variant types (e.g., tunion Shape.Circle) are not subtypes of the tunion type (e.g., tunion Shape). Rather non-null pointers to the value-carrying variant types are (e.g., tunion Shape.Circle @`H is a subtype of tunion Shape). So the following are correct initializations that use implicit subtyping:


  tunion Shape s1 = Point;

  tunion Shape s2 = new Circle(3.0);

tunion types are particularly useful for building recursive structures. For example, a small language of arithmetic expressions might look like this:

  enum Unops { Negate, Invert};

  enum Binops { Add, Subtract, Multiply, Divide };

  tunion Exp {

    Int(int),

    Float(float),

    Unop(enum Unops, tunion Exp),

    Binop(enum Binops, tunion Exp, tunion Exp)

  };

A function returning an expression representing the multiplication of its parameter by two could like this:

  tunion Exp double_exp(tunion Exp e) {

    return new Binop(Multiply, e, new Int(2));

  }

Accessing tunion Variants
Given a value of a tunion type, such as tunion Shape, we do not know which variant it is.

For non-value variants, we can use a standard comparison. Continuing the example from above, ``s1 == Point'' would be true whereas ``s2 == Point'' would be false.

Analogous comparisons would not work for value-carrying variants because these variants are pointers. Rather than provide predicates (perhaps of the form isCircle(s1)), Cyclone requires pattern-matching. For example, here is how you could define isCircle:

  bool isCircle(tunion Shape s) {

    switch(s) {

    case &Circle(r): return true;

    default: return false;

    }

  }

When a switch statement's argument has a tunion type, the cases describe variants. One variant of tunion Shape is a pointer to a Circle, which carries one value. The corresponding pattern has & for the pointer, Circle for the constructor name, and one identifier for each value carried by Circle. The identifiers are binding occurrences (declarations, if you will), and the initial values are the values of the fields of the Circle at which s points. The scope is the extent of the case clause. Pattern-matching works for non-value variants too, but there is no & because they are not pointers.

Here is another example:

[The reader is asked to indulge compiler-writers who have forgotten basic geometry.]

  extern area_of_ellipse(float,float);

  extern area_of_poly(int,float);

  float area(tunion Shape s) {

    float ans;

    switch(s) {

    case Point:

      ans = 0;

      break;

    case &Circle(r):

      ans = 3.14*r*r;

      break;

    case &Ellipse(r1,r2):

      ans = area_of_ellipse(r1,r2);

      break;

    case &Polygon(sides,r):

      ans = area_of_poly(sides,r);

      break;

    }

    return ans;

  }

The cases are compared in order against s. The following are compile-time errors:
  • It is possible that a member of the tunion type matches none of the cases. Note that default matches everything.
  • A case is useless because it could only match if one of the earlier cases match. For example, a default case at the end of the switch in area would be an error.
We emphasize that Cyclone has much richer pattern-matching support than we have used here.

Implementation
Non-value variants are translated to distinct small integers. Because they are small, they cannot be confused with pointers to value-carrying variants. Value-carrying variants have a distinct integer tag field followed by fields for the values carried. Hence all values of a tunion type occupy one word, either with a small number or with a pointer.

Regions for tunion
We have seen that non-null pointers to value-carrying variants are subtypes of the tunion type. For example, tunion Shape.Circle @`H is a subtype of tunion Shape. Because tunion Shape.Circle @`H is a pointer into the heap, it would seem that all values of type tunion Shape are either non-value variants or pointers into the heap. In fact, this is true, but only because tunion Shape is itself shorthand for tunion `H Shape.

In other words, tunion types are region-polymorphic over the region into which the value-carrying variants point. An explicit region annotation goes after tunion, just like an explicit region annotation goes after * or @. Here is an example using a stack region:

  tunion Shape.Circle c = Circle(3.0);

  tunion _ Shape s = &c;

The _ is necessary because we did not give an explicit name to the stack region.

We can now correct the white lie from the ``basic type declarations and subtyping'' section. A declaration tunion Foo {...} creates a type constructor which given a region creates a type. For any region `r, tunion `r Foo is a subtype of tunion Foo.Bar @`r if tunion Foo.Bar carries values. If tunion Foo.Bar does not carry values, then it is a subtype of tunion `r Foo for all `r.

In the future, we may make the implied region for tunion Foo depend on context, as we do with pointer types. For now, tunion Foo is always shorthand tunion `H Foo.

Polymorphism and tunion
A tunion declaration may be polymorphic over types and regions just like a struct definition (see the section on polymorphism). For example, here is a declaration for binary trees where the leaves can hold some BoxKind `a:

  tunion <`a> Tree {

    Leaf(`a);

    Node(tunion Tree<`a>, tunion Tree<`a>);

  };

In the above example, the root may be in any region, but all children will be in the heap. This version allows the children to be in any region, but they must all be in the same region. (The root can still be in a different region.)


  tunion <`a,`r::R> Tree {

    Leaf(`a);

    Node(tunion `r Tree<`a,`r>, tunion `r Tree<`a,`r>);

  };

Existential Types
[This feature is independent of the rest of tunion's features and can be safely ignored when first learning Cyclone.]

In addition to polymorphic tunion types, it is also possible to parameterize individual variants by additional type variables. (From a type-theoretic point of view, these are existentially-quantified variables.) Here is a useless example:


  tunion T { Foo<`a>(`a, `a, int), Bar<`a,`b>(`a, `b), Baz(int) };

The constructors for variants with existential types are used the same way, for example Foo("hi","mom",3), Foo(8,9,3), and Bar("hello",17) are all well-typed. The compiler checks that the type variables are used consistently---in our example, the first two arguments to Foo must have the same type. There is no need (and currently no way) to explicitly specify the types being used.

Once a value of an existential variant is created, there is no way to determine the types at which it was used. For example, Foo("hi","mom",3) and Foo(8,9,3) both have type, ``there exists some `a such that the type is Foo<`a>''. When pattern-matching an existential variant, you must give an explicit name to the type variables; the name can be different from the name in the type definition. Continuing our useless example, we can write:

  void f(tunion T t) {

    switch(t) {

    case Foo<`a>(x,y,z): return;

    case Bar<`b,`c>(x,y): return;

    case Baz(x): return;

    }

  }

The scope of the type variables is the body of the case clause. So in the first clause we could create a local variable of type `a and assign x or y to it. Our example is fairly ``useless'' because there is no way for code to use the values of existentially quantified types. In other words, given Foo("hi","mom",3), no code will ever be able to use the strings "hi" or "mom". Useful examples invariably use function pointers. For a realistic library, see fn.cyc in the distribution. Here is a smaller (and sillier) example; see the section on region and effects for an explanation of why the `e stuff is necessary.

  int f1(int x, int y) { return x+y; }

  int f2(string x, int y) {printf("%s",x); return y; }

  tunion T<`e::E> { Foo<`a>(`a, int f(`a, int; `e)); };

  void g(bool b) {

    tunion T<{}> t;

    if(b)

      t = Foo(37,f1);

    else

      t = Foo("hi",f2);

    switch(t) {

    case Foo<`a>(arg,fun):

      `a x = arg;

      int (*f)(`a,int;{}) = fun;

      f(arg,19);

      break;

    }

  }

The case clause could have just been fun(arg)---the compiler would figure out all the types for us. Similarly, all of the explicit types above are for sake of explanation; in practice, we tend to rely heavily on type inference when using these advanced typing constructs.

Future
  • Currently, given a value of a variant type (e.g., tunion Shape.Circle), the only way to access the fields is with pattern-matching even though the variant is known. We may provide a tuple-like syntax in the future.
  • If a tunion has only one value-carrying variant, it does not need a tag field in its implementation. We have not yet implemented this straightforward optimization.

4.2   xtunion



We now explain how an xtunion type differs from a tunion type. The main difference is that later declarations may continue to add variants. Extensible datatypes are useful for allowing clients to extend data structures in unforeseen ways. For example:

  xtunion Food;

  xtunion Food { Banana; Grape; Pizza(list_t<xtunion Food>) };

  xtunion Food { Candy; Broccoli };

After these declarations, Pizza(new List(Broccoli, null)) is a well-typed expression.

If multiple declarations include the same variants, the variants must have the same declaration (the number of values, types for the values, and the same existential type variables).

Because different files may add different variants and Cyclone compiles files separately, no code can know (for sure) all the variants of an xtunion. Hence all pattern-matches against a value of an xtunion type must end with a case that matches everything, typically default.

There is one built-in xtunion type: xtunion exn is the type of exceptions. Therefore, you declare new xtunion exn types like this:


  xtunion exn {BadFilename(string)};

The implementation of xtunion types is very similar to that of tunion types, but non-value variants cannot be represented as small integers because of separate compilation. Instead, these variants are represented as pointers to unique locations in static data. Creating a non-value variant still does not cause allocation.


Previous Contents Next