Chapter 10. Datatypes

Datatypes are a form of user-defined types for classifying data (or values). The support for datatypes and pattern matching in ATS is primarily adopted from ML.

The following code declares a datatype of the name weekday for values representing weekdays:

datatype weekday = | Monday | Tuesday | Wednesday | Thursday | Friday

There are five data constructors associated with weekday, which are Monday, Tuesday, Wednesday, Thursday, and Friday. All of these data constructors are nullary, that is, they take no arguments to form values (of the type weekday).

Each nullary constructor is represented as a small integer (e.g., one that is less than 1024). One can use the following function weekday2int to find out the integers reprsenting the constructors associated with weekday:

// staload UN = "prelude/SATS/unsafe.sats" // fun weekday2int (wd: weekday): int = $UN.cast{int}($UN.cast{intptr}(wd)) //

The small integer representing a nullary constructor is often referred to as the tag of the constructor. In this case, the tags for Monday, Tuesday, Wednesday, Thursday, and Friday are 0, 1, 2, 3, and 4, respectively.

Given a nullary constructor foo, both foo and foo() can be used to refer the value constructed by foo. However, only foo() can be used as the pattern that matches this value. For instance, the following function tests whether a given value of the type weekday is formed with the constructor Friday:

fun isFriday(x: weekday): bool = case+ x of Friday() => true | _ => false

Note that the pattern Friday() cannot be replaced with Friday as the latter is treated as a variable when used as a pattern. On the other hand, both of the following assertions hold at run-time as Friday and Friday() refer to the same value:

val () = assertloc (isFriday(Friday)) val () = assertloc (isFriday(Friday()))

If there is only one constructor associated with a datatype, then there is no tag in the representation for values of this datatype.

A datatype is list-like if there are two data constructors associated with it such that one is nullary (nil-like) and the other is non-nullary (cons-like). For instance, the datatype ab declared as follows is list-like:

datatype ab = A of () | B of (bool)

The values of a list-like datatype are represented in a special way. Given a value of the datatype; if it is constructed with the nil-like constructor, then it is represented as the null-pointer; if it is constructed with the cons-like constructor, then it is reprenstend as a heap-allocated tuple containing all of the arguments passed to the constructor. In the above case, the value A() is represented as the null pointer, and the value B(b) for each boolean b is represented as a heap-allocated singleton tuple containing the only component b. This description can be readily verified with the following code:

val () = assertloc (iseqz($UN.cast{ptr}(A()))) val () = assertloc (true = $UN.ptr0_get<bool>($UN.cast{ptr}(B(true)))) val () = assertloc (false = $UN.ptr0_get<bool>($UN.cast{ptr}(B(false))))

The following declaration introduces a datatype of the name abc:

datatype abc = | A of () | B of (bool) | C of (int, double)

The three constructors associated with abc are A, B, and C; A is nullary; B is unary, taking a boolean to form a value (of the type abc); C is binary, taking an integer and a float (of double precision) to form a value (of the type abc).

In a general case, if a data constructor is n-ary for some positive n, then each value it constructs is a heap-allocated tuple of n+1 components, where the first one is the tag assigned to the constructor and the rest are the arguments passed to the constructor. For instance, the following function can be called to find out the tags assigned to the constructors associated with abc:

fun abc2tag (x: abc): int = let val p = $UN.cast{intptr}(x) in // case+ 0 of | _ when p < 1024 => $UN.cast{int}(p) | _ (*heap-allocated*) => $UN.ptr0_get<int>($UN.cast{ptr}(p)) // end // end of [abc2tag]

In this case, the tags assigned to A, B, and C are 0, 1, and 2, respectively.

Datatypes can be defined recursively. As an example, the following declaration introduces a recursively defined datatype intexp (for representing arithemetic integer expressions):

datatype intexp = | Int of int | Neg of (intexp) | Add of (intexp, intexp) | Sub of (intexp, intexp)

For instance, (1+2)-3 can be represented as Sub(Add(Int(1), Int(2)), Int(3)). As another example, the following code introduces two mutually recursively defined datatypes intexp and boolexp (for integer expressions and boolean expressions, respectively):

datatype intexp = | Int of int | Neg of (intexp) | Add of (intexp, intexp) | Sub of (intexp, intexp) | IfThenElse of (boolexp, intexp, intexp) and boolexp = | Bool of bool // constant | Not of (boolexp) // negation | Less of (intexp, intexp) // Less(x, y): x < y | LessEq of (intexp, intexp) // LessEq(x, y): x <= y | Conj of (boolexp, boolexp) // Conj(x, y): x / y | Disj of (boolexp, boolexp) // Disj(x, y): x / y

The code below implements two mutually recursive functions eval_intexp and eval_boolexp for evaluating integer expressions and boolean expressions, respectively:

// symintr eval // extern fun eval_intexp : intexp -> int extern fun eval_boolexp : boolexp -> bool // overload eval with eval_intexp overload eval with eval_boolexp // (* ****** ****** *) // implement eval_intexp (e0) = ( // case+ e0 of | Int (i) => i | Neg (e) => ~eval(e) | Add (e1, e2) => eval(e1) + eval(e2) | Sub (e1, e2) => eval(e1) - eval(e2) | IfThenElse (e_test, e_then, e_else) => if eval(e_test) then eval(e_then) else eval(e_else) // ) (* end of [eval_intexp] *) // implement eval_boolexp (e0) = ( // case+ e0 of | Bool (b) => b | Not (e) => ~eval(e) | Less (e1, e2) => eval(e1) < eval(e2) | LessEq (e1, e2) => eval(e1) <= eval(e2) | Conj (e1, e2) => eval(e1) && eval(e2) | Disj (e1, e2) => eval(e1) || eval(e2) // ) (* end of [eval_boolexp] *) // (* ****** ****** *)

The datatypes presented in this chapter are simple datatypes. Other more advanced forms of datatypes include polymorphic datatypes, dependent datatypes, and linear datatypes, which will be covered elsewhere.

Please find on-line the entirety of the code used in this chapter plus some code for testing.