Example: Functional Red-Black Trees

A red-black tree is defined as a binary tree such that each node in it is colored red or black and every path from the root to a leaf has the same number of black nodes while containing no occurrences of two red nodes in a row. Clearly, the length of a longest path in each red-black tree is bounded by 2 times the length of a shortest path in it. Therefore, red-black trees are a family of balanced trees. The number of black nodes occurring on each path in a red-black tree is often referred to as the black height of the tree.

Formally, a datatype precisely for red-black trees can be declared in ATS as follows:

// #define BLK 0 #define RED 1 sortdef clr = {c:nat | c <= 1} // datatype rbtree ( a:t@ype+ , int(*color*) , int(*bheight*) ) = | rbtree_nil(a, BLK, 0) | {c,cl,cr:clr}{bh:nat} {cl <= 1-c; cr <= 1-c} rbtree_cons(a, c, bh+1-c) of (int(c), rbtree(a, cl, bh), a, rbtree(a, cr, bh)) // end of [rbtree] //

The color of a tree is the color of its root node or is black if the tree is empty. Given a type T, a color C (represented by a integer) and an integer BH, the type rbtree(T, C, BH) is for red-black trees carrying elements of the type T that is of the color C and the black height BH.

When implementing various operations (such as insertion and deletion) on a red-black tree, we often need to first construct intermediate trees that contain color violations caused by a red node being followed by another red node and then employ tree rotations to fix such violations. This need makes the above datatype rbtree too rigid as it cannot be assigned to any intermediate trees containing color violations. To address this issue, we can declare rbtree as follows:

// datatype rbtree ( a:t@ype+ , int // color , int // bheight , int // violations ) = // rbtree | rbtree_nil(a, BLK, 0, 0) of () | {c,cl,cr:clr}{bh:nat}{v:int} {c==BLK && v==0 || c == RED && v==cl+cr} rbtree_cons(a, c, bh+1-c, v) of ( int c, rbtree0 (a, cl, bh), a, rbtree0(a, cr, bh) ) (* end of [rbtree_cons] *) // end of [rbtree] // where rbtree0(a:t@ype, c:int, bh:int) = rbtree(a, c, bh, 0) //

We count each occurrence of two red nodes in a row as one color violation. Given a type T, a color C (represented by a integer), an integer BH and an integer V, the type rbtree(T, C, BH, V) is for trees carrying elements of the type T that is of the color C and the black height BH and contains exactly V color violations. Therefore, the type rbtree(T, C, BH, 0) is for valid red-black trees (containing no color violations).

Given a tree containing at most one color violation, an element and another tree containing no violations, the following operation constructs a valid red-black tree:

fn{ a:t@ype } insfix_l // right rotation for fixing left insertion {cl,cr:clr} {bh:nat} {v:nat} ( tl: rbtree (a, cl, bh, v), x0: a, tr: rbtree (a, cr, bh, 0) ) : [c:clr] rbtree0 (a, c, bh+1) = let #define B BLK; #define R RED; #define cons rbtree_cons in case+ (tl, x0, tr) of | (cons(R, cons(R, a, x, b), y, c), z, d) => cons(R, cons(B, a, x, b), y, cons(B, c, z, d)) // shallow rot | (cons(R, a, x, cons(R, b, y, c)), z, d) => cons(R, cons(B, a, x, b), y, cons(B, c, z, d)) // deep rotation | (a, x, b) =>> cons(B, a, x, b) end // end of [insfix_l]

By simply reading the interface of insfix_l, we can see that the two tree arguments are required to be of the same black height bh for some natural number bh and the returned tree is of the black height bh+1.

The following operation insfix_r is just the mirror image of insfix_l:

fn{ a:t@ype } insfix_r // left rotation for fixing right insertion {cl,cr:clr} {bh:nat} {v:nat} ( tl: rbtree (a, cl, bh, 0), x0: a, tr: rbtree (a, cr, bh, v) ) : [c:clr] rbtree0 (a, c, bh+1) = let #define B BLK; #define R RED; #define cons rbtree_cons in case+ (tl, x0, tr) of | (a, x, cons(R, b, y, cons(R, c, z, d))) => cons(R, cons(B, a, x, b), y, cons(B, c, z, d)) // shallow rot | (a, x, cons(R, cons(R, b, y, c), z, d)) => cons(R, cons(B, a, x, b), y, cons(B, c, z, d)) // deep rotation | (a, x, b) =>> cons(B, a, x, b) end // end of [insfix_r]

The preparation for implementing insertion on a red-black tree is all done by now, and we are ready to see an implementation of insertion guaranteeing that the tree obtained from inserting an element into a given red-black tree is always a valid red-black tree itself. This guarantee is precisely captured in the following interface for insertion:

extern fun{ a:t@ype } rbtree_insert {c:clr}{bh:nat} ( t: rbtree0 (a, c, bh), x0: a, cmp: cmp(a) ) : [bh1:nat] rbtree0 (a, BLK, bh1)

Interestingly, this interface also implies that the tree returned by a call to rbtree_insert is always black. The code presented below gives an implementation of rbtree_insert:

implement{a} rbtree_insert (t, x0, cmp) = let // #define B BLK #define R RED #define nil rbtree_nil #define cons rbtree_cons // fun ins {c:clr}{bh:nat} .<bh,c>. ( t: rbtree0 (a, c, bh), x0: a ) : [ cl:clr;v:nat | v <= c ] rbtree (a, cl, bh, v) = ( case+ t of | nil((*void*)) => cons{..}{..}{..}{0}(R, nil, x0, nil) | cons(c, tl, x, tr) => let val sgn = compare(x0, x, cmp) in // // [ifcase] for combining a sequence // of chaining if-then-else expressions // ifcase | sgn < 0 => let val [cl:int,v:int] tl = ins(tl, x0) in if c = B then insfix_l(tl, x, tr) else cons{..}{..}{..}{cl}(R, tl, x, tr) // end of [if] end (* sgn < 0 *) | sgn > 0 => let val [cr:int,v:int] tr = ins(tr, x0) in if c = B then insfix_r(tl, x, tr) else cons{..}{..}{..}{cr}(R, tl, x, tr) // end of [if] end (* sgn > 0 *) | _(* sgn = 0 *) => (t) // end of [ifcase] end // end of [cons] ) (* end of [ins] *) // val t = ins(t, x0) // in case+ t of cons(R, tl, x, tr) => cons(B, tl, x, tr) | _ =>> t end // end of [rbtree_insert]

Note that the type assigned to the inner function ins is so informative that it literally gives an formal explanation about the way in which insertion works on a red-black tree. Many a programmer implements red-black trees by simply following an algorithm written in some format of pseudo code while having little understanding about the innerworkings of the algorithm. For instance, if the above inner function ins is implemented in C, few programmers are likely to see that the function always maintain the black height of a given red-black tree after insertion but may introduce one color violation if the root of the tree is red. On the other hand, knowing this invariant is essential to gaining a thorough understanding of the insertion algorithm on red-black trees.

The insertion operation implemented above does not insert an element if it is already in the given red-black tree. It may be desirable to require that the operation inform the caller if such a case occurs. For instance, an exception can be raised for this purpose. An alternative is to give rbtree_insert a call-by-reference argument so that a flag can be returned in it to indicate whether the element to be inserted is actually inserted. I will explain elsewhere what call-by-reference is and how it is supported in ATS.

Often deleting an element from a binary search tree is significantly more difficult to implement than inserting one. This is especially so in the case of a red-black tree. I refer the interested reader to the libats library of ATS for some code implementing a deletion operation on red-black trees that can guarantee based on types each tree returned by the operation being a valid red-black tree (containing no color violations).

Please find the entire code in this section plus some additional code for testing on-line.