Chapter 20. Persistent Hashtables

Hashtables are commonly used to implement finite maps. In ATSLIB/libats, there are hashtable implementations based on linear chaining and linear probing. There is also support for linear hashtables as well as persistent hashtables. The linear ones can be safely freed by the programmer, and the persistent ones (which are directly based on linear ones) can only be safely reclaimed through garbage collection (GC). In this chapter, I show how persistent hashtables can be created and operated on.

Suppose that a map is needed for mapping keys of type key_t to items of type itm_t. The following code essentially sets up an interface for creating and operating on such a map based on a hashtable implementation in ATSLIB/libats:

local typedef key = key_t and itm = itm_t in (* in-of-local *) #include "libats/ML/HATS/myhashtblref.hats" end // end of [local]

Please find on-line the HATS file mentioned in the code, which is just a convenience wrapper made to simplify programming with hashtables.

Assume that key_t is string and itm_t is int. The following line of code creates a hashtable with its initial capacity set to be 1000:

val mymap = myhashtbl_make_nil(1000)

Note that the capacity in this case is the size of the array associated with the created hashtable. The underlying hashtable implementation is based on linear chaining, and this hashtable can store up to 5000 (5*1000) items without need for resizing. When resizing is indeed needed, it is done automatically. The following few lines insert some key/item pairs into mymap:

// val-~None_vt() = mymap.insert("a", 0) val-~Some_vt(0) = mymap.insert("a", 1) // val-~None_vt() = mymap.insert("b", 1) val-~Some_vt(1) = mymap.insert("b", 2) // val-~None_vt() = mymap.insert("c", 2) val-~Some_vt(2) = mymap.insert("c", 3) //

The dot-symbol .insert is overloaded with a function of the name myhashtbl_insert. Given a key and an item, mymap.insert inserts the key/item pair into mymap. If the key is in the domain of the map represented by mymap before insertion, then the original item associated with the key is returned. Otherwise, no item is returned. As can be expected, the size of mymap is 3 at this point:

val () = assertloc (mymap.size() = 3)

The dot-symbol .size is overloaded with a function of the name myhashtbl_get_size, which returns the number of key/item pairs stored in a given hashtable. During the course of debugging, one may want to print out the key/item pairs in a given hashtable:

// val () = fprintln! (stdout_ref, "mymap = ", mymap) //

where the symbol fprint is overloaded with fprint_myhashtbl. The next two lines of code show how search with a given key operates on a hashtable:

val-~None_vt() = mymap.search("") val-~Some_vt(1) = mymap.search("a")

The dot-symbol .search is overloaded with a function of the name myhashtbl_search, which returns the item associated with a given key if it is found. The next few lines of code remove some key/item pairs from mymap:

// val-true = mymap.remove("a") val-false = mymap.remove("a") // val-~Some_vt(2) = mymap.takeout("b") val-~Some_vt(3) = mymap.takeout("c") //

The dot-symbol .remove is overloaded with a function of the name myhashtbl_remove for removing a key/item pair of a given key. If a key/item pair is removed, then the function returns true. Otherwise, it returns false to indicates that no key/item pair of the given key is stored in the hashtable being operated on. The dot-symbol .takeout is overloaded with a function of the name myhashtbl_takeout, which is similar to myhashtbl_remove excepting for returning the removed item. The next few lines of code make use of several less commonly used functions on hashtables:

// val () = mymap.insert_any("a", 0) val () = mymap.insert_any("b", 1) val () = mymap.insert_any("c", 2) val kxs = mymap.listize1((*void*)) val ((*void*)) = fprintln! (stdout_ref, "kxs = ", kxs) val kxs = mymap.takeout_all((*void*)) val ((*void*)) = fprintln! (stdout_ref, "kxs = ", kxs) // val () = assertloc (mymap.size() = 0) //

The dot-symbol .insert_any is overloaded with a function of the name myhashtbl_insert_any, which always inserts a given key/item pair regardless whether the key is already in use. One should really avoid using this function or only call it when it is absolutely sure that the given key is not already in use for otherwise the involved hashtable would be corrupted. The dot-symbols .listize1 and .takeout_all are overloaded with two functions of the names myhashtbl_listize1 and myhashtbl_takeout_all, respectively. Both of them return a list consisting of all the key/item pairs in a given hashtable; the former keeps the hashtable unchanged while the latter empties it. Last, I present as follows the interface for an iterator going over all the key/item pairs in a given hashtable:

// extern fun myhashtbl_foreach_cloref ( tbl: myhashtbl , fwork: (key, &(itm) >> _) -<cloref1> void ) : void // end-of-function //

As an example, the following code prints out all the key/item pairs in a given hashtable:

// val () = myhashtbl_foreach_cloref ( mymap , lam (k, x) => fprintln! (stdout_ref, "k=", k, " and ", "x=", x) ) (* myhashtbl_foreach_cloref *) //

Please find the entirety of the code used in this chapter on-line. Also, there is a hashtable-based implementation of symbol table available on-line.