As mentioned above, a binary search tree is not really an appropriate data structure for persistent data. Shore has a built-in index feature designed just for tasks such as this. The src/examples/stree sub-directory of the distribution contains a version of the search-trees example that uses a Shore index instead of a search tree. Instead of stree.sdl we use doc_index.sdl. The source file stree_defs.C is replaced by doc_index_defs.C, tree.C is replaced by docIndex.C, and main.C, word.C, cite.C, and document.C are replaced by ix_main.C, ix_word.C, ix_cite.C, and ix_document.C, respectively. Since most of the program is unchanged, we shall only consider the differences here.
First consider doc_index.sdl. We have removed from interface Word the attributes left and right and operations find_or_add and find and added a "destructor" named finalize. The interface SearchTree has been renamed DocIndex. Its attribute
attribute ref<Word> root;has been replaced by
attribute index<string,Word> ind;which declares an index mapping strings to Word references. We have also added the operation delete_word, which we would have included in SearchTree if we weren't so lazy.
Each of the source files main.C, word.C, cite.C, and document.C has been edited to replace
#include "doc_index.h",replace occurrences of type Ref<SearchTree> with Ref<DocIndex>, etc. The code in main.C for the-p option has been extended to dump the contents of the index. Word.C has been edited to remove all mention of the deleted attributes left and right and operations find_or_add and find, and a new finalize operation has been added, which simply calls DocIndex::delete_word.
In cite.C, the finalize operation checks each Word cited by the Cite being deleted to see if its reference count has dropped to zero. If so, it calls the Word's finalize method and destroys it. Note that the Word is first removed from Cite::cites relationship, which also removes the current Cite from the inverse Word::cited_by relationship, thus decrementing the Word's reference count.
The only change from document.C to ix_document.C the name of the #include file.
Finally, consider the new source file docIndex.C. The index attribute ind of interface DocIndex compiles into a data member whose type is derived from the pre-defined Shore type sdl_index_base. Each such index object must be initialized before it is used. The "constructor" DocIndex::initialize calls sdl_index_base::init, which takes a single argument of type IndexKind-an enumeration type that includes the values BTree and UniqueBtree. (Other values of this enumeration correspond to index types that are not yet supported.) In our case, we choose UniqueBtree, meaning that we want a B-tree that has at most one value for each key. The member function DocIndex::insert uses the member function sdl_index_base::find to determine whether a Word with the given key already exists in the index. If not, one is created and a reference to it is inserted into the index with the member function sdl_index_base::insert. Finally, the Word object that was found or created is updated to add the given Cite to its set of occurrences.
DocIndex::insert_file is identical to SearchTree::insert_file, and DocIndex::find simply calls sdl_index_base::find. DocIndex::delete_word is called to remove reference to a Word from the index when it is no longer referenced. The member function sdl_index_base::remove has an input parameter of the key-type of the index, and an integer output parameter count. It removes all index entries whose keys match the first argument and returns a count of the number of entries removed. In our case, this count should always be one.
DocIndex::print illustrates how to iterate through the entries in an index using the Shore template class IndexScanIter, which has two type parameter corresponding to the types of the keys and values of the index. The types given must match the types of the language binding, which might differ syntactically from the SDL type declarations. (For example, Ref<Word> must be used in place of Word here, similarly SDL string translates to sdl_string in a C++ language binding.)
The iterator's constructor takes an index of the appropriate type (in this case DocIndex::ind) as a parameter. For ordered indices (such as B-trees), it also has two optional arguments to indicate lower and upper bounds. When these values are supplied, the iterator only returns keys in the indicated range. When they are omitted (as in this example) the iterator returns all entries in the index.
An iterator has a function member next to advance to the next entry, a data member eof to indicate if any more entries exist, and data members cur_key and cur_val representing the key and value of the current entry. As this code illustrates, the next function has to be called once at the start to "prime the pump", and eof flag is false after advancing past the last entry. This interface may change in future releases. See the index(cxxlb) manual page for more details.
The script run_tests.sh included in the source distribution runs all of the tests listed above using both the stree and doc_index versions of the program.