
 

The Main Program

The main program of our sample application is in main.C. Most of the code should be clear to any experienced C++ programmer. We will only concentrate on those statements that exercise Shore features.

 

Initialization

Any program that interacts with Shore must call the static member function Shore::init exactly once before doing any Shore operations, to initialize the client-side machinery. It searches the command line (supplied by the first two arguments, which are usually the same as the first two arguments to main) for options specifically meaningful to Shore and removes them from argc and argv. The fourth argument to Shore::init is the name of an options file, which can supply parameters such as the size of the object cache. It is a good idea to get the name of this file from the environment, as indicated here, rather than wiring it into the program. If there is no value for STREE_RC specified in the environment, the standard Unix library function getenv will return 0, and a null fourth argument tells Shore::init to use reasonable defaults. The third argument to Shore::init is the application name, which is used to look up options in the options file. A null argument (as shown here) tells Shore::init to use argv[0]. For more details, consult the init(oc) manual page.
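The defaulting rules for the third and fourth arguments can be sketched in plain C++. This is not Shore code: InitArgs, resolve_init_args, and args_from_environment are hypothetical stand-ins that only model the behavior described above. The one detail taken from the text is that the options file name comes from getenv.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical stand-in for what Shore::init derives from its last two
// arguments; the real function is documented on the init(oc) manual page.
struct InitArgs {
    std::string app_name;      // name used to look up options
    bool        use_defaults;  // true when no options file was named
    std::string options_file;  // empty when use_defaults is true
};

// A null app_name falls back to argv[0]; a null options_file means
// "use reasonable defaults".
InitArgs resolve_init_args(char** argv, const char* app_name,
                           const char* options_file) {
    assert(argv != nullptr && argv[0] != nullptr);
    InitArgs r;
    r.app_name     = app_name ? app_name : argv[0];
    r.use_defaults = (options_file == nullptr);
    r.options_file = options_file ? options_file : "";
    return r;
}

// Take the options file name from the environment instead of wiring it
// into the program. std::getenv returns a null pointer when the variable
// is not set, which selects the defaults.
InitArgs args_from_environment(char** argv, const char* env_var) {
    return resolve_init_args(argv, nullptr, std::getenv(env_var));
}
```

Calling args_from_environment(argv, "STREE_RC") mirrors the choice made in the sample program: the application name defaults to argv[0], and the options file is used only if the environment names one.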

Like many Shore interface functions, Shore::init returns a value of type shrc ("rc" stands for "return code").

The macro SH_DO is handy for calling functions that are not expected to fail. It evaluates its argument and verifies that the result is RCOK. If not, it prints (on cerr) an error message and aborts the program. For more details about errors, consult the errors(oc) manual page. SH_DO is described on the transaction(oc) manual page.

 

Transactions

Every Shore operation except Shore::init must be executed inside a transaction. A transaction groups a set of interactions with the database into a single atomic unit. Shore ensures that transactions run concurrently by multiple programs have a net effect that is equivalent to running them one at a time. (This property is called "serializability"). Moreover, if a transaction should fail, Shore guarantees that all changes to the database performed by the transaction are undone. A program starts a transaction by invoking the macro SH_BEGIN_TRANSACTION. Its argument is a variable of type shrc. When the program has successfully completed all the actions in a transaction, it invokes the parameterless macro SH_COMMIT_TRANSACTION to make all of its changes to the database permanent and to unlock any database objects that Shore may have locked to ensure serializability. In exceptional circumstances, Shore may reject the attempt to commit the transaction. Therefore, SH_COMMIT_TRANSACTION returns an shrc value. Since we do not want to try any fancy recovery actions if SH_COMMIT_TRANSACTION fails in our application, we invoke it with SH_DO.

On occasion, an application program may discover that a transaction cannot be completed for application-specific reasons. On such occasions, the program can explicitly abort the transaction by calling the macro SH_ABORT_TRANSACTION(rc), where rc is a value of type shrc. In addition to requesting Shore to undo all changes to persistent data and release all locks, this macro performs a longjmp, returning control to the statement following the most recently executed SH_BEGIN_TRANSACTION and assigning the shrc value supplied to SH_ABORT_TRANSACTION to the result parameter of SH_BEGIN_TRANSACTION. Since any transaction may be aborted in this manner, each call to SH_BEGIN_TRANSACTION should be followed by code that tests the resulting shrc and takes corrective action if it is not RCOK. The member function shrc::fatal prints an appropriate message and aborts the program. The macro SH_DO previously described behaves somewhat differently if a transaction is active and an error is detected: Instead of terminating the program, it invokes SH_ABORT_TRANSACTION. For more details about errors, consult the transaction(oc) manual page.
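The control flow described above (an abort that jumps back to the point just after the begin, delivering a return code) can be imitated with the standard setjmp and longjmp facilities. The sketch below is not the Shore implementation. Every name in it (txn_rc, DEMO_BEGIN, DEMO_ABORT, DEMO_DO, demo_commit) is a hypothetical stand-in, an int plays the role of shrc, and nothing is actually undone or unlocked.

```cpp
#include <cassert>
#include <csetjmp>
#include <cstdio>
#include <cstdlib>

typedef int txn_rc;                 // stand-in for shrc
const txn_rc TXN_OK = 0;            // stand-in for RCOK

static std::jmp_buf txn_env;        // where an abort resumes
static txn_rc txn_pending = TXN_OK; // code handed over by the abort
static bool txn_active = false;

// Begin: record this point. After DEMO_ABORT(code), execution continues
// here and rc receives that code; on the first pass rc is TXN_OK.
#define DEMO_BEGIN(rc)                              \
    do {                                            \
        txn_pending = TXN_OK;                       \
        if (setjmp(txn_env) == 0)                   \
            txn_active = true;                      \
        else                                        \
            txn_active = false;                     \
        (rc) = txn_pending;                         \
    } while (0)

// Abort: hand the code to the matching DEMO_BEGIN and jump back to it.
// Only safe while the function that executed DEMO_BEGIN is still running
// and no objects with non-trivial destructors would be skipped.
#define DEMO_ABORT(code)                            \
    do {                                            \
        txn_pending = (code);                       \
        std::longjmp(txn_env, 1);                   \
    } while (0)

// Check a return code: inside a transaction a failure aborts the
// transaction, outside one it reports the error and ends the program.
#define DEMO_DO(expr)                               \
    do {                                            \
        txn_rc demo_r_ = (expr);                    \
        if (demo_r_ != TXN_OK) {                    \
            if (txn_active) DEMO_ABORT(demo_r_);    \
            std::fprintf(stderr, "error %d\n", demo_r_); \
            std::abort();                           \
        }                                           \
    } while (0)

static txn_rc demo_commit() {
    assert(txn_active);
    txn_active = false;
    return TXN_OK;
}

// A toy operation that fails with code 7 for negative input.
static txn_rc step(int value) { return value < 0 ? 7 : TXN_OK; }

// Runs one "transaction" and returns the code seen after DEMO_BEGIN.
txn_rc run_transaction(int value) {
    txn_rc rc;
    DEMO_BEGIN(rc);
    if (rc != TXN_OK) {
        return rc;                  // reached only after an abort
    }
    DEMO_DO(step(value));
    return demo_commit();
}
```

The test of rc immediately after DEMO_BEGIN corresponds to the advice above: because any later statement may abort, that test is the single place where the failure is seen.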

 

Registered Objects

After beginning a transaction, our example program calls Shore::chdir to go to directory stree. Shore::chdir is similar to the chdir system call of Unix: It alters the current Shore working directory. Note that the program has two "current working directories": one that applies to Unix system calls and one that applies to all path names used in calls to Shore.

Like the Unix system call, Shore::chdir will fail if the directory does not exist. In the case of our example program, Shore::chdir fails for this reason the first time the program is run. The program recovers by creating the directory (using Shore::mkdir, which is similar to the Unix function of that name) and reissuing the chdir request. A failure of the first chdir operation for any other reason is a catastrophic error. Thus the program checks that the return code is either RCOK or SH_NotFound and aborts the transaction otherwise.
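The recovery logic just described (try to enter the directory, create it only when the failure is "not found", treat anything else as fatal) is sketched below against a toy in-memory directory table. ToyFs, ToyRc, and enter_or_create are hypothetical names. The real program uses Shore::chdir and Shore::mkdir, whose exact signatures are not reproduced here.

```cpp
#include <cassert>
#include <set>
#include <string>

// Stand-ins for return codes; FS_OK and FS_NOT_FOUND play the roles that
// RCOK and SH_NotFound play in the text.
enum ToyRc { FS_OK, FS_NOT_FOUND, FS_PERMISSION };

// A toy name space: the set of existing directories plus a current one.
struct ToyFs {
    std::set<std::string> dirs;
    std::string cwd;
    bool readable = true;           // false simulates "permission denied"

    ToyRc chdir(const std::string& d) {
        if (!readable) return FS_PERMISSION;
        if (dirs.count(d) == 0) return FS_NOT_FOUND;
        cwd = d;
        return FS_OK;
    }
    ToyRc mkdir(const std::string& d) {
        if (!readable) return FS_PERMISSION;
        dirs.insert(d);
        return FS_OK;
    }
};

// Enter directory d, creating it on first use. first_run reports whether
// the directory had to be created. Any failure other than "not found" is
// returned unchanged so the caller can abort the transaction.
ToyRc enter_or_create(ToyFs& fs, const std::string& d, bool& first_run) {
    assert(!d.empty());
    first_run = false;
    ToyRc rc = fs.chdir(d);
    if (rc == FS_OK) return rc;
    if (rc != FS_NOT_FOUND) return rc;   // catastrophic: do not recover
    first_run = true;
    rc = fs.mkdir(d);
    if (rc != FS_OK) return rc;
    return fs.chdir(d);                  // reissue the original request
}
```

The first_run flag corresponds to the decision the example program makes next: on the first run it creates its registered objects, and on later runs it looks them up.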

When run for the first time, our example program also creates two "registered" objects: an instance of SearchTree registered under the path name stree/repository and a pool named stree/pool. The pool is used later to allocate "anonymous" instances of Word and Cite. See An Overview of Shore for more information about registered and anonymous objects and pools. The SearchTree object is created by a form of the C++ new operator applied to the class name SearchTree using C++ "placement syntax" to supply the path name and permission bits for the new object. If the operation should fail for any reason (such as permission denied), it will cause an abort of the current transaction, returning control to the statement following the most recent SH_BEGIN_TRANSACTION. Otherwise, a reference to the new object is assigned to the global variable repository.
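For readers unfamiliar with placement syntax, the following self-contained sketch shows the C++ mechanism by which extra arguments written after new reach a class-specific allocation function. It is only an illustration of the language feature. ToyTree and toy_registry are hypothetical, the registry is an ordinary in-memory map and not the Shore name space, and the mode value used in the usage note is an arbitrary example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <new>
#include <string>

// Hypothetical registry mapping path names to (object, mode).
struct ToyEntry { void* object; int mode; };
static std::map<std::string, ToyEntry> toy_registry;

struct ToyTree {
    int node_count = 0;

    // Placement form: in "new (path, mode) ToyTree" the two extra
    // arguments arrive here after the implicit size argument.
    static void* operator new(std::size_t size, const char* path, int mode) {
        assert(path != nullptr);
        void* p = ::operator new(size);
        toy_registry[std::string(path)] = ToyEntry{p, mode};
        return p;
    }
    // Matching placement delete; called only if the constructor throws.
    static void operator delete(void* p, const char* path, int) {
        toy_registry.erase(std::string(path));
        ::operator delete(p);
    }
    // Ordinary form, needed for a plain "delete ptr" expression.
    static void operator delete(void* p) { ::operator delete(p); }
};

// Create a tree registered under the given path name and mode.
ToyTree* create_tree(const char* path, int mode) {
    return new (path, mode) ToyTree;
}
```

A call such as create_tree("stree/repository", 0644) leaves an entry for that path in toy_registry pointing at the new object. In this sketch a failed allocation throws std::bad_alloc, whereas the text above says that Shore reports failure by aborting the current transaction.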

The creation of the pool nodes illustrates an alternative way of creating a registered object. The variable nodes is declared to have type Ref<Pool>, where Pool is a pre-defined Shore type. The class Ref<T>, for any T, has several static member functions, such as create_registered, create_anonymous, and create_pool. Each one has parameters to supply a path name and protection mode, as well as a result parameter to receive a reference to the created object. In this case we call nodes.create_pool to create a new Pool object. (We could equivalently have written Ref<Pool>::create_pool). Each of these functions returns an shrc result to indicate success or failure.

The differing failure modes of these two ways of creating registered objects illustrate a general design principle of Shore. Many Shore operations are invoked implicitly. Another example of an implicit operation is dereferencing a Ref<T> value. When any of them fails (for example, if the reference is null or dangling), Shore responds by aborting the current transaction. If the program needs more precise control--in particular, if it wants a chance to recover from the error--it must use an alternative interface by explicitly calling a Shore function that yields a return code.

If the directory stree already exists, the program expects to find existing registered objects stree/repository and stree/pool. Each reference class Ref<T> has a static member function lookup, with a path-name input parameter and an output parameter of type Ref<T>. This function looks for a registered object with the given name, and if one is found, checks that its type (as indicated by data stored in the database) matches T. If both checks succeed, a reference to the object is returned in the result parameter. The initializations of repository and nodes illustrate two ways of invoking this function.

Finally, the main program performs one of four operations depending on a command-line switch. It either adds one or more documents to the repository, looks up a word, removes a document from the repository, or dumps all anonymous objects.

 

Anonymous Objects

In a typical Shore application, the vast majority of objects created will not have path names. Unlike registered objects, which can be accessed either by path name or by references from other objects, these "anonymous" objects can only be accessed by following references. To assist in clustering, and to allow the application (and system administrators) to keep track of all allocated space, Shore requires each anonymous object to be allocated from a pool, which is a registered object. Our example program uses just one pool for all anonymous objects. A more sophisticated program might use a separate pool for each type extent, or for each major component of a complex data structure.

The function SearchTree::insert(char *fname) in tree.C shows how to create anonymous objects. The expression "new (nodes) Cite" allocates a new instance of interface Cite from the pool referenced by nodes. The function Document::finalize(char *fname) in document.C shows how to destroy an anonymous object: If p is a reference (an instance of Ref<T>, for some type T), p.destroy() destroys the object referenced by p. Registered objects cannot be explicitly destroyed; like Unix files, they are deleted by the system when they have no path names designating them. An example of code to delete a registered object may be found in the function delete_file in main.C. The call Shore::unlink(fname) removes the name fname from a registered object. Since this object will not have any aliases ("hard" links, which are created by Shore::link), unlinking it will cause it to be destroyed.

 

Relationships

The definitions in stree.sdl include two bidirectional relationships. One links words to their citations and the other links citations to the documents they cite. A bidirectional relationship has two names, one for each direction. For example, the relationship between citations and documents is called "doc" in the Cite-to-Document direction and "cited_by" in the reverse direction. This relationship is declared by the declaration

    relationship ref<Document> doc inverse cited_by;
in interface Cite, and by the declaration
    relationship set<Cite> cited_by inverse doc;
in interface Document. The SDL compiler checks that the two declarations are consistent. The use of "ref" rather than "set" in the first of these declarations indicates a functional dependency from Cite to Document: Each Cite is related to at most one Document.

In the C++ binding, these declarations give rise to data members Cite::doc, of type Ref<Document>, and Document::cited_by, of type Set<Cite>. Similarly, the relationship between words and citations is represented by Word::cited_by and Cite::cites.

The type Set<T> represents a set of zero or more references to distinct T objects. It has member functions to add and delete values of type Ref<T> and to iterate through its contents. The details of the interface, which are documented in the set(cxxlb) manual page, are likely to change in future releases of Shore. An example of the use of the current interface may be seen in word.C. In Word::occurs_on, a citation of word w is recorded by adding the reference cite to w.cited_by. The runtime support automatically adds a reference to w to cite->cites. The function Word::occurrence uses the member function Set<Cite>::get_elt to retrieve (a reference to) one of the citations of a word, while Word::count uses Set<Cite>::get_size to determine how many citations there are. A reference can be deleted from a set with Set<T>::del. Document::finalize uses an alternative interface: The function Set<T>::delete_one returns one of the references, deleting it from the set. (The implementation chooses an arbitrary reference to return; it returns NULL if the set is empty).

The function Document::finalize is called just before destroying a Document object. Although the runtime system automatically updates one end of a bidirectional relationship when the other end is updated by assignment, it does not (yet) update inverse relationships properly when an object is destroyed. (This is a bug; it will be fixed in a future release). However, even if it did so, there might be application-specific cleanup operations required. In our example program, we would like to "garbage collect" the Cite objects associated with the document being removed. Document::finalize iterates through the citations of the document, invoking Cite::finalize on each one and then destroying it. Cite::finalize simply removes all references from the citation to Word objects, thereby removing the citation from the cited_by set of each word. The example program does not remove words from the binary search tree when their citation counts drop to zero. Adding code to do so would not be hard, but it would not illustrate any additional features of Shore.
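The bookkeeping involved can be imitated with ordinary pointers and std::set. The sketch below is not the code from word.C or document.C, and it does not use the Shore Set<T> interface. RelWord, RelCite, and RelDocument are hypothetical in-memory stand-ins that show the inverse being kept in step on insertion and the explicit cleanup performed by the two finalize functions.

```cpp
#include <cassert>
#include <set>

struct RelCite;

// Stand-in for Word: the set of citations that mention this word.
struct RelWord {
    std::set<RelCite*> cited_by;         // inverse of RelCite::cites
    void add_citation(RelCite* c);
};

// Stand-in for Cite: the set of words this citation mentions.
struct RelCite {
    std::set<RelWord*> cites;            // inverse of RelWord::cited_by

    // Drop every reference to a word, which also removes this citation
    // from the cited_by set of each of those words.
    void finalize() {
        for (RelWord* w : cites) {
            w->cited_by.erase(this);
        }
        cites.clear();
    }
};

// Updating one end also updates the other, which is the job the text
// assigns to the runtime support.
void RelWord::add_citation(RelCite* c) {
    assert(c != nullptr);
    cited_by.insert(c);
    c->cites.insert(this);
}

// Stand-in for Document: owns the citations that refer to it.
struct RelDocument {
    std::set<RelCite*> cited_by;

    // Take citations out one at a time (in the spirit of delete_one),
    // finalize each, then destroy it.
    void finalize() {
        while (!cited_by.empty()) {
            RelCite* c = *cited_by.begin();
            cited_by.erase(cited_by.begin());
            c->finalize();
            delete c;
        }
    }
};
```

After RelDocument::finalize returns, no word still refers to a destroyed citation. As in the example program, words whose citation sets have become empty are left in place.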

 

Strings and Text

The pre-defined type string is implemented as a char * pointer and a length (so strings can contain null bytes). When a persistent object containing strings is written to disk, the actual string data is appended to the object and the pointers are converted to a form appropriate for storage on disk. When it is brought back into memory, the pointers are restored ("swizzled") to memory addresses. When an ordinary C++ (null-terminated) string is assigned to a Shore string, the bytes (up to and including the terminating null byte) are copied to dynamically allocated space. See, for example, Word::initialize. When an object containing strings is removed from the object cache, its string space is freed. Thus Shore strings have value semantics.

Standard library string functions such as strcmp, strncmp, strlen, etc., as well as memcpy and bcopy, are overloaded to work with Shore strings. In addition to strlen, strings support an operation blen which returns the total length (including null bytes). It is also possible to assign a character or string to an arbitrary offset in a Shore string. The target string is expanded if necessary to accommodate the data. For example, Document::append extends the body field of a document by invoking the sdl_string::set function (Document::body is actually of type text, but text and string are the same for the purposes of this discussion). See the string(cxxlb) manual page for more details.
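The difference between the two length operations, and the way a write at an offset can grow a string, can be illustrated with a small stand-in type. ByteString, toy_blen, and toy_strlen are hypothetical and are not the sdl_string interface. The sketch follows the description above by copying the terminating null byte on assignment, so a three-character C string has a total length of four.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Stand-in for a Shore string: bytes plus an explicit length, so null
// bytes may appear anywhere inside the value.
struct ByteString {
    std::string bytes;

    // Copy a C string, including its terminating null byte.
    void assign(const char* s) {
        assert(s != nullptr);
        bytes.assign(s, std::strlen(s) + 1);
    }

    // Write len bytes at the given offset, growing the value with null
    // bytes first if the write extends past the current end.
    void set(std::size_t offset, const char* data, std::size_t len) {
        if (offset + len > bytes.size()) {
            bytes.resize(offset + len, '\0');
        }
        bytes.replace(offset, len, data, len);
    }
};

// Total length, null bytes included (the role the text gives to blen).
std::size_t toy_blen(const ByteString& s) { return s.bytes.size(); }

// Length up to the first null byte (the role of strlen).
std::size_t toy_strlen(const ByteString& s) {
    std::size_t n = s.bytes.find('\0');
    return n == std::string::npos ? s.bytes.size() : n;
}
```

Appending to a body in the manner of Document::append then amounts to calling set with the offset equal to the current length.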

 

Scanning Pools

The example program supports an option (-p) for dumping all the anonymous objects in the pool created by the program. This last option is useful for verifying that the object deletion code is working correctly, and illustrates how one might write administrative programs for maintaining a complex database. The function pool_list in main.C creates a PoolScan object to scan the contents of the pool, and tests whether the creation was successful. (If, for example, the named pool did not exist or permission was denied, the scan object would be created in an "invalid" state, and would test as false when converted to Boolean.) The function PoolScan::next returns a reference to the "next" object in the pool (according to some arbitrary ordering) in its result parameter. It returns some shrc value other than RCOK when no more objects remain. The result parameter must be of type Ref<any>, the persistent analogue of void * (a reference to an object of unknown type). The actual type of object can be tested dynamically with the function TYPE(T)::isa(Ref<any> &ref). Each interface T defined in an SDL definition gives rise to a type object (or meta-type), which is available as a global variable named TYPE_OBJECT(T) (of type TYPE(T)). One of the member functions of this object is isa, which accepts a parameter of type Ref<any>, tests whether it is a reference to an object of type T, and if so returns a reference of type Ref<T> to it. Otherwise, isa returns a null reference. It should be noted that this interface for dynamic type checking is provisional; it may be replaced with a facility more nearly resembling the dynamic_cast syntax for run-time type identification (RTTI) recently added to the proposed C++ standard.

After checking that the type of returned object conforms to one of the expected types (the program only creates anonymous objects of type Word and Cite), pool_list uses the reference (as converted by isa) to call the appropriate print function (Word::print or Cite::print).
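The shape of such a scan loop, with an untyped reference that is tested against each expected type in turn, is sketched below in standard C++. None of these names belong to Shore: AnyRef stands in for Ref<any>, the template isa for TYPE(T)::isa, and ToyScan for PoolScan, with a bool result where the real next returns an shrc. The sketch counts objects instead of printing them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <typeinfo>
#include <vector>

struct ScanWord { std::string text; };
struct ScanCite { int offset; };

// Analogue of Ref<any>: an untyped pointer plus a run-time type tag.
struct AnyRef {
    const std::type_info* type;
    void* object;
};

template <class T>
AnyRef make_any(T* p) { return AnyRef{&typeid(T), p}; }

// Analogue of isa: a typed pointer if the tag matches T exactly,
// otherwise a null pointer.
template <class T>
T* isa(const AnyRef& r) {
    if (r.type != nullptr && *r.type == typeid(T)) {
        return static_cast<T*>(r.object);
    }
    return nullptr;
}

// Analogue of a pool scan over an in-memory "pool".
class ToyScan {
public:
    explicit ToyScan(const std::vector<AnyRef>* pool) : pool_(pool), pos_(0) {}
    // An invalid scan (no pool) tests as false.
    explicit operator bool() const { return pool_ != nullptr; }
    // Deliver the next object in out; false when none remain.
    bool next(AnyRef& out) {
        if (pool_ == nullptr || pos_ >= pool_->size()) return false;
        out = (*pool_)[pos_++];
        return true;
    }
private:
    const std::vector<AnyRef>* pool_;
    std::size_t pos_;
};

// Visit every object and classify it by type, as a pool-listing
// function would before calling the matching print function.
bool count_objects(const std::vector<AnyRef>* pool, int& words, int& cites) {
    words = 0;
    cites = 0;
    ToyScan scan(pool);
    if (!scan) return false;
    AnyRef ref = {nullptr, nullptr};
    while (scan.next(ref)) {
        if (isa<ScanWord>(ref)) ++words;
        else if (isa<ScanCite>(ref)) ++cites;
        else assert(!"unexpected type in pool");
    }
    return true;
}
```

Because only two types are ever placed in the pool, reaching the final else branch would indicate a bug, which is why the sketch asserts there.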






Marvin Solomon
Fri Aug 2 13:40:31 CDT 1996