Computer Sciences Dept.

Automatic Novel Writing: A Status Report

Sheldon Klein, John F. Aeschlimann, David F. Balsiger, Steven L. Converse, Claudine Court, Mark Foster, Robin Lao, John D. Oakley, Joel Smith

Programmed in FORTRAN V on a Univac 1108, the system generates 2100-word murder mystery stories, complete with semantic deep structure, in less than 19 seconds. The techniques draw upon the state of the art in linguistics, compiler theory, and micro-simulation. The plot and detailed development of events in the narrative are generated by a micro-simulation model written in a specially created, compiler-driven simulation language. The rules of a simulation model are stochastic (with controllable degrees of randomness) and govern the behavior of individual characters and events in the modelled universe of the story. This universe is represented as a semantic deep structure encoded as a network--a directed graph with labelled edges, where the nodes are semantic objects and the labelled edges are relations uniting those objects. The simulation model rules implement changing events in the story by altering the semantic network. Compiler or translator-like production rules are used to generate English narrative discourse from the semantic deep structure network (the output might be in any language). The flow of the narrative is derived from reports on the changing state of the modelled universe as affected by the simulation rules. Nodes of the semantic network may be atoms, classes, or complex predicates that represent entire subportions of the network. Atom nodes and relations are linked to expression lists that may contain lexical stems or roots that are available for insertion into trees during the generation process. (Low level transformations convert the roots into appropriately inflected or derived forms. High level transformations mark the tree for application of the low level ones.) These expression lists may also contain semantic network expressions consisting of objects and relations which may themselves be linked to expression lists, thereby providing the generator with recursive expository power.
An atom node may also function as a complex predicate node with status that may vary during a simulation. Class nodes may refer to lists of object nodes, and the complex-predicate nodes can be linked to pointers to sub-portions of the network that include themselves, allowing them to be recursively self-referential. (This would permit generation of sentences such as "I know that I know that - (sentence)".) We are also testing a natural-language meta-compiling capability--the use of the semantic network to generate productions in the simulation language itself that may themselves be compiled as new rules during the flow of the simulation. Such a feature will permit one character to transmit new rules of behavior to another character through conversation, or permit a character to develop new behavior patterns as a function of his experiences during the course of a simulation. This feature, combined with the complex-predicate nodes, helps to give the system the logical power of at least the second-order predicate calculus. Theoretical motivations include an interest in modelling generative-semantic linguistic theories, including case grammar and presuppositional formulations. The dynamic time dimension added to the semantic deep structure by the simulation makes it possible to formulate more powerful versions of such theories than now exist.
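
The core mechanism described above (a labelled directed graph altered by stochastic rules, with narrative derived from reports of the changes) can be illustrated with a minimal modern sketch. It is written in Python rather than the report's FORTRAN V and simulation language, and it is not the authors' implementation: the class SemanticNetwork, the rule jealousy_rule, the relation names, the character names, and the lexicon are all hypothetical and chosen only for illustration.

```python
import random


class SemanticNetwork:
    """Directed graph with labelled edges, stored as (subject, relation, object) triples."""

    def __init__(self):
        self.edges = set()

    def add(self, subj, rel, obj):
        self.edges.add((subj, rel, obj))

    def holds(self, subj, rel, obj):
        return (subj, rel, obj) in self.edges


def jealousy_rule(net, rng, probability):
    """Hypothetical stochastic rule: if A loves B and B flirts with C,
    then with the given probability A comes to hate C.
    Returns the list of edges added to the network."""
    snapshot = sorted(net.edges)  # fixed order, and safe to mutate net afterwards
    changes = []
    for a, rel, b in snapshot:
        if rel != "loves":
            continue
        for b2, rel2, c in snapshot:
            if b2 != b or rel2 != "flirts_with" or c == a:
                continue
            new_edge = (a, "hates", c)
            if net.holds(*new_edge) or new_edge in changes:
                continue
            if rng.random() < probability:  # controllable degree of randomness
                changes.append(new_edge)
    for edge in changes:
        net.add(*edge)
    return changes


def report(changes, lexicon):
    """Turn each changed edge into a flat English sentence via a relation lexicon."""
    return [f"{subj} {lexicon[rel]} {obj}." for subj, rel, obj in changes]


if __name__ == "__main__":
    net = SemanticNetwork()
    net.add("John", "loves", "Marion")
    net.add("Marion", "flirts_with", "James")
    changes = jealousy_rule(net, random.Random(0), probability=0.5)
    for sentence in report(changes, {"hates": "hates"}):
        print(sentence)
```

The sketch covers only atom nodes and a single rule; class nodes, complex-predicate nodes, expression lists, and the transformational generator described in the abstract are omitted.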
