Computer Sciences Dept.

Learning to Use Contextual Patterns in Language Processing

Sara R. Jordan

The research described in this thesis concerns the application of pattern recognition and learning to natural language processing. Using the techniques of learning and pattern recognition, a program has been written to learn and to demonstrate its knowledge of natural language by trying to transform language strings from one form to another, and to answer questions based on the information it has learned. In a departure from the usual approach to question answering and other natural language processing, this program avoids using built-in linguistic or logical information or techniques. In addition, several types of language behavior are attempted in a single program, including transformation or translation of language strings, information learning and organization, and question answering. Emphasis is given to this program's ability to learn a memory net structure and to categorize the nodes in it into general behavior classes. All inputs are in the form of unstructured, unsegmented strings of natural language. In a general and uniform way, the program processes these strings and incorporates its knowledge into a net structure which acts as its permanent memory. Learned language units are interrelated and organized in the net by a general process of categorizing them into classes according to feedback as to "correct" usage received interactively from a human trainer. Weights are used for learning and unlearning relationships. Using only the information which it has thus learned and represented in the memory net, the program accepts natural language input strings, processes each string according to the requested task, and outputs in natural language a response, which may be (a) a translation or other transformation of the input string, or (b) an answer to a question input by the human.
The purpose of this research is not to try to produce high quality language translation and question answering; rather, it is to experiment with a memory structure which, with the aid of a set of simple and general heuristics, demonstrates an interesting kind of learning for natural language manipulation. The program is currently running interactively on the Univac 1108 (Exec 8) Timesharing System at the University of Wisconsin. Coded in Fortran V, the program includes a string-matching list processing language written by the author especially for this research.
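
The abstract states that weights are used for learning and unlearning relationships in the memory net, driven by a trainer's feedback on "correct" usage, but it does not give the update rule. The following is a minimal illustrative sketch in Python, not the thesis's Fortran V implementation. The class name MemoryNet, the fixed step size, and the rule that a link is dropped once its weight reaches zero are all assumptions made for illustration only.

```python
from collections import defaultdict


class MemoryNet:
    """Toy weighted net (hypothetical): nodes are language units, links carry weights."""

    def __init__(self, step=1.0):
        # Assumed fixed increment; the thesis abstract does not specify one.
        self.step = step
        # weights[source][target] -> strength of the learned relationship
        self.weights = defaultdict(lambda: defaultdict(float))

    def feedback(self, source, target, correct):
        """Raise the link weight on 'correct' feedback, lower it otherwise."""
        delta = self.step if correct else -self.step
        new_weight = self.weights[source][target] + delta
        if new_weight <= 0:
            # Assumed form of unlearning: a link driven to zero is dropped.
            self.weights[source].pop(target, None)
        else:
            self.weights[source][target] = new_weight

    def best(self, source):
        """Return the most strongly associated target, or None if unknown."""
        links = self.weights.get(source)
        if not links:
            return None
        return max(links, key=links.get)
```

In this sketch, repeated positive feedback on a pairing strengthens it, and negative feedback weakens it until it disappears, so a response the trainer rejects stops being offered.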
