MontyLingua (version 2.1)
Module MontyLingua
MONTY LINGUA - An end-to-end natural language processor
for English, for the Python/Java platform
Author: Hugo Liu <hugo@media.mit.edu>
Project Page: <http://web.media.mit.edu/~hugo/montylingua>
Copyright (c) 2002-2004 by Hugo Liu, MIT Media Lab
All rights reserved.
Non-commercial use is free, as provided in the GNU GPL
By downloading and using MontyLingua, you agree to abide
by the additional copyright and licensing information in
"license.txt", included in this distribution
If you use this software in your research, please
acknowledge MontyLingua and its author, and link back
to the project page http://web.media.mit.edu/~hugo/montylingua.
Please cite MontyLingua in academic publications as:
Liu, Hugo (2004). MontyLingua: An end-to-end natural
language processor with common sense. Available
at: web.media.mit.edu/~hugo/montylingua.
************************************************
DOCUMENTATION OVERVIEW
About MontyLingua:
- MontyTokenizer
- normalizes punctuation, spacing and
contractions, with sensitivity to abbreviations
- MontyTagger
- Part-of-speech tagging using PENN TREEBANK tagset
- enriched with "Common Sense" from the Open Mind
Common Sense project
- exceeds the accuracy of the Brill94 TBL
(transformation-based learning) tagger
using default training files
- MontyREChunker
- chunks tagged text into verb, noun, and adjective
chunks (VX,NX, and AX respectively)
- regular expression based; much faster and more
accurate than the previous rule-based MontyChunker
- MontyExtractor
- extracts verb-argument structures, phrases, and
other semantically valuable information
from sentences and returns sentences as "digests"
- MontyLemmatiser
- part-of-speech sensitive lemmatisation
- strips plurals (geese-->goose) and
tense (were-->be, had-->have)
- includes regexps from Humphreys and Carroll's
morph.lex, and UPENN's XTAG corpus
- MontyNLGenerator
- generates summaries
- generates surface form sentences
- adds determiners and number to NPs, and tense to verbs
- accounts for sentence_type
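To make the chunking step above concrete, here is a minimal sketch of
regular-expression chunking over tagged text. It is NOT MontyREChunker's
actual rule set: the tag-to-letter mapping, the three patterns, the
function names, and the "(NX ... NX)" bracket notation are all assumptions
chosen for illustration. The idea is to collapse each Penn Treebank tag to
one letter, match chunk patterns against the resulting string, and map the
matches back onto the word/TAG tokens.

```python
import re


def tag_code(tag):
    """Collapse a Penn Treebank tag to a one-letter class (illustrative mapping)."""
    if tag in ("DT", "PRP$"):
        return "D"  # determiners and possessive pronouns
    if tag.startswith("JJ"):
        return "J"  # adjectives
    if tag.startswith("NN") or tag == "PRP":
        return "N"  # nouns and personal pronouns
    if tag.startswith("VB") or tag == "MD":
        return "V"  # verbs and modals
    if tag.startswith("RB"):
        return "R"  # adverbs
    return "O"      # everything else stays outside any chunk


# One alternative per chunk type; earlier alternatives win at a given position.
CHUNK_RE = re.compile(r"(?P<NX>D?J*N+)|(?P<VX>R*V+)|(?P<AX>R*J+)")


def chunk_tagged(tagged):
    """Bracket noun (NX), verb (VX) and adjective (AX) chunks in 'word/TAG' text.

    Assumes every whitespace-separated token has the form word/TAG.
    """
    pairs = [token.rsplit("/", 1) for token in tagged.split()]
    codes = "".join(tag_code(tag) for _, tag in pairs)
    out = []
    pos = 0
    for match in CHUNK_RE.finditer(codes):
        # Copy through any tokens that fall between chunks.
        out.extend("/".join(p) for p in pairs[pos:match.start()])
        label = match.lastgroup
        words = " ".join("/".join(p) for p in pairs[match.start():match.end()])
        out.append("(%s %s %s)" % (label, words, label))
        pos = match.end()
    out.extend("/".join(p) for p in pairs[pos:])
    return " ".join(out)
```

For example, chunk_tagged("he/PRP is/VBZ very/RB happy/JJ") returns
"(NX he/PRP NX) (VX is/VBZ VX) (AX very/RB happy/JJ AX)". Because each tag
is one character in the code string, match offsets index directly into the
token list, which keeps the mapping back to words trivial.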
WHERE MUST THE DATAFILES BE?
- the "datafiles" are all files ending in .MDF
- the best solution is to create an environment variable called
"MONTYLINGUA" and set it to the path of the datafiles
- alternatively, MontyLingua can find the datafiles if they are
in a directory listed in the operating system "PATH" variable,
or in the current working directory
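The lookup described above can be sketched as follows. This is a
hypothetical helper, not MontyLingua's own loader: the function name
find_datafile, its parameters, and the search order (MONTYLINGUA first,
then PATH, then the current working directory) are assumptions; the
documentation only says which locations are searched, not in what order.

```python
import os


def find_datafile(filename, environ=None, cwd=None):
    """Return the full path to `filename` in the first location that has it, else None.

    The search order below is an assumption made for illustration.
    """
    environ = os.environ if environ is None else environ
    cwd = os.getcwd() if cwd is None else cwd

    candidates = []
    # 1. Directories named in the MONTYLINGUA environment variable.
    candidates.extend(d for d in environ.get("MONTYLINGUA", "").split(os.pathsep) if d)
    # 2. Directories named in the PATH environment variable.
    candidates.extend(d for d in environ.get("PATH", "").split(os.pathsep) if d)
    # 3. The current working directory.
    candidates.append(cwd)

    for directory in candidates:
        path = os.path.join(directory, filename)
        if os.path.isfile(path):
            return path
    return None
```

Calling find_datafile("CUSTOMLEXICON.MDF") before loading MontyLingua is one
way to check that the datafiles are visible from the current environment;
a return value of None means none of the three locations contains the file.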
API:
The MontyLingua Python API is documented in MontyLingua.html
The MontyLingua Java API is documented in JMontyLingua.html
RUNNING:
MontyLingua can be called from Python, Java,
or run at the command line.
A. From Python, import the MontyLingua.py file
B. From your Java code:
1. make sure "montylingua.jar" is
in your class path, in addition to
associated subdirectories and data files
2. in your code, you need something like:
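import montylingua.JMontyLingua; // loads namespace
public class YourClassHere {
    public static JMontyLingua j = new JMontyLingua();
    public String yourFunction(String raw, String toked) {
        // an example function; the String return type is an
        // assumption, check JMontyLingua.html for the signature
        String jisted = j.jist_predicates(raw);
        return jisted;
    }
}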
3. For a good use case example, see Sample.java.
C. From the command line:
1. if you have python installed and in your path:
type "run.bat"
2. if you have java installed and in your path:
type "runJavaCommandline.bat"
VERSION HISTORY:
New in version 2.1 (6 Aug 2004)
- new MontyNLGenerator component (in Beta phase)
- includes version 2.0.1 bugfix for a problem
where the Java API wasn't being exposed
New in version 2.0 (29 Jul 2004)
- 2.5X speed enhancement for whole system
- 2X speed enhancement for tagger component
- rule-based chunker replaced with much faster
and more accurate regular expression chunker
- common sense added to MontyTagger component,
improving word-level tagger accuracy to 97%
- updated and expanded lexicon for English
- added a user-customizable lexicon
CUSTOMLEXICON.MDF
- improvements to MontyLemmatiser incorporating
exception cases
- html documentation added
- speed optimizations to all code
- improvements made to semantic extraction
- added a morphological analyzer component,
MontyMorph
- expanded Java API
New in version 1.3.1 (11 Nov 2003)
- mainly bugfixes
- datafiles can now sit in the current working directory (".")
or in the path of either of the two environment variables
"MONTYLINGUA" or "PATH"
- presence of the '/' token in input no longer crashes the system
New in Version 1.3 (5 Nov 2003)
- lisp-style predicate output added
- Sample.java example file added to illustrate API
New in Version 1.2 (12 Sep 2003)
- MontyChunker rules expanded
- MontyLingua Java API added
- MontyLingua documentation added
New in Version 1.1 (1 Sep 2003)
- MontyTagger optimized, 2X loading and 2.5X tagging speed
- MontyLemmatiser added to MontyLingua suite
- MontyChunker added
- MontyLingua command-line capability added
New in Version 1.0 (3 Aug 2003)
- First release
- MontyTagger (since 15 Jan 2001) added to MontyLingua
--please send bugs & suggestions to hugo@media.mit.edu--
Data
__author__ = 'Hugo Liu <hugo@media.mit.edu>'
__version__ = '2.1'
Author
Hugo Liu <hugo@media.mit.edu>