MXPOST is a Java (JDK 1.1) implementation of the part-of-speech tagger described in:
USERS MUST ABIDE BY THE LICENSE INCLUDED WITH THIS DISTRIBUTION.
MXPOST is copyright (c) 1997 Adwait Ratnaparkhi
To use:
mxpost projectdir < wordfile
where projectdir is a project directory, and wordfile contains one sentence per line.
The project directory tagger.project contains a model trained
from sections 0 through 18 of the Penn Treebank Wall St. Journal corpus.
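The tagging command can also be driven from a script. The following is a minimal Python sketch, not part of the MXPOST distribution. It assumes that the mxpost launcher is on the PATH and that it reads sentences from standard input and writes its result to standard output, as the shell redirection in the usage line suggests. The helper names build_command and tag_file are hypothetical.

```python
import subprocess


def build_command(project_dir, executable="mxpost"):
    """Return the argument list for `mxpost projectdir`.

    Assumption: the launcher is called `mxpost` and is on the PATH.
    """
    return [executable, project_dir]


def tag_file(project_dir, word_file, executable="mxpost"):
    """Feed word_file to the tagger on stdin and return whatever it prints.

    Equivalent to the shell form `mxpost projectdir < wordfile`.
    The output is returned unparsed, since its format is not assumed here.
    """
    with open(word_file, "r") as sentences:
        result = subprocess.run(
            build_command(project_dir, executable),
            stdin=sentences,       # plays the role of "< wordfile"
            capture_output=True,
            text=True,
            check=True,            # raise if the tagger exits with an error
        )
    return result.stdout
```

For example, tag_file("tagger.project", "wordfile") would run the bundled model over wordfile.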
The sentences in wordfile must be tokenized according to Penn Treebank conventions,
e.g., "The stock didn't rise $5." should be "The stock did n't rise $ 5 ."
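To illustrate the kind of preprocessing this implies, here is a rough Python sketch that reproduces the example above. It is not the official Penn Treebank tokenizer and covers only three rules (the "n't" contraction, the dollar sign, and sentence-final punctuation); real input needs the full set of conventions. The function name rough_ptb_tokenize is hypothetical.

```python
import re


def rough_ptb_tokenize(sentence):
    """Apply a few Penn Treebank style splits to one sentence.

    Assumption: only three rules are handled here; commas, quotes,
    brackets, possessives and other contractions are left untouched.
    """
    text = sentence
    # Split the negative contraction from its verb: didn't -> did n't
    text = re.sub(r"(\w)n't\b", r"\1 n't", text)
    # Separate the dollar sign from the amount: $5 -> $ 5
    text = re.sub(r"\$", " $ ", text)
    # Split sentence-final punctuation: "5." -> "5 ."
    text = re.sub(r"([.?!])\s*$", r" \1", text)
    # Collapse the extra whitespace introduced by the substitutions
    return " ".join(text.split())


print(rough_ptb_tokenize("The stock didn't rise $5."))
```

Each output line can then be written to wordfile, one sentence per line.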
To train a new model:
trainmxpost projectdir traindata
where projectdir is the newly created project directory, and traindata
contains one sentence per line, each in the format:
word1_tag1 word2_tag2 word3_tag3 ... word4_tag4
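As a sketch of how such a file might be prepared, the Python helpers below write and read lines in the word_tag format. They are illustrative only: the function names are hypothetical, the tags in the usage note are ordinary Penn Treebank tags chosen for the example, and splitting on the last underscore is an assumption, since this document does not say how words that themselves contain an underscore are treated.

```python
def format_training_line(tagged_tokens):
    """Join (word, tag) pairs into one line: word1_tag1 word2_tag2 ..."""
    return " ".join(f"{word}_{tag}" for word, tag in tagged_tokens)


def parse_training_line(line):
    """Split a training line back into (word, tag) pairs.

    Assumption: the tag is everything after the LAST underscore.
    """
    pairs = []
    for token in line.split():
        word, separator, tag = token.rpartition("_")
        if not separator or not word or not tag:
            raise ValueError(f"token is not in word_tag form: {token!r}")
        pairs.append((word, tag))
    return pairs
```

For example, format_training_line([("The", "DT"), ("stock", "NN"), ("rose", "VBD"), (".", ".")]) yields the line "The_DT stock_NN rose_VBD ._.", and parse_training_line recovers the original pairs from it.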