MXPOST is a Java (JDK 1.1) implementation of the part-of-speech tagger described in:

Adwait Ratnaparkhi. A Maximum Entropy Part-Of-Speech Tagger. In Proceedings of the Empirical Methods in Natural Language Processing Conference, May 17-18, 1996. University of Pennsylvania.

USERS MUST ABIDE BY THE LICENSE INCLUDED WITH THIS DISTRIBUTION.

MXPOST is copyright (c) 1997 Adwait Ratnaparkhi

INSTRUCTIONS FOR USE

To use:

  1. Edit your CLASSPATH variable to include the file mxpost.jar.
  2. Type:

    mxpost projectdir < wordfile

    where projectdir is a project directory, and wordfile contains one sentence per line.

    The project directory tagger.project contains a model trained from sections 0 through 18 of the Penn Treebank Wall St. Journal corpus.

    The sentences in wordfile must be tokenized according to Penn Treebank conventions, e.g., "The stock didn't rise $5." should be written as "The stock did n't rise $ 5 ."
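    As an illustration of the tokenization shown in the example sentence above, here is a minimal Java sketch. It is not part of MXPOST: the class name PtbTokenizeSketch and the method tokenize are hypothetical, the code assumes a modern Java compiler rather than JDK 1.1, and it handles only the three conventions visible in that example (the "n't" contraction, the dollar sign, and sentence-final punctuation). Real Penn Treebank tokenization has many more rules, so treat this as a starting point only.

```java
final class PtbTokenizeSketch {

    /**
     * Applies three Penn Treebank-style splits to one sentence.
     * Hypothetical helper for illustration; not part of the MXPOST distribution.
     */
    static String tokenize(String sentence) {
        String s = sentence.trim();
        // Split the contraction "n't" from its verb: "didn't" -> "did n't".
        s = s.replaceAll("(?i)(\\w)(n't)\\b", "$1 $2");
        // Separate a dollar sign from what follows it: "$5" -> "$ 5".
        s = s.replaceAll("\\$(?=\\S)", "\\$ ");
        // Separate sentence-final punctuation: "5." -> "5 .".
        s = s.replaceAll("(?<=\\S)([.?!])$", " $1");
        // Collapse any repeated whitespace into single spaces.
        return s.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(tokenize("The stock didn't rise $5."));
    }
}
```

    Each output line of such a preprocessor would become one line of wordfile. Abbreviations that end a sentence (for example "U.S.") are not handled and would be split incorrectly by this sketch.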

To train a new model:

  1. Edit your CLASSPATH variable to include the file mxpost.jar.
  2. Create an empty project directory.
  3. Type:

    trainmxpost projectdir traindata

    where projectdir is the newly created project directory, and traindata contains one sentence per line, with each sentence in the format:

    word1_tag1 word2_tag2 word3_tag3 ... wordN_tagN
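
    Before training, it can help to check that every line of traindata follows this format. The following is a minimal Java sketch, not part of MXPOST: the class name TrainLineSketch and the method parseLine are hypothetical, and the code assumes a modern Java compiler rather than JDK 1.1. It also assumes that the word and tag are separated at the LAST underscore in each token; this README does not say how MXPOST itself treats words that contain underscores.

```java
import java.util.ArrayList;
import java.util.List;

final class TrainLineSketch {

    /**
     * Splits one training line into {word, tag} pairs.
     * Assumption: the tag follows the last underscore in each token.
     * Throws IllegalArgumentException for a token that has no word or no tag.
     */
    static List<String[]> parseLine(String line) {
        List<String[]> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank line: nothing to parse
            }
            int sep = token.lastIndexOf('_');
            if (sep <= 0 || sep == token.length() - 1) {
                throw new IllegalArgumentException("Not in word_tag form: " + token);
            }
            pairs.add(new String[] { token.substring(0, sep), token.substring(sep + 1) });
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (String[] pair : parseLine("The_DT stock_NN did_VBD n't_RB rise_VB")) {
            System.out.println(pair[0] + "\t" + pair[1]);
        }
    }
}
```

    Running parseLine over each line of traindata and reporting the line number of any exception gives a quick check of the file before it is passed to trainmxpost.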