MXPOST

MXPOST is a Java (JDK 1.1) implementation of the part-of-speech tagger described in:

Adwait Ratnaparkhi. A Maximum Entropy Part-Of-Speech Tagger. In Proceedings of the Empirical Methods in Natural Language Processing Conference, University of Pennsylvania, May 17-18, 1996. (The paper is no longer available at its original location, http://www.cis.upenn.edu/~adwait/papers/tagger.ps, but I found a copy at http://citeseer.ist.psu.edu/152720.html.) See also Adwait's thesis and two FAQs.

USERS MUST ABIDE BY THE LICENSE INCLUDED WITH THIS DISTRIBUTION.

MXPOST is copyright (c) 1997 Adwait Ratnaparkhi

INSTRUCTIONS FOR USE

To use:

  1. Type: mxpost projectdir < wordfile

    where projectdir is the directory containing the model files, and wordfile contains one sentence per line.

    An example of a "project directory" is /group/contrib/nlp-speech/src/mxpost/tagger.project; it contains a model trained on sections 0 through 18 of the Penn Treebank Wall St. Journal corpus.

    The sentences in wordfile must be tokenized according to Penn Treebank conventions, e.g., "The stock didn't rise $5." should become "The stock did n't rise $ 5 .".

    The script Treebank_tokenization.sed, located in the same directory as mxpost, can be used to do this tokenization.
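The sed script is the tool to use in practice. Purely to illustrate what the tokenization step does, here is a minimal Java sketch that applies a handful of Penn Treebank-style splitting rules to one sentence and reproduces the example above. The class name PtbTokenizerSketch is hypothetical and not part of the MXPOST distribution, and the rule set is an assumption: it covers only final periods, trailing commas/semicolons/colons, ? and !, dollar signs, and common contractions, and omits many Treebank rules (quote conversion, brackets, abbreviations, and so on).

```java
/**
 * Hypothetical illustration only: a few Penn Treebank-style splitting rules.
 * This is NOT Treebank_tokenization.sed and covers far fewer cases.
 */
public final class PtbTokenizerSketch {

    private PtbTokenizerSketch() {}

    /** Applies a small, assumed subset of Treebank splitting rules to one sentence. */
    public static String tokenize(String sentence) {
        // Pad with spaces so every rule can rely on a following whitespace character.
        String s = " " + sentence.trim() + " ";
        // Split a sentence-final period off the last word: "5." -> "5 ."
        s = s.replaceAll("\\.\\s*$", " . ");
        // Split commas, semicolons and colons that end a word, leaving "1,000" intact.
        s = s.replaceAll("([,;:])\\s", " $1 ");
        // Split question and exclamation marks.
        s = s.replaceAll("([?!])", " $1 ");
        // Split a dollar sign from the amount that follows it: "$5" -> "$ 5"
        s = s.replaceAll("\\$(?=\\d)", "\\$ ");
        // Split the negative clitic: "didn't" -> "did n't", "can't" -> "ca n't"
        s = s.replaceAll("(?i)(\\w)(n't)\\s", "$1 $2 ");
        // Split other contractions: "it's" -> "it 's", "we'll" -> "we 'll"
        s = s.replaceAll("(?i)(\\w)('s|'re|'ve|'ll|'d|'m)\\s", "$1 $2 ");
        // Collapse the extra spaces introduced above.
        return s.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        // prints: The stock did n't rise $ 5 .
        System.out.println(tokenize("The stock didn't rise $5."));
    }
}
```

Writing each tokenized sentence on its own line gives a file in the one-sentence-per-line form that mxpost reads from standard input.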

To train a new model:

  1. Edit your CLASSPATH variable to include mxpost.jar (as the script "mxpost" does).
  2. Create an empty project directory.
  3. Type:

    trainmxpost projectdir traindata

    where projectdir is the newly created project directory and traindata contains one sentence per line, each in the format:

    word1_tag1 word2_tag2 word3_tag3 ... wordN_tagN
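To make the training format concrete, here is a minimal Java sketch that writes parallel word and tag arrays as one traindata line and reads such a line back into (word, tag) pairs. The class TrainingLineSketch and its methods are hypothetical, not part of MXPOST. It assumes the tag is everything after the last underscore of a token, so a word that itself contains an underscore survives; whether trainmxpost splits tokens the same way is an assumption. The example uses Penn Treebank tags and is written for a current JDK rather than JDK 1.1.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for the "word1_tag1 word2_tag2 ..." training line format. */
public final class TrainingLineSketch {

    private TrainingLineSketch() {}

    /** Joins parallel word and tag arrays into one "word_tag word_tag ..." line. */
    public static String format(String[] words, String[] tags) {
        if (words.length != tags.length) {
            throw new IllegalArgumentException("words and tags differ in length");
        }
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                line.append(' ');
            }
            line.append(words[i]).append('_').append(tags[i]);
        }
        return line.toString();
    }

    /**
     * Splits a training line into {word, tag} pairs.
     * Assumption: each token is cut at its LAST underscore.
     */
    public static List<String[]> parse(String line) {
        List<String[]> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            int cut = token.lastIndexOf('_');
            // Reject tokens with no underscore, an empty word, or an empty tag.
            if (cut <= 0 || cut == token.length() - 1) {
                throw new IllegalArgumentException("not in word_tag form: " + token);
            }
            pairs.add(new String[] {token.substring(0, cut), token.substring(cut + 1)});
        }
        return pairs;
    }

    public static void main(String[] args) {
        String[] words = {"The", "stock", "did", "n't", "rise", "$", "5", "."};
        String[] tags = {"DT", "NN", "VBD", "RB", "VB", "$", "CD", "."};
        String line = format(words, tags);
        // prints: The_DT stock_NN did_VBD n't_RB rise_VB $_$ 5_CD ._.
        System.out.println(line);
        // prints: 8
        System.out.println(parse(line).size());
    }
}
```

Cutting at the last underscore rather than the first is a deliberate choice here: tags in the Penn Treebank tag set contain no underscores, so the final one is the safest separator.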



Informatics Forum, 10 Crichton Street, Edinburgh, EH8 9AB, Scotland, UK
Tel: +44 131 651 5661, Fax: +44 131 651 1426, E-mail: school-office@inf.ed.ac.uk
Unless explicitly stated otherwise, all material is copyright © The University of Edinburgh