TERp Phrase Table
Version 0.1

Matthew Snover (snover@umiacs.umd.edu)
http://www.cs.umd.edu/~snover
2/2/09

This package contains the paraphrase phrasetable used by TERp in the NIST Metrics MATR 2008 workshop. This data is distributed under the license in LICENSE.TXT.

The following files are contained in this distribution:

* unfiltered_phrasetable.txt : The TERp phrase table, a plain text file containing paraphrases and their probabilities.
* LICENSE.TXT : The license under which this data is distributed.
* README.txt : This readme file.

Each line in the phrase table is formatted as follows:

    PROBABILITY  REF_PHRASE  PARAPHRASE

So for example, the following line from the phrase table:

    0.575343  13 buildings  13 homes

indicates that the phrase "13 buildings" is paraphrased to "13 homes" with probability 0.575343.

The paraphrases used in TERp are extracted using the pivot-based method (Bannard and Callison-Burch, 2005) with several additional filtering mechanisms to increase precision. The corpus used for extraction was an Arabic-English newswire bitext containing approximately 1 million sentences.

Citation and description for TER-Plus and this paraphrase phrasetable:

Matthew Snover, Nitin Madnani, Bonnie Dorr, and Richard Schwartz, "Fluency, Adequacy, or HTER? Exploring Different Human Judgments with a Tunable MT Metric", Proceedings of the Fourth Workshop on Statistical Machine Translation at the 12th Meeting of the European Chapter of the Association for Computational Linguistics (EACL-2009), Athens, Greece, March, 2009.
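
The line format described above (a probability, a reference phrase, and its paraphrase) could be read with a short Python script along the following lines. This is only a sketch and not part of the TERp distribution. The names PhraseEntry, parse_entry and read_phrase_table are hypothetical, and the field delimiter is an assumption: this readme does not state how the three fields are separated, so the sketch defaults to a tab and takes the separator as a parameter. Inspect the actual file and adjust it as needed.

```python
from typing import Iterable, Iterator, NamedTuple


class PhraseEntry(NamedTuple):
    """One phrase-table line (hypothetical helper type, not part of TERp)."""
    probability: float
    ref_phrase: str
    paraphrase: str


def parse_entry(line: str, sep: str = "\t") -> PhraseEntry:
    """Split one phrase-table line into its three fields.

    ASSUMPTION: fields are separated by `sep` (a tab by default). The readme
    does not specify the delimiter, so check the file and change `sep`.
    """
    fields = [field.strip() for field in line.rstrip("\n").split(sep)]
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    prob_text, ref_phrase, paraphrase = fields
    return PhraseEntry(float(prob_text), ref_phrase, paraphrase)


def read_phrase_table(lines: Iterable[str], sep: str = "\t") -> Iterator[PhraseEntry]:
    """Yield one PhraseEntry per non-blank line."""
    for line in lines:
        if line.strip():
            yield parse_entry(line, sep)


if __name__ == "__main__":
    # The example entry from this readme, written with the assumed tab separator.
    example = "0.575343\t13 buildings\t13 homes\n"
    print(parse_entry(example))
```

To process the whole table, an open file handle for unfiltered_phrasetable.txt can be passed directly to read_phrase_table, which reads it line by line rather than loading it into memory at once. The text encoding of the file is not stated here, so it may need to be given explicitly when opening it.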