java.lang.Object
  edu.stanford.nlp.trees.AbstractTreebankLanguagePack
public abstract class AbstractTreebankLanguagePack
This provides an implementation of parts of the TreebankLanguagePack API to reduce the load on fresh implementations. Only the abstract methods below need to be implemented to give a reasonable solution for a new language.
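The division of labour described above can be illustrated with a small, self-contained sketch. It does not use the Stanford classes: `LanguagePackSketch` and `ToyEnglishPack` are hypothetical stand-ins, and the tag values are made up for illustration. The point is only that a subclass supplies the arrays and the base class derives the convenience methods from them.

```java
// Hypothetical stand-in for the abstract base class: subclasses supply the
// data arrays, and the base class derives the convenience methods from them.
abstract class LanguagePackSketch {
    abstract String[] punctuationTags();
    abstract String[] startSymbols();

    // Derived: true if str is one of the punctuation tags.
    boolean isPunctuationTag(String str) {
        for (String tag : punctuationTags()) {
            if (tag.equals(str)) {
                return true;
            }
        }
        return false;
    }

    // Derived: the first start symbol, or null if none is defined.
    String startSymbol() {
        String[] symbols = startSymbols();
        return (symbols == null || symbols.length == 0) ? null : symbols[0];
    }
}

// Hypothetical language pack; the tag set is illustrative only.
final class ToyEnglishPack extends LanguagePackSketch {
    @Override
    String[] punctuationTags() {
        return new String[] {".", ",", ":"};
    }

    @Override
    String[] startSymbols() {
        return new String[] {"ROOT"};
    }
}
```

A real subclass of AbstractTreebankLanguagePack would also have to implement the interface methods that this class leaves unimplemented.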
Field Summary | |
---|---|
static String |
DEFAULT_ENCODING
Use this as the default encoding for Readers and Writers of Treebank data. |
protected static char |
DEFAULT_GF_CHAR
|
protected char |
gfCharacter
Default character for indicating that something is a grammatical function; language-specific subclasses should probably override it. |
Constructor Summary | |
---|---|
AbstractTreebankLanguagePack()
Gives a handle to the TreebankLanguagePack. |
|
AbstractTreebankLanguagePack(char gfChar)
Gives a handle to the TreebankLanguagePack. |
Method Summary | |
---|---|
String |
basicCategory(String category)
Returns the basic syntactic category of a String. |
String |
categoryAndFunction(String category)
Returns the syntactic category and 'function' of a String. |
Filter<String> |
evalBIgnoredPunctuationTagAcceptFilter()
Returns a filter that accepts a String that is a punctuation tag that should be ignored by EVALB-style evaluation, and rejects everything else. |
Filter<String> |
evalBIgnoredPunctuationTagRejectFilter()
Returns a filter that accepts everything except a String that is a punctuation tag that should be ignored by EVALB-style evaluation. |
String[] |
evalBIgnoredPunctuationTags()
Returns a String array of punctuation tags that EVALB-style evaluation should ignore for this treebank/language. |
Function<String,String> |
getBasicCategoryFunction()
Returns a Function object that maps Strings to Strings according
to this TreebankLanguagePack's basicCategory() method. |
Function<String,String> |
getCategoryAndFunctionFunction()
Returns a Function object that maps Strings to Strings according
to this TreebankLanguagePack's categoryAndFunction() method. |
String |
getEncoding()
Return the input Charset encoding for the Treebank. |
char |
getGfCharacter()
|
TokenizerFactory<? extends HasWord> |
getTokenizerFactory()
Return a tokenizer factory that may be suitable for tokenizing text used with this Treebank/Language pair, without tokenizing carriage returns (i.e., treating them as white space). |
GrammaticalStructureFactory |
grammaticalStructureFactory()
Return a GrammaticalStructureFactory suitable for this language/treebank. |
GrammaticalStructureFactory |
grammaticalStructureFactory(Filter<String> puncFilt)
Return a GrammaticalStructureFactory suitable for this language/treebank. |
boolean |
isEvalBIgnoredPunctuationTag(String str)
Accepts a String that is a punctuation tag that should be ignored by EVALB-style evaluation, and rejects everything else. |
boolean |
isLabelAnnotationIntroducingCharacter(char ch)
Say whether this character is an annotation introducing character. |
boolean |
isPunctuationTag(String str)
Accepts a String that is a punctuation tag name, and rejects everything else. |
boolean |
isPunctuationWord(String str)
Accepts a String that is a punctuation word, and rejects everything else. |
boolean |
isSentenceFinalPunctuationTag(String str)
Accepts a String that is a sentence end punctuation tag, and rejects everything else. |
boolean |
isStartSymbol(String str)
Accepts a String that is a start symbol of the treebank. |
char[] |
labelAnnotationIntroducingCharacters()
Return an array of characters at which a String should be truncated to give the basic syntactic category of a label. |
Filter<String> |
punctuationTagAcceptFilter()
Return a filter that accepts a String that is a punctuation tag name, and rejects everything else. |
Filter<String> |
punctuationTagRejectFilter()
Return a filter that rejects a String that is a punctuation tag name, and accepts everything else. |
abstract String[] |
punctuationTags()
Returns a String array of punctuation tags for this treebank/language. |
Filter<String> |
punctuationWordAcceptFilter()
Returns a filter that accepts a String that is a punctuation word, and rejects everything else. |
Filter<String> |
punctuationWordRejectFilter()
Returns a filter that accepts a String that is not a punctuation word, and rejects punctuation. |
abstract String[] |
punctuationWords()
Returns a String array of punctuation words for this treebank/language. |
Filter<String> |
sentenceFinalPunctuationTagAcceptFilter()
Returns a filter that accepts a String that is a sentence end punctuation tag, and rejects everything else. |
abstract String[] |
sentenceFinalPunctuationTags()
Returns a String array of sentence final punctuation tags for this treebank/language. |
void |
setGfCharacter(char gfCharacter)
Sets the grammatical function indicating character to gfCharacter. |
String |
startSymbol()
Returns a String which is the first (perhaps unique) start symbol of the treebank, or null if none is defined. |
Filter<String> |
startSymbolAcceptFilter()
Return a filter that accepts a String that is a start symbol of the treebank, and rejects everything else. |
abstract String[] |
startSymbols()
Returns a String array of treebank start symbols. |
String |
stripGF(String category)
Returns the category for a String with everything following the gf character (which may be language specific) stripped. |
TreeReaderFactory |
treeReaderFactory()
Returns a TreeReaderFactory suitable for general purpose use with this language/treebank. |
TokenizerFactory<Tree> |
treeTokenizerFactory()
Return a TokenizerFactory for Trees of this language/treebank. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface edu.stanford.nlp.trees.TreebankLanguagePack |
---|
headFinder, sentenceFinalPunctuationWords, treebankFileExtension |
Field Detail |
---|
protected char gfCharacter
protected static final char DEFAULT_GF_CHAR
public static final String DEFAULT_ENCODING
Constructor Detail |
---|
public AbstractTreebankLanguagePack()
public AbstractTreebankLanguagePack(char gfChar)
gfChar
- The character that sets off grammatical functions in node labels.
Method Detail |
---|
public abstract String[] punctuationTags()
punctuationTags
in interface TreebankLanguagePack
public abstract String[] punctuationWords()
punctuationWords
in interface TreebankLanguagePack
public abstract String[] sentenceFinalPunctuationTags()
sentenceFinalPunctuationTags
in interface TreebankLanguagePack
public String[] evalBIgnoredPunctuationTags()
evalBIgnoredPunctuationTags
in interface TreebankLanguagePack
public boolean isPunctuationTag(String str)
isPunctuationTag
in interface TreebankLanguagePack
str
- The string to check
public boolean isPunctuationWord(String str)
isPunctuationWord
in interface TreebankLanguagePack
str
- The string to check
public boolean isSentenceFinalPunctuationTag(String str)
isSentenceFinalPunctuationTag
in interface TreebankLanguagePack
str
- The string to check
public boolean isEvalBIgnoredPunctuationTag(String str)
isEvalBIgnoredPunctuationTag
in interface TreebankLanguagePack
str
- The string to check
public Filter<String> punctuationTagAcceptFilter()
punctuationTagAcceptFilter
in interface TreebankLanguagePack
public Filter<String> punctuationTagRejectFilter()
punctuationTagRejectFilter
in interface TreebankLanguagePack
public Filter<String> punctuationWordAcceptFilter()
punctuationWordAcceptFilter
in interface TreebankLanguagePack
public Filter<String> punctuationWordRejectFilter()
punctuationWordRejectFilter
in interface TreebankLanguagePack
public Filter<String> sentenceFinalPunctuationTagAcceptFilter()
sentenceFinalPunctuationTagAcceptFilter
in interface TreebankLanguagePack
public Filter<String> evalBIgnoredPunctuationTagAcceptFilter()
evalBIgnoredPunctuationTagAcceptFilter
in interface TreebankLanguagePack
public Filter<String> evalBIgnoredPunctuationTagRejectFilter()
evalBIgnoredPunctuationTagRejectFilter
in interface TreebankLanguagePack
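The accept and reject filters listed above come in complementary pairs built from the same tag or word array. The following sketch shows that relationship using `java.util.function.Predicate` as a stand-in for `Filter<String>`; the class name `TagFilterSketch` and its methods are hypothetical and are not part of this API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical helper: Predicate<String> stands in for Filter<String>.
final class TagFilterSketch {

    // Accepts exactly the Strings contained in the given array.
    static Predicate<String> acceptFilter(String[] tags) {
        Set<String> tagSet = new HashSet<>(Arrays.asList(tags));
        return tagSet::contains;
    }

    // Accepts everything the accept filter rejects, and vice versa.
    static Predicate<String> rejectFilter(String[] tags) {
        return acceptFilter(tags).negate();
    }
}
```

Building the reject filter by negating the accept filter keeps the two from drifting apart when the underlying array changes.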
public String getEncoding()
Return the input Charset encoding for the Treebank. See the documentation for the Charset class.
getEncoding
in interface TreebankLanguagePack
public char[] labelAnnotationIntroducingCharacters()
labelAnnotationIntroducingCharacters
in interface TreebankLanguagePack
public String basicCategory(String category)
Returns the basic syntactic category of a String. This implementation truncates the String at the first occurrence of one of the labelAnnotationIntroducingCharacters().
However, two special cases handle
labelAnnotationIntroducingCharacters in category labels:
(i) if the first char is in this set, it is never truncated
(e.g., '-' or '=' as a token), and (ii) if the label starts with
a character from this set, a second instance of the same character is
also exempt from truncation (to deal with '-LRB-', '-RCB-', etc.).
basicCategory
in interface TreebankLanguagePack
category
- The whole String name of the label
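The truncation rule and its two special cases can be sketched as follows. This is an illustration of the behaviour described above, not the library's source: the class `BasicCategorySketch` is hypothetical, and the annotation characters are passed in as a String rather than taken from labelAnnotationIntroducingCharacters().

```java
// Hypothetical sketch of the truncation rule described for basicCategory().
final class BasicCategorySketch {

    static String basicCategory(String category, String annotationChars) {
        if (category == null) {
            return null;
        }
        boolean openedAtZero = false; // label began with an annotation char
        char opener = '\0';           // which annotation char it began with
        int i = 0;
        for (; i < category.length(); i++) {
            char ch = category.charAt(i);
            if (annotationChars.indexOf(ch) < 0) {
                continue; // ordinary character, keep scanning
            }
            if (i == 0) {
                // Case (i): a leading annotation char never triggers truncation.
                openedAtZero = true;
                opener = ch;
            } else if (openedAtZero && ch == opener) {
                // Case (ii): the matching second instance is also kept.
                openedAtZero = false;
            } else {
                break; // truncate here
            }
        }
        return category.substring(0, i);
    }
}
```

With "-=" as the assumed annotation characters, a label such as "NP-SBJ" is cut to "NP", while a bracket token such as "-LRB-" passes through unchanged.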
public String stripGF(String category)
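Description copied from interface: TreebankLanguagePack
Returns the category for a String with everything following the gf character (which may be language specific) stripped.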
stripGF
in interface TreebankLanguagePack
category
- The String name of the label (may previously have had basic category called on it)
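A minimal sketch of stripping at the grammatical function character is shown below. The class `GfStripSketch` is hypothetical. The choice to cut at the first occurrence of the character after position 0 is an assumption made for this illustration, and the actual implementation may choose the cut point differently.

```java
// Hypothetical sketch: a configurable gf character and a strip operation.
final class GfStripSketch {
    private char gfCharacter;

    GfStripSketch(char gfChar) {
        this.gfCharacter = gfChar;
    }

    void setGfCharacter(char gfCharacter) {
        this.gfCharacter = gfCharacter;
    }

    // Removes the gf character and everything after it. Assumption: the
    // search starts at index 1, so a label is never reduced to the empty String.
    String stripGF(String category) {
        if (category == null) {
            return null;
        }
        int index = category.indexOf(gfCharacter, 1);
        return index > 0 ? category.substring(0, index) : category;
    }
}
```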
public Function<String,String> getBasicCategoryFunction()
Returns a Function object that maps Strings to Strings according
to this TreebankLanguagePack's basicCategory() method.
getBasicCategoryFunction
in interface TreebankLanguagePack
public String categoryAndFunction(String category)
Returns the syntactic category and 'function' of a String, perhaps in the form category-function.
This implementation strips numeric tags after label introducing
characters (assuming that non-numeric things are functional tags).
categoryAndFunction
in interface TreebankLanguagePack
category
- The whole String name of the label
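The numeric-tag stripping described above can be sketched as a loop that removes trailing "annotation character plus digits" suffixes until none remain. The class `CategoryAndFunctionSketch` is hypothetical, the annotation characters are passed in as a String for illustration, and the sketch only approximates the documented behaviour.

```java
// Hypothetical sketch: drop trailing numeric tags, keep functional tags.
final class CategoryAndFunctionSketch {

    static String categoryAndFunction(String category, String annotationChars) {
        if (category == null) {
            return null;
        }
        String result = category;
        while (true) {
            int end = result.length();
            int i = end;
            // Walk left over the trailing run of digits, if any.
            while (i > 0 && Character.isDigit(result.charAt(i - 1))) {
                i--;
            }
            boolean hasDigits = i < end;
            boolean introduced = i > 0 && annotationChars.indexOf(result.charAt(i - 1)) >= 0;
            if (!hasDigits || !introduced) {
                return result; // no trailing numeric tag left
            }
            // Remove the annotation character together with the digits.
            result = result.substring(0, i - 1);
        }
    }
}
```

Under these assumptions "NP-SBJ-1" becomes "NP-SBJ", since "-1" is numeric and "-SBJ" is treated as a functional tag.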
public Function<String,String> getCategoryAndFunctionFunction()
Returns a Function object that maps Strings to Strings according
to this TreebankLanguagePack's categoryAndFunction() method.
getCategoryAndFunctionFunction
in interface TreebankLanguagePack
public boolean isLabelAnnotationIntroducingCharacter(char ch)
isLabelAnnotationIntroducingCharacter
in interface TreebankLanguagePack
ch
- The character to check
public boolean isStartSymbol(String str)
isStartSymbol
in interface TreebankLanguagePack
str
- The String to test
public Filter<String> startSymbolAcceptFilter()
startSymbolAcceptFilter
in interface TreebankLanguagePack
public abstract String[] startSymbols()
startSymbols
in interface TreebankLanguagePack
public String startSymbol()
startSymbol
in interface TreebankLanguagePack
public TokenizerFactory<? extends HasWord> getTokenizerFactory()
Returns a factory for WhitespaceTokenizer.
getTokenizerFactory
in interface TreebankLanguagePack
public GrammaticalStructureFactory grammaticalStructureFactory()
grammaticalStructureFactory
in interface TreebankLanguagePack
public GrammaticalStructureFactory grammaticalStructureFactory(Filter<String> puncFilt)
grammaticalStructureFactory
in interface TreebankLanguagePack
puncFilt
- A filter which should reject punctuation words (as Strings)
public char getGfCharacter()
public void setGfCharacter(char gfCharacter)
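Description copied from interface: TreebankLanguagePack
Sets the grammatical function indicating character to gfCharacter.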
setGfCharacter
in interface TreebankLanguagePack
gfCharacter
- Sets the character in label names that sets off
grammatical function marking (from the phrase label).
public TreeReaderFactory treeReaderFactory()
treeReaderFactory
in interface TreebankLanguagePack
public TokenizerFactory<Tree> treeTokenizerFactory()
treeTokenizerFactory
in interface TreebankLanguagePack