³ò
’B_Kc           @   sº   d  Z  d d k Td d k Td d k Td d k Td d k Td d k Td d k Td d d d d d d	 d
 d d g
 Z y0 d d k	 Z	 d d k
 Te d d d d g 7Z Wn e j
 o n Xd S(   s0  
Classes and interfaces for labeling tokens with category labels (or
X{class labels}).  Typically, labels are represented with strings
(such as C{'health'} or C{'sports'}).  Classifiers can be used to
perform a wide range of classification tasks.  For example,
classifiers can be used...

  - to classify documents by topic.
  - to classify ambiguous words by which word sense is intended.
  - to classify acoustic signals by which phoneme they represent.
  - to classify sentences by their author.

Features
========
In order to decide which category label is appropriate for a given
token, classifiers examine one or more 'features' of the token.  These
X{features} are typically chosen by hand, and indicate which aspects
of the token are relevant to the classification decision.  For
example, a document classifier might use a separate feature for each
word, recording how often that word occured in the document.

Featuresets
===========
The features describing a token are encoded using a X{featureset},
which is a dictionary that maps from X{feature names} to X{feature
values}.  Feature names are unique strings that indicate what aspect
of the token is encoded by the feature.  Examples include
C{'prevword'}, for a feature whose value is the previous word; and
C{'contains-word(library)'} for a feature that is true when a document
contains the word C{'library'}.  Feature values are typically
booleans, numbers, or strings, depending on which feature they
describe.

Featuresets are typically constructed using a X{feature detector}
(also known as a X{feature extractor}).  A feature detector is a
function that takes a token (and sometimes information about its
context) as its input, and returns a featureset describing that token.
For example, the following feature detector converts a document
(stored as a list of words) to a featureset describing the set of
words included in the document:

    >>> # Define a feature detector function.
    >>> def document_features(document):
    ...     return dict([('contains-word(%s)' % w, True) for w in document])

Feature detectors are typically applied to each token before it is fed
to the classifier:

    >>> Classify each Gutenberg document.
    >>> for file in gutenberg.files():
    ...     doc = gutenberg.tokenized(file)
    ...     print doc_name, classifier.classify(document_features(doc))

The parameters that a feature detector expects will vary, depending on
the task and the needs of the feature detector.  For example, a
feature detector for word sense disambiguation (WSD) might take as its
input a sentence, and the index of a word that should be classified,
and return a featureset for that word.  The following feature detector
for WSD includes features describing the left and right contexts of
the target word:

    >>> def wsd_features(sentence, index):
    ...     featureset = {}
    ...     for i in range(max(0, index-3), index):
    ...         featureset['left-context(%s)' % sentence[i]] = True
    ...     for i in range(index, max(index+3, len(sentence))):
    ...         featureset['right-context(%s)' % sentence[i]] = True
    ...     return featureset

Training Classifiers
====================
Most classifiers are built by training them on a list of hand-labeled
examples, known as the X{training set}.  Training sets are represented
as lists of C{(featuredict, label)} tuples.
iÿÿÿÿ(   t   *t   ClassifierIt   MultiClassifierIt   NaiveBayesClassifiert   DecisionTreeClassifiert   WekaClassifiert   config_wekat   config_megamt   rte_classifiert   rte_featurest   RTEFeatureExtractorNt   MaxentClassifiert   BinaryMaxentFeatureEncodingt    ConditionalExponentialClassifiert   train_maxent_classifier(   t   __doc__t   wekat   megamt   apit   utilt
   naivebayest   decisiontreet   rte_classifyt   __all__t   numpyt   maxentt   ImportError(    (    (    s,   /p/zhu/06/nlp/nltk/nltk/classify/__init__.pys   <module>R   s$   






	
	