# Natural Language Toolkit: Classifiers

"""
Classes and interfaces for labeling tokens with category labels (or
X{class labels}).  Typically, labels are represented with strings
(such as C{'health'} or C{'sports'}).  Classifiers can be used to
perform a wide range of classification tasks.  For example,
classifiers can be used...

  - to classify documents by topic.
  - to classify ambiguous words by which word sense is intended.
  - to classify acoustic signals by which phoneme they represent.
  - to classify sentences by their author.

Features
========
In order to decide which category label is appropriate for a given
token, classifiers examine one or more 'features' of the token.  These
X{features} are typically chosen by hand, and indicate which aspects
of the token are relevant to the classification decision.  For
example, a document classifier might use a separate feature for each
word, recording how often that word occurred in the document.

Featuresets
===========
The features describing a token are encoded using a X{featureset},
which is a dictionary that maps from X{feature names} to X{feature
values}.  Feature names are unique strings that indicate what aspect
of the token is encoded by the feature.  Examples include
C{'prevword'}, for a feature whose value is the previous word; and
C{'contains-word(library)'} for a feature that is true when a document
contains the word C{'library'}.  Feature values are typically
booleans, numbers, or strings, depending on which feature they
describe.
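
For example, the two feature names mentioned above might appear
together in a featureset such as the following (an illustrative
example, not drawn from any particular corpus):

    >>> featureset = {'prevword': 'the', 'contains-word(library)': True}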

Featuresets are typically constructed using a X{feature detector}
(also known as a X{feature extractor}).  A feature detector is a
function that takes a token (and sometimes information about its
context) as its input, and returns a featureset describing that token.
For example, the following feature detector converts a document
(stored as a list of words) to a featureset describing the set of
words included in the document:

    >>> # Define a feature detector function.
    >>> def document_features(document):
    ...     return dict([('contains-word(%s)' % w, True) for w in document])
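
Applied to a short document, this detector produces one boolean
feature per word:

    >>> sorted(document_features(['the', 'end']).items())
    [('contains-word(end)', True), ('contains-word(the)', True)]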

Feature detectors are typically applied to each token before it is fed
to the classifier:

    >>> # Classify each Gutenberg document.
    >>> for file in gutenberg.files():
    ...     doc = gutenberg.tokenized(file)
    ...     print file, classifier.classify(document_features(doc))

The parameters that a feature detector expects will vary, depending on
the task and the needs of the feature detector.  For example, a
feature detector for word sense disambiguation (WSD) might take as its
input a sentence, and the index of a word that should be classified,
and return a featureset for that word.  The following feature detector
for WSD includes features describing the left and right contexts of
the target word:

    >>> def wsd_features(sentence, index):
    ...     featureset = {}
    ...     for i in range(max(0, index-3), index):
    ...         featureset['left-context(%s)' % sentence[i]] = True
    ...     for i in range(index+1, min(index+4, len(sentence))):
    ...         featureset['right-context(%s)' % sentence[i]] = True
    ...     return featureset
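
For example, applied to a hypothetical six-word sentence with the
target word at index 2, this detector produces two left-context
features and three right-context features:

    >>> sorted(wsd_features(['The', 'cat', 'sat', 'on', 'the', 'mat'], 2))
    ['left-context(The)', 'left-context(cat)', 'right-context(mat)', 'right-context(on)', 'right-context(the)']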

Training Classifiers
====================
Most classifiers are built by training them on a list of hand-labeled
examples, known as the X{training set}.  Training sets are represented
as lists of C{(featureset, label)} tuples.
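
For example, given a list of C{(document, label)} pairs (here called
C{labeled_documents}, a hypothetical name), a naive Bayes document
classifier could be trained as follows:

    >>> train_set = [(document_features(doc), label)
    ...              for (doc, label) in labeled_documents]
    >>> classifier = NaiveBayesClassifier.train(train_set)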
"""

from nltk.internals import deprecated, Deprecated

from api import *
from util import *
from naivebayes import *
from decisiontree import *
from weka import *
from megam import *

__all__ = ['ClassifierI', 'MultiClassifierI', 'NaiveBayesClassifier',
           'DecisionTreeClassifier', 'WekaClassifier',
           'config_weka', 'config_megam']

# The maxent classifier requires numpy; export it only when numpy is
# available.
try:
    import numpy
    from maxent import *
    __all__ += ['MaxentClassifier', 'BinaryMaxentFeatureEncoding',
                'ConditionalExponentialClassifier',
                'train_maxent_classifier']
except ImportError:
    pass

class ClassifyI(ClassifierI, Deprecated):
    """Use nltk.ClassifierI instead."""

@deprecated('Use nltk.classify.accuracy() instead.')
def classifier_accuracy(classifier, gold):
    return accuracy(classifier, gold)

@deprecated('Use nltk.classify.log_likelihood() instead.')
def classifier_log_likelihood(classifier, gold):
    return log_likelihood(classifier, gold)