nltk.tag.ClassifierBasedTagger

class nltk.tag.ClassifierBasedTagger(feature_detector=None, train=None, classifier_builder=NaiveBayesClassifier.train, classifier=None, backoff=None, cutoff_prob=None, verbose=False)

A sequential tagger that uses a classifier to choose the tag for each token in a sentence. The featureset input for the classifier is generated by a feature detector function:

feature_detector(tokens, index, history) -> featureset

Where tokens is the list of unlabeled tokens in the sentence; index is the index of the token for which feature detection should be performed; and history is a list of the tags for all tokens before index.
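As an illustration, here is a minimal feature detector with the signature above. The features it extracts (lowercased word, three-letter suffix, capitalization, previous word, previous tag) are an arbitrary choice for this sketch, not features the tagger requires, and the name simple_feature_detector is hypothetical. The featureset is returned as a plain dict.

```python
def simple_feature_detector(tokens, index, history):
    """Return a featureset (a dict) describing tokens[index].

    tokens  -- the list of unlabeled tokens in the sentence
    index   -- position of the token to describe
    history -- tags already assigned to tokens[:index]
    """
    word = tokens[index]
    # Use a sentence-start marker when there is no previous token/tag.
    prev_word = tokens[index - 1].lower() if index > 0 else "<START>"
    prev_tag = history[index - 1] if index > 0 else "<START>"
    return {
        "word": word.lower(),
        "suffix3": word[-3:].lower(),
        "capitalized": word[:1].isupper(),
        "prev_word": prev_word,
        "prev_tag": prev_tag,
    }


tokens = ["The", "dog", "barked"]
history = ["DT", "NN"]  # tags already chosen for tokens[0] and tokens[1]
print(simple_feature_detector(tokens, 2, history))
# {'word': 'barked', 'suffix3': 'ked', 'capitalized': False, 'prev_word': 'dog', 'prev_tag': 'NN'}
```

Because history only covers tokens before index, a detector like this one may look at earlier tags but never at tags that have not been assigned yet.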

Construct a new classifier-based sequential tagger.

Parameters:
  • feature_detector – A function used to generate the featureset input for the classifier: feature_detector(tokens, index, history) -> featureset
  • train – A tagged corpus consisting of a list of tagged sentences, where each sentence is a list of (word, tag) tuples.
  • classifier_builder – A function used to train a new classifier based on the data in train. It should take one argument, a list of labeled featuresets (i.e., (featureset, label) tuples).
  • classifier – The classifier that should be used by the tagger. This is only useful if you want to manually construct the classifier; normally, you would use train instead.
  • backoff – A backoff tagger, used if this tagger is unable to determine a tag for a given token (for example, when it encounters an unknown context, or when the most likely tag's probability is below cutoff_prob).
  • cutoff_prob – If specified, then this tagger will fall back on its backoff tagger if the probability of the most likely tag is less than cutoff_prob.
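To make the train and classifier_builder parameters concrete, the sketch below shows one way a tagged corpus can be turned into the list of (featureset, label) tuples that classifier_builder receives. It assumes that, during training, the gold-standard tags of the preceding tokens serve as history. The helper build_labeled_featuresets and the toy detector prev_tag_detector are hypothetical names, not part of NLTK.

```python
def prev_tag_detector(tokens, index, history):
    """Toy feature detector: the current word and the previous tag."""
    return {
        "word": tokens[index].lower(),
        "prev_tag": history[-1] if history else "<START>",
    }


def build_labeled_featuresets(tagged_sents, feature_detector):
    """Convert [[(word, tag), ...], ...] into [(featureset, tag), ...]."""
    labeled = []
    for sentence in tagged_sents:
        tokens = [word for word, _ in sentence]
        history = []  # gold tags seen so far; reset for every sentence
        for index, (_, tag) in enumerate(sentence):
            featureset = feature_detector(tokens, index, history)
            labeled.append((featureset, tag))
            history.append(tag)
    return labeled


train = [
    [("The", "DT"), ("dog", "NN"), ("barked", "VBD")],
    [("A", "DT"), ("cat", "NN")],
]
labeled = build_labeled_featuresets(train, prev_tag_detector)
print(len(labeled))  # 5
print(labeled[1])    # ({'word': 'dog', 'prev_tag': 'DT'}, 'NN')
```

A list in this shape is the kind of input that NaiveBayesClassifier.train, the default classifier_builder, accepts as its labeled featuresets.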

Methods

__init__([feature_detector, train, ...]) – Construct a new classifier-based sequential tagger.
choose_tag(tokens, index, history) – Use the classifier to choose a tag for tokens[index]; return None if cutoff_prob is set and the most likely tag's probability is below it.
classifier() – Return the classifier that this tagger uses to choose a tag for each word in a sentence.
evaluate(gold) – Score the accuracy of the tagger against the gold standard.
feature_detector(tokens, index, history) – Apply this tagger's feature detector to the token at index and return the resulting featureset for its classifier.
tag(tokens) – Tag a list of tokens, returning a list of (token, tag) tuples.
tag_one(tokens, index, history) – Determine an appropriate tag for the specified token, and return that tag.
tag_sents(sentences) – Apply self.tag() to each element of sentences.
unicode_repr()
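The score reported by evaluate(gold) can be read as token-level accuracy: the fraction of tokens, pooled over all sentences, whose predicted (word, tag) pair matches the gold-standard pair. The sketch below computes that quantity for sentences that have already been tagged; it does not call NLTK, and the function name token_accuracy is hypothetical.

```python
def token_accuracy(gold_sents, predicted_sents):
    """Fraction of (word, tag) tokens in predicted_sents equal to gold_sents."""
    # Flatten both corpora into single token lists.
    gold = [tok for sent in gold_sents for tok in sent]
    predicted = [tok for sent in predicted_sents for tok in sent]
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of tokens")
    if not gold:
        raise ValueError("no tokens to score")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


gold = [[("The", "DT"), ("dog", "NN"), ("barked", "VBD")], [("A", "DT"), ("cat", "NN")]]
predicted = [[("The", "DT"), ("dog", "NN"), ("barked", "NN")], [("A", "DT"), ("cat", "NN")]]
print(token_accuracy(gold, predicted))  # 0.8
```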

Attributes

backoff – The backoff tagger for this tagger.
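The interaction between cutoff_prob, choose_tag(), and the backoff tagger can be modelled in plain Python. This is a simplified sketch, assuming the classifier's output is available as a dict mapping each candidate tag to its probability, that None means "no decision, consult the backoff", and that the backoff is a zero-argument callable rather than a real tagger. The names choose_tag_sketch and tag_one_sketch are hypothetical stand-ins, not NLTK functions.

```python
from typing import Callable, Dict, Optional


def choose_tag_sketch(tag_probs: Dict[str, float],
                      cutoff_prob: Optional[float]) -> Optional[str]:
    """Pick the most probable tag, or None if it falls below cutoff_prob."""
    best = max(tag_probs, key=tag_probs.get)
    if cutoff_prob is None:
        return best  # no cutoff: always trust the classifier
    return best if tag_probs[best] >= cutoff_prob else None


def tag_one_sketch(tag_probs: Dict[str, float],
                   cutoff_prob: Optional[float],
                   backoff: Optional[Callable[[], Optional[str]]]) -> Optional[str]:
    """Use this tagger's choice if it has one, otherwise defer to the backoff."""
    tag = choose_tag_sketch(tag_probs, cutoff_prob)
    if tag is None and backoff is not None:
        tag = backoff()
    return tag


probs = {"NN": 0.55, "VB": 0.45}
print(tag_one_sketch(probs, None, lambda: "UNK"))  # NN (no cutoff)
print(tag_one_sketch(probs, 0.9, lambda: "UNK"))   # UNK (0.55 < 0.9, backoff answers)
print(tag_one_sketch(probs, 0.9, None))            # None (no backoff configured)
```

Setting cutoff_prob therefore trades coverage for precision: low-confidence decisions are handed to the backoff tagger instead of being accepted.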