nltk.UnigramTagger

class nltk.UnigramTagger(train=None, model=None, backoff=None, cutoff=0, verbose=False)[source]

Unigram Tagger

The UnigramTagger finds the most likely tag for each word in a training corpus, and then uses that information to assign tags to new tokens.

>>> from nltk.corpus import brown
>>> from nltk.tag import UnigramTagger
>>> test_sent = brown.sents(categories='news')[0]
>>> unigram_tagger = UnigramTagger(brown.tagged_sents(categories='news')[:500])
>>> for tok, tag in unigram_tagger.tag(test_sent):
...     print("(%s, %s), " % (tok, tag))
(The, AT), (Fulton, NP-TL), (County, NN-TL), (Grand, JJ-TL),
(Jury, NN-TL), (said, VBD), (Friday, NR), (an, AT),
(investigation, NN), (of, IN), (Atlanta's, NP$), (recent, JJ),
(primary, NN), (election, NN), (produced, VBD), (``, ``),
(no, AT), (evidence, NN), ('', ''), (that, CS), (any, DTI),
(irregularities, NNS), (took, VBD), (place, NN), (., .),
Parameters:
  • train (list(list(tuple(str, str)))) – The corpus of training data, a list of tagged sentences
  • model (dict) – The tagger model
  • backoff (TaggerI) – Another tagger which this tagger will consult when it is unable to tag a word
  • cutoff (int) – The number of instances of training data the tagger must see in order not to use the backoff tagger
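The interplay of backoff and cutoff can be seen with a small, self-contained sketch (the toy training corpus below is illustrative, not from the NLTK docs): a word must occur more than cutoff times for its unigram tag to be stored; otherwise the backoff tagger supplies the tag.

```python
from nltk.tag import DefaultTagger, UnigramTagger

# Toy training corpus: a list of tagged sentences.
train = [
    [('the', 'DT'), ('dog', 'NN'), ('barks', 'VBZ')],
    [('the', 'DT'), ('cat', 'NN'), ('sleeps', 'VBZ')],
]

# Backoff tagger used whenever the unigram table has no entry.
default = DefaultTagger('NN')

# cutoff=1: a word's tag is only stored if seen more than once.
tagger = UnigramTagger(train, backoff=default, cutoff=1)

# 'the' was seen twice, so its unigram tag DT survives the cutoff;
# 'barks' was seen only once, so the backoff supplies NN.
print(tagger.tag(['the', 'barks']))
```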

Methods

__init__([train, model, backoff, cutoff, ...])
choose_tag(tokens, index, history)
context(tokens, index, history)
decode_json_obj(obj)
encode_json_obj()
evaluate(gold) Score the accuracy of the tagger against the gold standard.
size() Return the number of entries in the table used by this tagger to map context to tags.
tag(tokens)
tag_one(tokens, index, history) Determine an appropriate tag for the specified token, and return that tag.
tag_sents(sentences) Apply self.tag() to each element of sentences.
unicode_repr()
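A minimal sketch of size() and tag_sents() on a toy corpus (the training data here is illustrative): size() reports how many words have an entry in the tagger's context table, and tag_sents() applies tag() to each sentence in a list.

```python
from nltk.tag import UnigramTagger

# Toy training corpus with five distinct words.
train = [
    [('the', 'DT'), ('dog', 'NN'), ('runs', 'VBZ')],
    [('a', 'DT'), ('dog', 'NN'), ('sleeps', 'VBZ')],
]
tagger = UnigramTagger(train)

# One table entry per distinct word seen in training.
print(tagger.size())

# Tag several sentences at once.
print(tagger.tag_sents([['a', 'dog', 'runs']]))
```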

Attributes

backoff The backoff tagger for this tagger.
json_tag
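Instead of training, the tagger can be built directly from the model parameter, a dict mapping each word to its tag; a hedged sketch (the dict contents are illustrative):

```python
from nltk.tag import UnigramTagger

# Explicit word -> tag model; no training corpus needed.
model = {'the': 'DT', 'cat': 'NN'}
tagger = UnigramTagger(model=model)

# Words absent from the model get tag None, since no
# backoff tagger was configured (tagger.backoff is None).
print(tagger.tag(['the', 'cat', 'flies']))
```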