nltk.UnigramTagger

class nltk.UnigramTagger(train=None, model=None, backoff=None, cutoff=0, verbose=False)[source]

Unigram Tagger

The UnigramTagger finds the most likely tag for each word in a training corpus, and then uses that information to assign tags to new tokens.

>>> from nltk.corpus import brown
>>> from nltk.tag import UnigramTagger
>>> test_sent = brown.sents(categories='news')[0]
>>> unigram_tagger = UnigramTagger(brown.tagged_sents(categories='news')[:500])
>>> for tok, tag in unigram_tagger.tag(test_sent):
...     print("(%s, %s), " % (tok, tag))
(The, AT), (Fulton, NP-TL), (County, NN-TL), (Grand, JJ-TL),
(Jury, NN-TL), (said, VBD), (Friday, NR), (an, AT),
(investigation, NN), (of, IN), (Atlanta's, NP$), (recent, JJ),
(primary, NN), (election, NN), (produced, VBD), (``, ``),
(no, AT), (evidence, NN), ('', ''), (that, CS), (any, DTI),
(irregularities, NNS), (took, VBD), (place, NN), (., .),
Parameters:
  • train (list(list(tuple(str, str)))) – The corpus of training data, a list of tagged sentences
  • model (dict) – The tagger model
  • backoff (TaggerI) – Another tagger which this tagger will consult when it is unable to tag a word
  • cutoff (int) – The number of instances of training data the tagger must see in order not to use the backoff tagger
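The interplay of backoff and cutoff can be seen with a small, self-contained sketch (the toy training corpus below is illustrative, not from the NLTK docs): a word must occur more than cutoff times for its unigram tag to be stored; otherwise the backoff tagger supplies the tag.

```python
from nltk.tag import DefaultTagger, UnigramTagger

# Toy training corpus: a list of tagged sentences.
train = [
    [('the', 'DT'), ('dog', 'NN'), ('barks', 'VBZ')],
    [('the', 'DT'), ('cat', 'NN'), ('sleeps', 'VBZ')],
]

# Backoff tagger used whenever the unigram table has no entry.
default = DefaultTagger('NN')

# cutoff=1: a word's tag is only stored if seen more than once.
tagger = UnigramTagger(train, backoff=default, cutoff=1)

# 'the' was seen twice, so its unigram tag DT survives the cutoff;
# 'barks' was seen only once, so the backoff supplies NN.
print(tagger.tag(['the', 'barks']))
```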

Methods

__init__([train, model, backoff, cutoff, ...])
choose_tag(tokens, index, history)
context(tokens, index, history)
decode_json_obj(obj)
encode_json_obj()
evaluate(gold) Score the accuracy of the tagger against the gold standard.
size() Return the number of entries in the table used by this tagger to map context to tags.
tag(tokens)
tag_one(tokens, index, history) Determine an appropriate tag for the specified token, and return that tag.
tag_sents(sentences) Apply self.tag() to each element of sentences.
unicode_repr()
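A minimal sketch of size() and tag_sents() on a toy corpus (the training data here is illustrative): size() reports how many words have an entry in the tagger's context table, and tag_sents() applies tag() to each sentence in a list.

```python
from nltk.tag import UnigramTagger

# Toy training corpus with five distinct words.
train = [
    [('the', 'DT'), ('dog', 'NN'), ('runs', 'VBZ')],
    [('a', 'DT'), ('dog', 'NN'), ('sleeps', 'VBZ')],
]
tagger = UnigramTagger(train)

# One table entry per distinct word seen in training.
print(tagger.size())

# Tag several sentences at once.
print(tagger.tag_sents([['a', 'dog', 'runs']]))
```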

Attributes

backoff The backoff tagger for this tagger.
json_tag
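Instead of training, the tagger can be built directly from the model parameter, a dict mapping each word to its tag; a hedged sketch (the dict contents are illustrative):

```python
from nltk.tag import UnigramTagger

# Explicit word -> tag model; no training corpus needed.
model = {'the': 'DT', 'cat': 'NN'}
tagger = UnigramTagger(model=model)

# Words absent from the model get tag None, since no
# backoff tagger was configured (tagger.backoff is None).
print(tagger.tag(['the', 'cat', 'flies']))
```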