nltk.UnigramTagger

class nltk.UnigramTagger(train=None, model=None, backoff=None, cutoff=0, verbose=False)
The UnigramTagger finds the most likely tag for each word in a training corpus, and then uses that information to assign tags to new tokens.
>>> from nltk.corpus import brown
>>> from nltk.tag import UnigramTagger
>>> test_sent = brown.sents(categories='news')[0]
>>> unigram_tagger = UnigramTagger(brown.tagged_sents(categories='news')[:500])
>>> for tok, tag in unigram_tagger.tag(test_sent):
...     print("(%s, %s), " % (tok, tag))
(The, AT),
(Fulton, NP-TL),
(County, NN-TL),
(Grand, JJ-TL),
(Jury, NN-TL),
(said, VBD),
(Friday, NR),
(an, AT),
(investigation, NN),
(of, IN),
(Atlanta's, NP$),
(recent, JJ),
(primary, NN),
(election, NN),
(produced, VBD),
(``, ``),
(no, AT),
(evidence, NN),
('', ''),
(that, CS),
(any, DTI),
(irregularities, NNS),
(took, VBD),
(place, NN),
(., .),
Parameters:
- train (list(list(tuple(str, str)))) – The corpus of training data, a list of tagged sentences.
- model (dict) – The tagger model.
- backoff (TaggerI) – Another tagger which this tagger will consult when it is unable to tag a word.
- cutoff (int) – The number of instances of training data the tagger must see in order not to use the backoff tagger.
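To make the roles of model, backoff, and cutoff concrete, here is a minimal pure-Python sketch of the unigram-tagging idea. This is an illustration only, not NLTK's implementation: the function names are hypothetical, and the cutoff handling is a simplified stand-in for NLTK's actual training logic.

```python
from collections import Counter, defaultdict

def train_unigram_model(tagged_sents, cutoff=0):
    # Count how often each tag occurs for each word in the training corpus.
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    # Keep a word only if its most frequent tag was seen more than `cutoff`
    # times (simplified stand-in for NLTK's cutoff behaviour).
    model = {}
    for word, tag_counts in counts.items():
        best_tag, n = tag_counts.most_common(1)[0]
        if n > cutoff:
            model[word] = best_tag
    return model

def tag_with_backoff(model, tokens, backoff_tag="NN"):
    # Unknown words fall through to a default tag, mimicking a backoff
    # tagger such as DefaultTagger('NN').
    return [(tok, model.get(tok, backoff_tag)) for tok in tokens]

train = [[("the", "AT"), ("dog", "NN"), ("barks", "VBZ")],
         [("the", "AT"), ("cat", "NN"), ("sleeps", "VBZ")]]
model = train_unigram_model(train, cutoff=0)
print(tag_with_backoff(model, ["the", "dog", "runs"]))
# [('the', 'AT'), ('dog', 'NN'), ('runs', 'NN')]
```

Raising cutoff prunes rarely seen words from the model, so more tokens are handed to the backoff tagger.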
Methods

- __init__([train, model, backoff, cutoff, ...])
- choose_tag(tokens, index, history)
- context(tokens, index, history)
- decode_json_obj(obj)
- encode_json_obj()
- evaluate(gold) – Score the accuracy of the tagger against the gold standard.
- size()
- tag(tokens)
- tag_one(tokens, index, history) – Determine an appropriate tag for the specified token, and return that tag.
- tag_sents(sentences) – Apply self.tag() to each element of sentences.
- unicode_repr()
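The score that evaluate(gold) reports is per-token tagging accuracy. A minimal sketch of that computation, assuming a hypothetical helper name and simple list-of-tagged-sentences inputs:

```python
def tagging_accuracy(predicted_sents, gold_sents):
    # Fraction of tokens whose predicted tag matches the gold-standard tag.
    correct = total = 0
    for pred_sent, gold_sent in zip(predicted_sents, gold_sents):
        for (_, pred_tag), (_, gold_tag) in zip(pred_sent, gold_sent):
            total += 1
            if pred_tag == gold_tag:
                correct += 1
    return correct / total if total else 0.0

pred = [[("the", "AT"), ("dog", "NN")]]
gold = [[("the", "AT"), ("dog", "VB")]]
print(tagging_accuracy(pred, gold))  # 0.5
```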