`nltk.tag`

NLTK Taggers

This package contains classes and interfaces for part-of-speech tagging, or simply “tagging”.

A “tag” is a case-sensitive string that specifies some property of a token, such as its part of speech. Tagged tokens are encoded as tuples (token, tag). For example, the following tagged token combines the word 'fly' with a noun part-of-speech tag ('NN'):

>>> tagged_tok = ('fly', 'NN')
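
Because a tagged token is an ordinary tuple, a tagged sentence is simply a list of such tuples. The following plain-Python sketch shows how the two parts can be unpacked and how a sentence can be split into parallel word and tag sequences. The sentence and its tags are made up for illustration.

```python
# A tagged sentence is a list of (token, tag) tuples.
# The tags below are illustrative Penn Treebank-style labels.
tagged_sent = [('They', 'PRP'), ('fly', 'VBP'), ('kites', 'NNS')]

# Unpack a single tagged token into its word and its tag.
word, tag = tagged_sent[1]

# Split the sentence into parallel sequences of words and tags.
words, tags = zip(*tagged_sent)

print(word, tag)
print(list(words))
print(list(tags))
```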

An off-the-shelf tagger is available. It uses the Penn Treebank tagset:

>>> from nltk import pos_tag, word_tokenize
>>> pos_tag(word_tokenize("John's big idea isn't all that bad."))
[('John', 'NNP'), ("'s", 'POS'), ('big', 'JJ'), ('idea', 'NN'), ('is', 'VBZ'),
("n't", 'RB'), ('all', 'PDT'), ('that', 'DT'), ('bad', 'JJ'), ('.', '.')]

This package defines several taggers, which take a list of tokens, assign a tag to each one, and return the resulting list of tagged tokens. Most of the taggers are built automatically based on a training corpus. For example, the unigram tagger tags each word w by checking what the most frequent tag for w was in a training corpus:

>>> from nltk.corpus import brown
>>> from nltk.tag import UnigramTagger
>>> tagger = UnigramTagger(brown.tagged_sents(categories='news')[:500])
>>> sent = ['Mitchell', 'decried', 'the', 'high', 'rate', 'of', 'unemployment']
>>> for word, tag in tagger.tag(sent):
...     print(word, '->', tag)
Mitchell -> NP
decried -> None
the -> AT
high -> JJ
rate -> NN
of -> IN
unemployment -> None

Note that words that the tagger has not seen during training receive a tag of None.
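
The most-frequent-tag lookup behind a unigram tagger can be sketched in plain Python. This is a simplified illustration, not NLTK's implementation: the helper names `train_unigram` and `tag_unigram` and the toy training data are assumptions made for this example.

```python
from collections import Counter, defaultdict


def train_unigram(tagged_sents):
    """Map each word to the tag it carried most often in the training data."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {word: tag_counts.most_common(1)[0][0]
            for word, tag_counts in counts.items()}


def tag_unigram(model, tokens):
    """Tag tokens by dictionary lookup; unseen words receive None."""
    return [(tok, model.get(tok)) for tok in tokens]


# Toy training data, made up for illustration.
train = [
    [('the', 'AT'), ('rate', 'NN'), ('rose', 'VBD')],
    [('they', 'PPSS'), ('rate', 'VB'), ('the', 'AT'), ('film', 'NN')],
    [('the', 'AT'), ('high', 'JJ'), ('rate', 'NN')],
]

model = train_unigram(train)
result = tag_unigram(model, ['the', 'high', 'rate', 'of', 'unemployment'])
print(result)
```

Here 'rate' is tagged NN because it appears twice as NN and once as VB in the toy data, while 'of' and 'unemployment' never occur in training and therefore map to None.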

We evaluate a tagger on data that was not seen during training:

>>> tagger.evaluate(brown.tagged_sents(categories='news')[500:600])
0.73...
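
As a rough sketch of what such an evaluation measures, the score can be read as token-level accuracy: the fraction of held-out tokens whose predicted tag equals the gold tag. The function `accuracy`, the stand-in `lookup_tagger`, and the data below are hypothetical and only illustrate the idea.

```python
def accuracy(tag_fn, gold_sents):
    """Return the fraction of tokens whose predicted tag equals the gold tag."""
    correct = 0
    total = 0
    for gold in gold_sents:
        words = [word for word, _ in gold]
        predicted = tag_fn(words)
        for (_, gold_tag), (_, pred_tag) in zip(gold, predicted):
            if gold_tag == pred_tag:
                correct += 1
            total += 1
    return correct / total if total else 0.0


# Hypothetical lookup tagger standing in for a trained model.
lexicon = {'the': 'AT', 'rate': 'NN', 'high': 'JJ'}


def lookup_tagger(tokens):
    return [(tok, lexicon.get(tok)) for tok in tokens]


# Held-out sentences that the stand-in tagger was not built from.
held_out = [
    [('the', 'AT'), ('high', 'JJ'), ('rate', 'NN')],
    [('the', 'AT'), ('rate', 'NN'), ('fell', 'VBD')],
]

score = accuracy(lookup_tagger, held_out)
print(round(score, 3))
```

Five of the six held-out tokens are tagged correctly; the unseen word 'fell' receives None and counts as an error.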

For more information, please consult chapter 5 of the NLTK Book.

Functions

`load`(resource_url[, format, cache, verbose, ...])	Load a given resource from the NLTK data package.
`map_tag`(source, target, source_tag)	Maps the tag from the source tagset to the target tagset.
`pos_tag`(tokens[, tagset])	Use NLTK’s currently recommended part of speech tagger to tag the given list of tokens.
`pos_tag_sents`(sentences[, tagset])	Use NLTK’s currently recommended part of speech tagger to tag the given list of sentences, each consisting of a list of tokens.
`str2tuple`(s[, sep])	Given the string representation of a tagged token, return the corresponding tuple representation.
`tagset_mapping`(source, target)	Retrieve the mapping dictionary between tagsets.
`tuple2str`(tagged_token[, sep])	Given the tuple representation of a tagged token, return the corresponding string representation.
`untag`(tagged_sentence)	Given a tagged sentence, return an untagged version of that sentence.

Classes

`AffixTagger`([train, model, affix_length, ...])	A tagger that chooses a token’s tag based on a leading or trailing substring of its word string.
`BigramTagger`([train, model, backoff, ...])	A tagger that chooses a token’s tag based on its word string and on the preceding word’s tag.
`BrillTagger`(initial_tagger, rules[, ...])	Brill’s transformational rule-based tagger.
`BrillTaggerTrainer`(initial_tagger, templates)	A trainer for TBL (transformation-based learning) taggers.
`CRFTagger`([feature_func, verbose, training_opt])	A module for POS tagging using CRFSuite https://pypi.python.org/pypi/python-crfsuite
`ClassifierBasedPOSTagger`([feature_detector, ...])	A classifier-based part-of-speech tagger.
`ClassifierBasedTagger`([feature_detector, ...])	A sequential tagger that uses a classifier to choose the tag for each token in a sentence.
`ContextTagger`(context_to_tag[, backoff])	An abstract base class for sequential backoff taggers that choose a tag for a token based on the value of its “context”.
`DefaultTagger`(tag)	A tagger that assigns the same tag to every token.
`HiddenMarkovModelTagger`(symbols, states, ...)	Hidden Markov model class, a generative model for labelling sequence data.
`HiddenMarkovModelTrainer`([states, symbols])	Algorithms for learning HMM parameters from training data.
`HunposTagger`(path_to_model[, path_to_bin, ...])	A class for POS tagging with HunPos.
`NgramTagger`(n[, train, model, backoff, ...])	A tagger that chooses a token’s tag based on its word string and on the preceding n words’ tags.
`PerceptronTagger`([load])	Greedy Averaged Perceptron tagger, as implemented by Matthew Honnibal.
`RegexpTagger`(regexps[, backoff])	Regular Expression Tagger
`SennaChunkTagger`(path[, encoding])
`SennaNERTagger`(path[, encoding])
`SennaTagger`(path[, encoding])
`SequentialBackoffTagger`([backoff])	An abstract base class for taggers that tag words sequentially, left to right.
`StanfordNERTagger`(*args, **kwargs)	A class for named-entity tagging with Stanford Tagger.
`StanfordPOSTagger`(*args, **kwargs)	A class for POS tagging with Stanford Tagger.
`StanfordTagger`(model_filename[, ...])	An interface to Stanford taggers.
`TaggerI`	A processing interface for assigning a tag to each token in a list.
`TnT`([unk, Trained, N, C])	TnT - Statistical POS tagger
`TrigramTagger`([train, model, backoff, ...])	A tagger that chooses a token’s tag based on its word string and on the preceding two words’ tags.
`UnigramTagger`([train, model, backoff, ...])	Unigram Tagger
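
Several of the classes above are described as backoff taggers. The general idea, sketched here in plain Python under simplifying assumptions, is to consult a chain of taggers in order and accept the first tag that is not None. The helper names (`lookup_tagger`, `regexp_tagger`, `tag_with_backoff`), the patterns, and the default tag are hypothetical and do not reproduce NLTK's classes.

```python
import re


def lookup_tagger(lexicon):
    """Return a per-token tagger that tags known words and yields None otherwise."""
    return lambda token: lexicon.get(token)


def regexp_tagger(patterns):
    """Return a per-token tagger that applies the first matching (regex, tag) pair."""
    compiled = [(re.compile(pattern), tag) for pattern, tag in patterns]

    def tag_one(token):
        for regex, tag in compiled:
            if regex.match(token):
                return tag
        return None

    return tag_one


def tag_with_backoff(tokens, taggers, default=None):
    """Try each tagger in order; fall back to `default` if none assigns a tag."""
    tagged = []
    for token in tokens:
        tag = None
        for tagger in taggers:
            tag = tagger(token)
            if tag is not None:
                break
        tagged.append((token, tag if tag is not None else default))
    return tagged


# Illustrative chain: dictionary lookup first, then regular expressions,
# then a constant default tag for anything still untagged.
chain = [
    lookup_tagger({'the': 'AT'}),
    regexp_tagger([(r'^\d+$', 'CD'), (r'.*ing$', 'VBG')]),
]

result = tag_with_backoff(['the', 'running', '42', 'dog'], chain, default='NN')
print(result)
```

Ordering the chain from most specific to most general means a broad fallback only decides tokens that the more informed taggers could not handle.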
