nltk.pos_tag()

nltk.pos_tag(tokens, tagset=None)

Use NLTK’s currently recommended part-of-speech tagger to tag the given list of tokens.

>>> from nltk.tag import pos_tag
>>> from nltk.tokenize import word_tokenize
>>> pos_tag(word_tokenize("John's big idea isn't all that bad."))
[('John', 'NNP'), ("'s", 'POS'), ('big', 'JJ'), ('idea', 'NN'), ('is', 'VBZ'),
("n't", 'RB'), ('all', 'PDT'), ('that', 'DT'), ('bad', 'JJ'), ('.', '.')]
>>> pos_tag(word_tokenize("John's big idea isn't all that bad."), tagset='universal')
[('John', 'NOUN'), ("'s", 'PRT'), ('big', 'ADJ'), ('idea', 'NOUN'), ('is', 'VERB'),
("n't", 'ADV'), ('all', 'DET'), ('that', 'DET'), ('bad', 'ADJ'), ('.', '.')]

NB. Use pos_tag_sents() for efficient tagging of more than one sentence.
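
As a rough illustration of the note above, the sketch below tags several pre-tokenized sentences with a single pos_tag_sents() call instead of calling pos_tag() once per sentence. The helper name tag_many and its batch_tagger parameter are hypothetical and not part of NLTK; the parameter exists only so the helper can be exercised without the tagger model installed.

```python
from typing import Callable, Iterable, List, Optional, Tuple

TaggedSentence = List[Tuple[str, str]]


def tag_many(
    sentences: Iterable[List[str]],
    tagset: Optional[str] = None,
    batch_tagger: Optional[Callable[..., List[TaggedSentence]]] = None,
) -> List[TaggedSentence]:
    """Tag several tokenized sentences in one batch call.

    `tag_many` and `batch_tagger` are illustrative names (assumptions),
    not NLTK API. By default the work is delegated to
    nltk.tag.pos_tag_sents.
    """
    if batch_tagger is None:
        # Imported lazily so the helper can be defined without NLTK data.
        from nltk.tag import pos_tag_sents
        batch_tagger = pos_tag_sents
    # One call for the whole batch, rather than one pos_tag() per sentence.
    return batch_tagger(list(sentences), tagset=tagset)


# Possible usage, assuming the NLTK tokenizer and tagger data are installed:
#   from nltk.tokenize import word_tokenize
#   texts = ["John's big idea isn't all that bad.", "It might even work."]
#   tag_many([word_tokenize(t) for t in texts], tagset='universal')
```

The result contains one list of (token, tag) pairs per input sentence, in the same order as the input.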

Parameters:
  • tokens (list(str)) – Sequence of tokens to be tagged
  • tagset (str) – The tagset to be used, e.g. universal, wsj, brown (default: None)
Returns:
  The tagged tokens
Return type:
  list(tuple(str, str))
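
Because the return value is a plain list of (token, tag) tuples, it can be handled with ordinary Python tools. The sketch below is one possible way to do so: it takes the output of the first example above as a literal (so it runs without NLTK installed) and groups the tokens by tag. The helper name group_by_tag is hypothetical and not part of NLTK.

```python
from collections import defaultdict

# Output copied from the first example above (default tagset).
tagged = [('John', 'NNP'), ("'s", 'POS'), ('big', 'JJ'), ('idea', 'NN'),
          ('is', 'VBZ'), ("n't", 'RB'), ('all', 'PDT'), ('that', 'DT'),
          ('bad', 'JJ'), ('.', '.')]


def group_by_tag(tagged_tokens):
    """Map each tag to the tokens carrying it (hypothetical helper)."""
    groups = defaultdict(list)
    for token, tag in tagged_tokens:
        groups[tag].append(token)
    return dict(groups)


by_tag = group_by_tag(tagged)
adjectives = by_tag.get('JJ', [])  # ['big', 'bad']

# Tags sharing a prefix (NN, NNP, ...) can be selected together.
nouns = [token for token, tag in tagged if tag.startswith('NN')]  # ['John', 'idea']
```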