gensim.utils.lemmatize()
gensim.utils.lemmatize(content, allowed_tags=<_sre.SRE_Pattern object>, light=False, stopwords=frozenset([]), min_length=2, max_length=15)

This function is only available when the optional ‘pattern’ package is installed.
Use the English lemmatizer from the pattern package to extract UTF8-encoded tokens in their base form (lemma), e.g. “are, is, being” -> “be”. This is a smarter version of stemming that takes word context into account.
By default, only nouns, verbs, adjectives and adverbs are considered; lemmas with any other part-of-speech tag are discarded.
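The filtering described above can be pictured with a small standalone sketch. It does not call pattern or gensim. The helper `filter_lemmas`, the tag regex and the hand-tagged input are assumptions made for illustration; only the parameter names (`allowed_tags`, `stopwords`, `min_length`, `max_length`) come from the signature.

```python
import re

# Assumption: a tag pattern that accepts nouns, verbs, adjectives and adverbs.
# The signature above only shows an opaque compiled pattern object.
ALLOWED_TAGS = re.compile(r'(NN|VB|JJ|RB)')


def filter_lemmas(tagged, allowed_tags=ALLOWED_TAGS, stopwords=frozenset(),
                  min_length=2, max_length=15):
    """Hypothetical helper: keep (lemma, tag) pairs that pass the filters.

    `tagged` is an iterable of (lemma, part-of-speech tag) pairs, as a
    tagger might produce them. Returns strings of the form 'lemma/TAG'.
    """
    result = []
    for lemma, tag in tagged:
        # Drop lemmas that are too short or too long.
        if not (min_length <= len(lemma) <= max_length):
            continue
        # Drop stopwords.
        if lemma in stopwords:
            continue
        # Keep only tags that start with an allowed prefix,
        # and report the two-letter tag class.
        if allowed_tags.match(tag):
            result.append(lemma + '/' + tag[:2])
    return result


# Hand-assigned tags for illustration; a real tagger may differ.
tagged = [('hello', 'UH'), ('world', 'NN'), ('how', 'WRB'), ('be', 'VBZ'),
          ('it', 'PRP'), ('go', 'VBG'), ('nonexistentword', 'NN'), ('21', 'CD')]
print(filter_lemmas(tagged))
```

With this hand-tagged input, interjections, pronouns, wh-adverbs and numbers are dropped because their tags do not start with one of the four allowed prefixes.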
>>> lemmatize('Hello World! How is it going?! Nonexistentword, 21')
['world/NN', 'be/VB', 'go/VB', 'nonexistentword/NN']

>>> lemmatize('The study ranks high.')
['study/NN', 'rank/VB', 'high/JJ']

>>> lemmatize('The ranks study hard.')
['rank/NN', 'study/VB', 'hard/RB']
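Downstream code often needs the lemma and the tag separately. A minimal sketch of splitting the 'lemma/TAG' strings shown in the examples above follows. The helper `split_token` is hypothetical and not part of gensim; it also accepts bytes, on the assumption that UTF8-encoded tokens may arrive as byte strings.

```python
def split_token(token):
    """Hypothetical helper: split 'lemma/TAG' into a (lemma, tag) pair."""
    # Tokens are described as UTF8-encoded, so decode bytes if needed.
    if isinstance(token, bytes):
        token = token.decode('utf8')
    # Split on the last slash so a lemma containing '/' stays intact.
    lemma, _, tag = token.rpartition('/')
    return lemma, tag


# Output copied from the second example above.
tokens = ['study/NN', 'rank/VB', 'high/JJ']
pairs = [split_token(t) for t in tokens]
nouns = [lemma for lemma, tag in pairs if tag == 'NN']
print(pairs)
print(nouns)
```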