nltk.Text

class nltk.Text(tokens, name=None)[source]

A wrapper around a sequence of simple (string) tokens, which is intended to support initial exploration of texts (via the interactive console). Its methods perform a variety of analyses on the text’s contexts (e.g., counting, concordancing, collocation discovery), and display the results. If you wish to write a program which makes use of these analyses, then you should bypass the Text class, and use the appropriate analysis function or class directly instead.
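As a rough illustration of bypassing Text, the sketch below calls two of the underlying analysis classes directly, FreqDist for counting and ConcordanceIndex for locating occurrences. The token list and the lowercasing key function are assumptions made for the example, and the import path nltk.probability is assumed to match the installed NLTK version.

```python
from nltk.probability import FreqDist
from nltk.text import ConcordanceIndex

# Illustrative in-memory tokens (an assumption; any sequence of strings works).
tokens = "The whale swam and the whale dived".split()

# Counting without Text: FreqDist maps each token to its frequency.
freq = FreqDist(tokens)
whale_count = freq["whale"]
print(whale_count)  # 2

# Locating occurrences without Text: the key function normalizes tokens
# before indexing, so "The" and "the" share one entry.
index = ConcordanceIndex(tokens, key=lambda tok: tok.lower())
the_offsets = index.offsets("the")
print(the_offsets)  # [0, 4]
```

Unlike the Text methods, these calls return values rather than printing them, which makes them easier to use inside a program.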

A Text is typically initialized from a given document or corpus. E.g.:

>>> import nltk.corpus
>>> from nltk.text import Text
>>> moby = Text(nltk.corpus.gutenberg.words('melville-moby_dick.txt'))
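A Text can also be built from any in-memory sequence of strings, which avoids downloading a corpus. The following is a minimal sketch; the token list and the name "toy example" are made up for illustration.

```python
from nltk.text import Text

# Hypothetical token list; in practice this usually comes from a tokenizer
# or a corpus reader.
tokens = "the whale swam and the whale dived".split()
text = Text(tokens, name="toy example")

print(text.name)            # toy example
print(len(text))            # 7
print(text.count("whale"))  # 2
print(text.index("whale"))  # 1
print(text[1:3])            # ['whale', 'swam']
```

Because Text wraps the token sequence, len(), indexing, and slicing behave as they would on the underlying list.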

Methods

__init__(tokens[, name]) Create a Text object.
collocations([num, window_size]) Print collocations derived from the text, ignoring stopwords.
common_contexts(words[, num]) Find contexts where the specified words appear; list most frequent common contexts first.
concordance(word[, width, lines]) Print a concordance for word with the specified context window.
count(word) Count the number of times this word appears in the text.
dispersion_plot(words) Produce a plot showing the distribution of the words through the text.
findall(regexp) Find instances of the regular expression in the text.
index(word) Find the index of the first occurrence of the word in the text.
plot(*args) See documentation for FreqDist.plot()
readability(method)
similar(word[, num]) Distributional similarity: find other words which appear in the same contexts as the specified word; list most similar words first.
unicode_repr()
vocab() See also: nltk.probability.FreqDist
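Most of the methods listed above print their results, but count(), index(), and vocab() return values. As a small sketch of the latter, assuming a made-up token list, vocab() can be queried like any FreqDist:

```python
from nltk.text import Text

# Hypothetical tokens chosen so that the frequencies are unambiguous.
text = Text("a b a c a b".split())

# vocab() returns a FreqDist built from the text's tokens.
vocab = text.vocab()
print(vocab["a"])            # 3
print(vocab.N())             # 6 tokens in total
print(vocab.most_common(2))  # [('a', 3), ('b', 2)]
```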