nltk.TextCollection

class nltk.TextCollection(source)

A collection of texts, which can be loaded with a list of texts or with a corpus consisting of one or more texts, and which supports counting, concordancing, collocation discovery, etc. Initialize a TextCollection as follows:

>>> import nltk.corpus
>>> from nltk.text import TextCollection
>>> print('hack'); from nltk.book import text1, text2, text3
hack...
>>> gutenberg = TextCollection(nltk.corpus.gutenberg)
>>> mytexts = TextCollection([text1, text2, text3])

Iterating over a TextCollection produces all the tokens of all the texts in order.
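
As a small illustration of the iteration order, the sketch below builds a collection from three toy token lists instead of the corpora used above. The token lists and variable names are made up for this example, and it assumes that plain lists of strings are acceptable as the "list of texts":

```python
from nltk.text import TextCollection

# Three tiny, made-up "texts", each already tokenized into a list of strings.
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "barked"],
    ["a", "cat", "purred"],
]

collection = TextCollection(docs)

# Iteration yields the tokens of every text, in the order the texts were given.
tokens = list(collection)
print(tokens)

# count() works over the concatenated tokens, so "cat" is counted in both texts.
print(collection.count("cat"))
```

Because the collection behaves like one long text, per-token methods such as count() and index() operate on the concatenated token sequence rather than on any single text.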

Methods

__init__(source)
collocations([num, window_size]) Print collocations derived from the text, ignoring stopwords.
common_contexts(words[, num]) Find contexts where the specified words appear; list most frequent common contexts first.
concordance(word[, width, lines]) Print a concordance for word with the specified context window.
count(word) Count the number of times this word appears in the text.
dispersion_plot(words) Produce a plot showing the distribution of the words through the text.
findall(regexp) Find instances of the regular expression in the text.
idf(term) The logarithm of the number of texts in the corpus divided by the number of texts that the term appears in; 0.0 if the term appears in no text.
index(word) Find the index of the first occurrence of the word in the text.
plot(*args) See documentation for FreqDist.plot()
readability(method)
similar(word[, num]) Distributional similarity: find other words which appear in the same contexts as the specified word; list most similar words first.
tf(term, text) The frequency of the term in text.
tf_idf(term, text)
unicode_repr()
vocab() See also: nltk.prob.FreqDist
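
The arithmetic behind tf, idf and tf_idf can be sketched in plain Python. This is a minimal sketch and not the library's own code: the free functions and the toy texts are hypothetical, and it assumes a natural-log idf that falls back to 0.0 when the term appears in no text, and a tf_idf that is the product of the two:

```python
import math

def tf(term, text):
    """Relative frequency of term within a single tokenized text."""
    return text.count(term) / len(text)

def idf(term, texts):
    """Log of (number of texts / number of texts containing term)."""
    matches = sum(1 for text in texts if term in text)
    # Assumption: a term that occurs in no text gets an idf of 0.0.
    return math.log(len(texts) / matches) if matches else 0.0

def tf_idf(term, text, texts):
    """Assumed here to be the product of tf and idf."""
    return tf(term, text) * idf(term, texts)

# Toy data, made up for this example.
texts = [
    ["the", "cat", "sat"],
    ["the", "dog", "barked"],
    ["a", "cat", "purred"],
]

cat_tf = tf("cat", texts[0])       # 1 occurrence out of 3 tokens
cat_idf = idf("cat", texts)        # log(3 / 2)
cat_score = tf_idf("cat", texts[0], texts)
unseen_idf = idf("fish", texts)    # term absent from every text

print(cat_tf, cat_idf, cat_score, unseen_idf)
```

A term that occurs in every text gets an idf of log(1) = 0, so its tf_idf score is 0 regardless of how often it occurs in a given text.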