nltk.TextCollection
class nltk.TextCollection(source)

A collection of texts, which can be loaded with a list of texts or with a corpus consisting of one or more texts, and which supports counting, concordancing, collocation discovery, etc. Initialize a TextCollection as follows:
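>>> import nltk.corpus
>>> from nltk.text import TextCollection
>>> from nltk.book import text1, text2, text3
>>> gutenberg = TextCollection(nltk.corpus.gutenberg)
>>> mytexts = TextCollection([text1, text2, text3])

Importing nltk.book prints a short banner listing its sample texts. Both examples assume the required NLTK data has been downloaded, for example with nltk.download('book').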
Iterating over a TextCollection produces all the tokens of all the texts in order.
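Conceptually, iterating over a collection is the same as chaining its texts end to end. The sketch below models this with itertools.chain.from_iterable, assuming each text is a plain list of token strings. The sample texts and the iter_collection helper are hypothetical, and the code illustrates the ordering only, not how NLTK implements iteration.

```python
from itertools import chain

# Hypothetical sample data: each "text" is a plain list of tokens.
texts = [
    ["Call", "me", "Ishmael", "."],
    ["The", "family", "of", "Dashwood"],
    ["Emma", "Woodhouse", ","],
]


def iter_collection(texts):
    """Yield every token of every text, text by text, in order.

    Hypothetical stand-in for iterating over a TextCollection.
    """
    return chain.from_iterable(texts)


tokens = list(iter_collection(texts))
print(tokens[:5])   # ['Call', 'me', 'Ishmael', '.', 'The']
print(len(tokens))  # 11
```

With NLTK installed, list(TextCollection(texts)) on the same lists should give the same token sequence.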
Methods
| Method | Description |
| --- | --- |
| __init__(source) | Create a collection from a list of texts or from a corpus consisting of one or more texts. |
| collocations([num, window_size]) | Print collocations derived from the text, ignoring stopwords. |
| common_contexts(words[, num]) | Find contexts where the specified words appear; list the most frequent common contexts first. |
| concordance(word[, width, lines]) | Print a concordance for word with the specified context window. |
| count(word) | Count the number of times this word appears in the text. |
| dispersion_plot(words) | Produce a plot showing the distribution of the words through the text. |
| findall(regexp) | Find instances of the regular expression in the text. |
| idf(term) | The natural logarithm of the number of texts in the collection divided by the number of texts that the term appears in; 0.0 if the term appears in none of them. |
| index(word) | Find the index of the first occurrence of the word in the text. |
| plot(*args) | See the documentation for FreqDist.plot(). |
| readability(method) | |
| similar(word[, num]) | Distributional similarity: find other words that appear in the same contexts as the specified word; list the most similar words first. |
| tf(term, text) | The relative frequency of term in text: its number of occurrences divided by the length of text. |
| tf_idf(term, text) | The product of tf(term, text) and idf(term). |
| unicode_repr() | |
| vocab() | |
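The three weighting methods, tf, idf and tf_idf, combine as follows. This plain-Python sketch assumes each text is a list of token strings and follows the usual TF-IDF formulas (tf as relative frequency, idf as a natural logarithm, 0.0 for a term that appears in no text). The function names mirror the methods, but the extra texts argument and the toy data are hypothetical: in NLTK the collection itself supplies the texts, so idf takes only the term.

```python
from math import log

# Hypothetical toy collection: each "text" is a list of lowercase tokens.
texts = [
    ["the", "whale", "swam", "past", "the", "ship"],
    ["the", "ship", "sailed", "home"],
    ["a", "quiet", "village", "morning"],
]


def tf(term, text):
    """Relative frequency: occurrences of term divided by the text's length."""
    return text.count(term) / len(text)


def idf(term, texts):
    """Natural log of (number of texts / number of texts containing term).

    Returns 0.0 when no text contains the term instead of dividing by zero.
    """
    if not texts:
        raise ValueError("IDF is undefined for an empty collection")
    matches = sum(1 for text in texts if term in text)
    return log(len(texts) / matches) if matches else 0.0


def tf_idf(term, text, texts):
    """Weight of term in one text, relative to the whole collection."""
    return tf(term, text) * idf(term, texts)


for term in ("the", "whale", "kraken"):
    print(term, round(tf_idf(term, texts[0], texts), 4))
# the 0.1352
# whale 0.1831
# kraken 0.0
```

"the" occurs twice in the first text but also appears in a second text, so its idf is low; "whale" occurs once but only in that text, so it ends up with the higher weight.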
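common_contexts and similar both work from a word's context: roughly, the tokens immediately to its left and right. The plain-Python sketch below models that idea under simplifying assumptions: tokens are lowercased, a context is one neighbour on each side, and the helpers build_context_index, similar_words and common_contexts are hypothetical. It is not NLTK's implementation, and its filtering, ranking and tie-breaking may differ from the library's output.

```python
from collections import Counter, defaultdict


def build_context_index(tokens):
    """Map each lowercased word to a Counter of its (left, right) neighbours."""
    words = [t.lower() for t in tokens]
    index = defaultdict(Counter)
    for i, w in enumerate(words):
        # "<s>" and "</s>" are this sketch's own boundary markers.
        left = words[i - 1] if i > 0 else "<s>"
        right = words[i + 1] if i < len(words) - 1 else "</s>"
        index[w][(left, right)] += 1
    return index


def similar_words(tokens, word, num=5):
    """Other words ranked by how many distinct contexts they share with word."""
    index = build_context_index(tokens)
    word = word.lower()
    target = set(index.get(word, ()))
    scores = Counter()
    for other, contexts in index.items():
        shared = target & set(contexts)
        if other != word and shared:
            scores[other] = len(shared)
    return [w for w, _ in scores.most_common(num)]


def common_contexts(tokens, words):
    """Contexts shared by every word in words, most frequent first."""
    index = build_context_index(tokens)
    counters = [index.get(w.lower(), Counter()) for w in words]
    shared = set.intersection(*(set(c) for c in counters)) if counters else set()
    totals = Counter()
    for c in counters:
        totals.update(c)
    return sorted(shared, key=lambda ctx: totals[ctx], reverse=True)


tokens = "the cat sat on the mat . the dog sat on the rug . a cat ran .".split()
print(similar_words(tokens, "cat"))             # ['dog']
print(common_contexts(tokens, ["cat", "dog"]))  # [('the', 'sat')]
```

"cat" and "dog" both occur between "the" and "sat", which is exactly the kind of shared context that distributional similarity relies on.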