nltk.FreqDist
class nltk.FreqDist(samples=None)
A frequency distribution for the outcomes of an experiment. A frequency distribution records the number of times each outcome of an experiment has occurred. For example, a frequency distribution could be used to record the frequency of each word type in a document. Formally, a frequency distribution can be defined as a function mapping from each sample to the number of times that sample occurred as an outcome.
Frequency distributions are generally constructed by running a number of experiments, and incrementing the count for a sample every time it is an outcome of an experiment. For example, the following code will produce a frequency distribution that encodes how often each word occurs in a text:
>>> from nltk.tokenize import word_tokenize
>>> from nltk.probability import FreqDist
>>> sent = 'This is an example sentence'
>>> fdist = FreqDist()
>>> for word in word_tokenize(sent):
...     fdist[word.lower()] += 1
An equivalent way to do this is with the initializer:
>>> fdist = FreqDist(word.lower() for word in word_tokenize(sent))
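To try the two construction styles without NLTK's tokenizer data, the same counting can be sketched with the standard library alone. This is an illustration under assumptions: `collections.Counter` stands in for `FreqDist` and `str.split` stands in for `word_tokenize`; neither substitution comes from the documentation above.

```python
from collections import Counter

sent = 'This is an example sentence and this is another'

# Style 1: start empty and increment the count once per outcome.
incremental = Counter()
for word in sent.split():
    incremental[word.lower()] += 1

# Style 2: hand all outcomes to the initializer at once.
from_initializer = Counter(word.lower() for word in sent.split())

# Both styles produce the same sample -> count mapping.
print(incremental == from_initializer)
print(incremental['this'], incremental['example'])
```

Looking up a sample that never occurred returns a count of zero rather than raising an error, which is what makes the `+= 1` idiom work on an empty distribution.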
Methods
B() | Return the total number of sample values (or “bins”) that have counts greater than zero.
N() | Return the total number of sample outcomes that have been recorded by this FreqDist.
Nr(r[, bins]) |
__init__([samples]) | Construct a new frequency distribution.
clear() -> None | Remove all items from D.
copy() | Create a copy of this frequency distribution.
elements() | Iterator over elements repeating each as many times as its count.
freq(sample) | Return the frequency of a given sample.
fromkeys(iterable[, v]) |
get(k[, d]) -> D[k] if k in D, ... |
hapaxes() | Return a list of all samples that occur once (hapax legomena).
has_key(k) -> True if D has a key k, else False |
items() -> list of D’s (key, value) pairs, ... |
iteritems() -> an iterator over the (key, ... |
iterkeys() -> an iterator over the keys of D |
itervalues(...) |
keys() -> list of D’s keys |
max() | Return the sample with the greatest number of outcomes in this frequency distribution.
most_common([n]) | List the n most common elements and their counts from the most common to the least.
pformat([maxlen]) | Return a string representation of this FreqDist.
plot(*args, **kwargs) | Plot samples from the frequency distribution displaying the most frequent sample first.
pop(k[, d]) -> v, ... | If key is not found, d is returned if given, otherwise KeyError is raised.
popitem() -> (k, v), ... | 2-tuple; but raise KeyError if D is empty.
pprint([maxlen, stream]) | Print a string representation of this FreqDist to ‘stream’.
r_Nr([bins]) | Return the dictionary mapping r to Nr, the number of samples with frequency r, where Nr > 0.
setdefault(k[, d]) -> D.get(k,d), ... |
subtract(*args, **kwds) | Like dict.update() but subtracts counts instead of replacing them.
tabulate(*args, **kwargs) | Tabulate the given samples from the frequency distribution (cumulative), displaying the most frequent sample first.
unicode_repr() | Return a string representation of this FreqDist.
update(*args, **kwds) | Like dict.update() but add counts instead of replacing them.
values() -> list of D’s values |
viewitems(...) |
viewkeys(...) |
viewvalues(...) |
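The summary statistics in the table can be read as simple computations over the sample-to-count mapping. The sketch below spells out a few of them with the standard library only. It is an illustration, not NLTK's implementation: a plain `collections.Counter` named `counts` is a hypothetical stand-in for a `FreqDist`, `freq` is taken to be a sample's count divided by N, and the optional `bins` argument of `r_Nr` is ignored.

```python
from collections import Counter

# Hypothetical stand-in for a FreqDist: a Counter over sample outcomes.
counts = Counter('the cat sat on the mat the end'.split())

# N(): total number of outcomes recorded.
n = sum(counts.values())

# B(): number of distinct samples ("bins") with a count above zero.
b = sum(1 for c in counts.values() if c > 0)

# freq(sample): assumed here to be the sample's count divided by N.
freq_the = counts['the'] / float(n)

# max(): the sample with the greatest count.
top = max(counts, key=counts.get)

# hapaxes(): samples that occur exactly once (sorted for stable output).
hapaxes = sorted(s for s, c in counts.items() if c == 1)

# r_Nr(): mapping from a count r to the number of samples with that count.
r_nr = Counter(counts.values())

print(n, b, freq_the, top)
print(hapaxes)
print(dict(r_nr))
```

With eight tokens, one of which occurs three times, the sketch reports N = 8, B = 6, and a single sample at r = 3 alongside five hapaxes at r = 1.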