nltk.FreqDist

class nltk.FreqDist(samples=None)

A frequency distribution for the outcomes of an experiment. A frequency distribution records the number of times each outcome of an experiment has occurred. For example, a frequency distribution could be used to record the frequency of each word type in a document. Formally, a frequency distribution can be defined as a function mapping from each sample to the number of times that sample occurred as an outcome.

Frequency distributions are generally constructed by running a number of experiments, and incrementing the count for a sample every time it is an outcome of an experiment. For example, the following code will produce a frequency distribution that encodes how often each word occurs in a text:

>>> from nltk.tokenize import word_tokenize
>>> from nltk.probability import FreqDist
>>> sent = 'This is an example sentence'
>>> fdist = FreqDist()
>>> for word in word_tokenize(sent):
...    fdist[word.lower()] += 1

An equivalent way to do this is with the initializer:

>>> fdist = FreqDist(word.lower() for word in word_tokenize(sent))
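As a quick check that the two constructions agree, the sketch below builds the distribution both ways and compares the results. It makes two assumptions that are not part of the original example: `str.split()` stands in for `word_tokenize` so that the snippet runs without downloading tokenizer models, and the sample sentence is invented for illustration.

```python
from nltk.probability import FreqDist

# Hypothetical sample text; str.split() stands in for word_tokenize here.
sent = 'the cat saw the other cat'
words = [w.lower() for w in sent.split()]

# Incremental construction: a sample that has not been seen yet
# starts from a count of zero, so "+= 1" works without initialization.
fdist_loop = FreqDist()
for word in words:
    fdist_loop[word] += 1

# Construction through the initializer.
fdist_init = FreqDist(words)

print(fdist_loop == fdist_init)   # True
print(fdist_init['the'])          # 2
print(fdist_init['dog'])          # 0 (never observed)
```

Looking up a sample that never occurred returns 0 rather than raising KeyError, which is what makes the incremental loop safe.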

Methods

B() Return the total number of sample values (or “bins”) that have counts greater than zero.
N() Return the total number of sample outcomes that have been recorded by this FreqDist.
Nr(r[, bins])
__init__([samples]) Construct a new frequency distribution.
clear() -> None. Remove all items from D.
copy() Create a copy of this frequency distribution.
elements() Iterator over elements repeating each as many times as its count.
freq(sample) Return the frequency of a given sample.
fromkeys(iterable[, v])
get(k[, d]) -> D[k] if k in D, ...
hapaxes() Return a list of all samples that occur once (hapax legomena).
has_key(k) -> True if D has a key k, else False
items() -> list of D’s (key, value) pairs, ...
iteritems() -> an iterator over the (key, ...
iterkeys() -> an iterator over the keys of D
itervalues(...)
keys() -> list of D’s keys
max() Return the sample with the greatest number of outcomes in this frequency distribution.
most_common([n]) List the n most common elements and their counts from the most common to the least.
pformat([maxlen]) Return a string representation of this FreqDist.
plot(*args, **kwargs) Plot samples from the frequency distribution displaying the most frequent sample first.
pop(k[, d]) -> v, ... If key is not found, d is returned if given, otherwise KeyError is raised.
popitem() -> (k, v), ... 2-tuple; but raise KeyError if D is empty.
pprint([maxlen, stream]) Print a string representation of this FreqDist to ‘stream’
r_Nr([bins]) Return the dictionary mapping r to Nr, the number of samples with frequency r, where Nr > 0.
setdefault(k[, d]) -> D.get(k,d), ...
subtract(*args, **kwds) Like dict.update() but subtracts counts instead of replacing them.
tabulate(*args, **kwargs) Tabulate the given samples from the frequency distribution (cumulative), displaying the most frequent sample first.
unicode_repr() Return a string representation of this FreqDist.
update(*args, **kwds) Like dict.update() but adds counts instead of replacing them.
values() -> list of D’s values
viewitems(...)
viewkeys(...)
viewvalues(...)
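The following sketch exercises a few of the methods listed above on a small distribution. The sample data is hypothetical and chosen so that each result can be verified by hand; it is meant as an illustration of the summaries above, not as a complete tour of the class.

```python
from nltk.probability import FreqDist

# Hypothetical samples: 'a' occurs 3 times, 'b' twice, 'c' once.
fdist = FreqDist(['a', 'b', 'a', 'c', 'a', 'b'])

total = fdist.N()               # 6 recorded outcomes
bins = fdist.B()                # 3 distinct samples
top = fdist.max()               # 'a', the sample with the highest count
rel = fdist.freq('a')           # 3 / 6 = 0.5
once = fdist.hapaxes()          # ['c'], the samples seen exactly once
ranked = fdist.most_common(2)   # [('a', 3), ('b', 2)]
n_once = fdist.Nr(1)            # 1 sample has a count of exactly 1

# update() adds to the existing counts instead of replacing them.
fdist.update(['c', 'c'])
total_after = fdist.N()         # 8 recorded outcomes
once_after = fdist.hapaxes()    # [] because 'c' now has a count of 3
```

Here `freq` is the sample's count divided by `N()`, so it is a relative frequency between 0 and 1, while indexing (`fdist['a']`) returns the raw count.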