nltk.ConditionalFreqDist

class nltk.ConditionalFreqDist(cond_samples=None)

A collection of frequency distributions for a single experiment run under different conditions. Conditional frequency distributions are used to record the number of times each sample occurred, given the condition under which the experiment was run. For example, a conditional frequency distribution could be used to record the frequency of each word (type) in a document, given its length. Formally, a conditional frequency distribution can be defined as a function that maps from each condition to the FreqDist for the experiment under that condition.

Conditional frequency distributions are typically constructed by repeatedly running an experiment under a variety of conditions, and incrementing the sample outcome counts for the appropriate conditions. For example, the following code will produce a conditional frequency distribution that encodes how often each word type occurs, given the length of that word type:

>>> from nltk.probability import ConditionalFreqDist
>>> from nltk.tokenize import word_tokenize
>>> sent = "the the the dog dog some other words that we do not care about"
>>> cfdist = ConditionalFreqDist()
>>> for word in word_tokenize(sent):
...     condition = len(word)
...     cfdist[condition][word] += 1

An equivalent way to do this is with the initializer:

>>> cfdist = ConditionalFreqDist((len(word), word) for word in word_tokenize(sent))

The frequency distribution for each condition is accessed using the indexing operator:

>>> cfdist[3]
FreqDist({'the': 3, 'dog': 2, 'not': 1})
>>> cfdist[3].freq('the')
0.5
>>> cfdist[3]['dog']
2

When the indexing operator is used to access the frequency distribution for a condition that has not been accessed before, ConditionalFreqDist creates a new empty FreqDist for that condition.
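The auto-creation behaviour described above can be seen in a short sketch. It assumes the same sentence as the earlier example, but splits it with str.split() instead of word_tokenize so that no tokenizer data needs to be installed; the variable names are illustrative only.

```python
from nltk.probability import ConditionalFreqDist

sent = "the the the dog dog some other words that we do not care about"
# str.split() stands in for word_tokenize here (an assumption made so the
# sketch runs without downloading tokenizer models).
cfdist = ConditionalFreqDist((len(word), word) for word in sent.split())

# No seven-letter word occurs, so 7 is not yet a condition.
unseen_before = 7 in cfdist

# Indexing an unseen condition creates and stores an empty FreqDist.
empty = cfdist[7]
unseen_after = 7 in cfdist

print(unseen_before, empty.N(), unseen_after)  # False 0 True
```

Because a lookup by index has this side effect, a membership test such as `7 in cfdist` is the safer way to check whether a condition has been recorded.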

Methods

N() Return the total number of sample outcomes that have been recorded by this ConditionalFreqDist.
__init__([cond_samples]) Construct a new conditional frequency distribution, optionally populated from cond_samples, an iterable of (condition, sample) pairs.
clear() Remove all items from D.
conditions() Return a list of the conditions that have been accessed for this ConditionalFreqDist.
copy() Return a shallow copy of D.
fromkeys(...) v defaults to None.
get(k[, d]) Return D[k] if k in D, ...
has_key(k) Return True if D has a key k, else False.
items() Return a list of D’s (key, value) pairs, ...
iteritems() Return an iterator over the (key, ...
iterkeys() Return an iterator over the keys of D.
itervalues(...)
keys() Return a list of D’s keys.
plot(*args, **kwargs) Plot the given samples from the conditional frequency distribution.
pop(k[, d]) Return v, ... If key is not found, d is returned if given, otherwise KeyError is raised.
popitem() Return (k, v), ... as a 2-tuple, but raise KeyError if D is empty.
setdefault(k[, d]) Return D.get(k, d), ...
tabulate(*args, **kwargs) Tabulate the given samples from the conditional frequency distribution.
unicode_repr() Return a string representation of this ConditionalFreqDist.
update([E, ...) If E is present and has a .keys() method, does: for k in E: D[k] = E[k]
values() Return a list of D’s values.
viewitems(...)
viewkeys(...)
viewvalues(...)
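
A minimal sketch of a few of the methods listed above, using the same sentence as the class example. str.split() replaces word_tokenize as a simplifying assumption, and the conditions and samples keyword arguments passed to tabulate() are assumed to be accepted as they are in NLTK 3.

```python
from nltk.probability import ConditionalFreqDist

sent = "the the the dog dog some other words that we do not care about"
# str.split() is used instead of word_tokenize (assumption: avoids the need
# for tokenizer data; the tokens are the same for this sentence).
cfdist = ConditionalFreqDist((len(word), word) for word in sent.split())

# Total number of outcomes recorded across all conditions.
total = cfdist.N()

# The conditions seen so far; sorted because the order is not guaranteed.
lengths = sorted(cfdist.conditions())

print(total)    # 14
print(lengths)  # [2, 3, 4, 5]

# Print a condition-by-sample table restricted to a few rows and columns.
# The keyword names here are assumed from NLTK 3.
cfdist.tabulate(conditions=[3, 4], samples=["the", "dog", "some"])
```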

Attributes

default_factory Factory for default value called by __missing__().
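
As a brief illustration of this attribute, the sketch below assumes the NLTK 3 implementation, in which ConditionalFreqDist is built on a default dictionary whose factory is FreqDist. That detail is not stated on this page, so treat it as an assumption.

```python
from nltk.probability import ConditionalFreqDist, FreqDist

cfdist = ConditionalFreqDist()

# Assumption (NLTK 3): the factory that builds the value for a missing
# condition is the FreqDist class itself.
factory = cfdist.default_factory
print(factory is FreqDist)

# A missing condition therefore yields a fresh, empty FreqDist.
created = cfdist["never seen"]
print(isinstance(created, FreqDist), created.N())
```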