nltk.WittenBellProbDist

class nltk.WittenBellProbDist(freqdist, bins=None)

The Witten-Bell estimate of a probability distribution. This distribution allocates uniform probability mass to as-yet-unseen events, using the number of distinct event types observed so far to decide how much mass to reserve. The probability mass reserved for unseen events is T / (N + T), where T is the number of observed event types and N is the total number of observed events. This is the maximum likelihood estimate of the probability that the next event is of a new type. The estimates for seen events are discounted so that all probability estimates sum to one. With c the observed count of a sample and Z the number of bins with no observations (bins minus T), this yields:

  • p = T / (Z (N + T)), if c = 0
  • p = c / (N + T), otherwise
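
The two cases above can be checked by hand with a small sketch. This is not the NLTK implementation: witten_bell_prob is a hypothetical helper written only to mirror the formula, and it assumes that Z is the number of bins minus T, that at least one event has been observed, and that bins is larger than T so that Z is nonzero. Exact fractions are used so the sum-to-one property can be tested without rounding error.

```python
from collections import Counter
from fractions import Fraction


def witten_bell_prob(counts, bins, sample):
    """Witten-Bell estimate for one sample (hypothetical helper, not NLTK code)."""
    n = sum(counts.values())   # N: total number of observed events
    t = len(counts)            # T: number of observed event types
    z = bins - t               # Z: number of bins never observed (assumed > 0)
    c = counts[sample]         # c: observed count of this sample (0 if unseen)
    if c == 0:
        # Share the reserved mass T / (N + T) evenly across the Z unseen bins.
        return Fraction(t, z * (n + t))
    return Fraction(c, n + t)


counts = Counter("abracadabra")   # a:5, b:2, r:2, c:1, d:1, so N = 11 and T = 5
bins = 10                         # assumed: 10 possible outcomes, so Z = 5

p_seen = witten_bell_prob(counts, bins, "a")     # 5 / (11 + 5)
p_unseen = witten_bell_prob(counts, bins, "z")   # 5 / (5 * (11 + 5))

# Mass of all seen samples plus Z copies of the unseen probability.
seen_mass = sum(witten_bell_prob(counts, bins, s) for s in counts)
total = seen_mass + (bins - len(counts)) * p_unseen

print(p_seen, p_unseen, total)    # prints: 5/16 1/16 1
```

In this example the seen samples keep 11/16 of the mass and the five unseen bins share the remaining 5/16, which is T / (N + T) as stated above.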

Methods

__init__(freqdist[, bins]) Creates a distribution of Witten-Bell probability estimates.
discount()
freqdist()
generate() Return a randomly selected sample from this probability distribution.
logprob(sample) Return the base 2 logarithm of the probability for a given sample.
max()
prob(sample)
samples()
unicode_repr() Return a string representation of this ProbDist.

Attributes

SUM_TO_ONE