nltk.NgramAssocMeasures

class nltk.NgramAssocMeasures

An abstract class defining a collection of generic association measures. Each public method returns a score, taking the following arguments:

score_fn(count_of_ngram,
         (count_of_n-1gram_1, ..., count_of_n-1gram_j),
         (count_of_n-2gram_1, ..., count_of_n-2gram_k),
         ...,
         (count_of_1gram_1, ..., count_of_1gram_n),
         count_of_total_words)

See BigramAssocMeasures and TrigramAssocMeasures for concrete subclasses.
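To make the argument layout concrete, here is a minimal sketch for the bigram case, where the call reduces to score_fn(count_of_bigram, (count_of_word1, count_of_word2), count_of_total_words). The function name bigram_pmi and the counts are hypothetical, chosen for illustration. It computes the textbook pointwise mutual information and is not NLTK's own code.

```python
import math


def bigram_pmi(n_ii, uni_counts, n_xx):
    """Pointwise mutual information of a bigram from its marginal counts.

    n_ii       -- count of the bigram (w1, w2)
    uni_counts -- (count of w1, count of w2)
    n_xx       -- total number of words
    """
    n_ix, n_xi = uni_counts
    # log2( P(w1, w2) / (P(w1) * P(w2)) ), written in terms of raw counts
    return math.log2(n_ii * n_xx) - math.log2(n_ix * n_xi)


# Hypothetical counts: the bigram occurs 8 times, its words 16 and 32 times,
# in a corpus of 1024 words.
score = bigram_pmi(8, (16, 32), 1024)
print(score)  # 4.0
```

The marginals are passed in the same order as in the signature above: the n-gram count first, the tuple of unigram counts next, and the total word count last.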

Inheriting classes should define a property _n (the order of the n-grams scored) and a method _contingency, which calculates contingency values from marginals. The association measures defined here are only usable once both are defined.
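As an illustration of what such a subclass supplies, the following is a minimal standalone sketch for bigrams. The class name BigramMeasuresSketch is hypothetical and the class does not inherit from NLTK. It only shows how a 2x2 contingency table can be derived from the marginals and how a measure such as the Jaccard index can then be built on top of it.

```python
class BigramMeasuresSketch:
    """Hypothetical stand-in for a bigram subclass (not NLTK's code)."""

    _n = 2  # order of the n-grams being scored

    @staticmethod
    def _contingency(n_ii, uni_counts, n_xx):
        """Return (n_ii, n_oi, n_io, n_oo) computed from the marginals."""
        n_ix, n_xi = uni_counts
        n_oi = n_xi - n_ii  # w2 occurs without w1 before it
        n_io = n_ix - n_ii  # w1 occurs without w2 after it
        n_oo = n_xx - n_ii - n_oi - n_io  # neither word in its position
        return (n_ii, n_oi, n_io, n_oo)

    @classmethod
    def jaccard(cls, *marginals):
        """Joint cell divided by all cells in which either word occurs."""
        cont = cls._contingency(*marginals)
        return cont[0] / sum(cont[:-1])


# Hypothetical counts, in the marginal order (ngram, unigrams, total).
table = BigramMeasuresSketch._contingency(8, (16, 32), 1024)
print(table)  # (8, 24, 8, 984)
```

With the contingency values in hand, a generic measure only needs the cells and never has to know the n-gram order directly, which is why the base class can leave _contingency to its subclasses.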

Methods

chi_sq(*marginals) Scores ngrams using Pearson’s chi-square as in Manning and Schütze 5.3.3.
jaccard(*marginals) Scores ngrams using the Jaccard index.
likelihood_ratio(*marginals) Scores ngrams using likelihood ratios as in Manning and Schütze 5.3.4.
mi_like(*marginals, **kwargs) Scores ngrams using a variant of mutual information.
pmi(*marginals) Scores ngrams by pointwise mutual information, as in Manning and Schütze 5.4.
poisson_stirling(*marginals) Scores ngrams using the Poisson-Stirling measure.
raw_freq(*marginals) Scores ngrams by their frequency.
student_t(*marginals) Scores ngrams using Student’s t test with independence hypothesis for unigrams, as in Manning and Schütze 5.3.1.
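Two of the simpler measures can be worked through by hand. The sketch below uses the textbook definitions for the bigram case. The function names and counts are hypothetical, the code assumes a nonzero bigram count, and NLTK's own implementations may differ in detail (for example in how they guard against zero counts).

```python
import math


def raw_freq_sketch(n_ii, uni_counts, n_xx):
    """Relative frequency of the bigram: its count over the total word count."""
    return n_ii / n_xx


def student_t_sketch(n_ii, uni_counts, n_xx):
    """Student's t for a bigram under the hypothesis that its words are independent.

    Compares the observed bigram count with the count expected under
    independence, scaled by the square root of the observed count.
    Assumes n_ii > 0.
    """
    n_ix, n_xi = uni_counts
    expected = n_ix * n_xi / n_xx
    return (n_ii - expected) / math.sqrt(n_ii)


# Hypothetical counts, in the marginal order (ngram, unigrams, total).
marginals = (8, (16, 32), 1024)
print(raw_freq_sketch(*marginals))   # 0.0078125
print(student_t_sketch(*marginals))  # (8 - 0.5) / sqrt(8), roughly 2.65
```

Here the expected count under independence is 16 * 32 / 1024 = 0.5, so an observed count of 8 yields a clearly positive t score.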