nltk.TrigramAssocMeasures

class nltk.TrigramAssocMeasures

A collection of trigram association measures. Each association measure is provided as a function with four arguments:

trigram_score_fn(n_iii,
                 (n_iix, n_ixi, n_xii),
                 (n_ixx, n_xix, n_xxi),
                 n_xxx)

The arguments constitute the marginals of a contingency table, counting the occurrences of particular events in a corpus. The letter i in the suffix refers to the appearance of the word in question, while x indicates the appearance of any word. Thus, for example:

n_iii counts (w1, w2, w3), i.e. the trigram being scored
n_ixx counts (w1, *, *)
n_xxx counts (*, *, *), i.e. any trigram
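To make the suffix convention concrete, the following is a minimal pure-Python sketch that counts the four argument groups for one trigram in a toy token list. The helper name trigram_marginals and the toy corpus are hypothetical, introduced only for illustration; this shows the definitions above and is not NLTK's own counting code, which may treat tokens at the edges of the corpus differently.

```python
def trigram_marginals(tokens, trigram):
    """Count the four marginal groups for one trigram (illustrative helper)."""
    w1, w2, w3 = trigram
    # Every window of three consecutive tokens is one trigram event.
    windows = list(zip(tokens, tokens[1:], tokens[2:]))

    def count(pattern):
        # None plays the role of "x" (any word) in the n_... suffixes.
        return sum(
            all(p is None or p == w for p, w in zip(pattern, window))
            for window in windows
        )

    n_iii = count((w1, w2, w3))
    bigram_counts = (                      # (n_iix, n_ixi, n_xii)
        count((w1, w2, None)),
        count((w1, None, w3)),
        count((None, w2, w3)),
    )
    unigram_counts = (                     # (n_ixx, n_xix, n_xxi)
        count((w1, None, None)),
        count((None, w2, None)),
        count((None, None, w3)),
    )
    n_xxx = count((None, None, None))      # any trigram
    return n_iii, bigram_counts, unigram_counts, n_xxx


tokens = "a b c a b d a b c".split()
marginals = trigram_marginals(tokens, ("a", "b", "c"))
print(marginals)  # (2, (3, 2, 2), (3, 3, 2), 7)
```

The returned tuple has the same shape as the four arguments of trigram_score_fn, so it could be unpacked with * when calling a scoring function.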

Methods

chi_sq(*marginals) Scores ngrams using Pearson’s chi-square as in Manning and Schutze 5.3.3.
jaccard(*marginals) Scores ngrams using the Jaccard index.
likelihood_ratio(*marginals) Scores ngrams using likelihood ratios as in Manning and Schutze 5.3.4.
mi_like(*marginals, **kwargs) Scores ngrams using a variant of mutual information.
pmi(*marginals) Scores ngrams by pointwise mutual information, as in Manning and Schutze 5.4.
poisson_stirling(*marginals) Scores ngrams using the Poisson-Stirling measure.
raw_freq(*marginals) Scores ngrams by their frequency.
student_t(*marginals) Scores ngrams using Student’s t test with independence hypothesis for unigrams, as in Manning and Schutze 5.3.1.
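As a rough illustration of how a measure consumes the four arguments, the sketch below re-implements two of the simpler scores in plain Python. The names raw_freq_sketch and pmi_sketch are hypothetical, and the formulas are assumptions based on the usual definitions (relative frequency, and pointwise mutual information generalised to three words); the library's own implementations may differ in detail.

```python
import math


def raw_freq_sketch(n_iii, bigram_counts, unigram_counts, n_xxx):
    """Relative frequency of the trigram among all trigrams (assumed definition)."""
    return n_iii / n_xxx


def pmi_sketch(n_iii, bigram_counts, unigram_counts, n_xxx):
    """Assumed trigram PMI: log2 of P(w1 w2 w3) / (P(w1) * P(w2) * P(w3))."""
    n_ixx, n_xix, n_xxi = unigram_counts
    # With P(event) = count / n_xxx, the ratio simplifies to
    # n_iii * n_xxx**2 / (n_ixx * n_xix * n_xxi).
    return math.log2(n_iii * n_xxx ** 2) - math.log2(n_ixx * n_xix * n_xxi)


# Hand-made marginals: the trigram occurs once among four trigrams,
# and each of its words fills its position twice.
marginals = (1, (1, 1, 1), (2, 2, 2), 4)
print(raw_freq_sketch(*marginals))  # 0.25
print(pmi_sketch(*marginals))       # 1.0
```

Both functions ignore the bigram counts, which only the more elaborate measures in the list above would need.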