nltk.BigramAssocMeasures
class nltk.BigramAssocMeasures
A collection of bigram association measures. Each association measure is provided as a function with three arguments:
bigram_score_fn(n_ii, (n_ix, n_xi), n_xx)
The arguments constitute the marginals of a contingency table, counting the occurrences of particular events in a corpus. The letter i in the suffix refers to the appearance of the word in question, while x indicates the appearance of any word. Thus, for example:
n_ii counts (w1, w2), i.e. the bigram being scored
n_ix counts (w1, *)
n_xi counts (*, w2)
n_xx counts (*, *), i.e. any bigram
This may be shown with respect to a contingency table:
        w1    ~w1
     ------ ------
 w2 | n_ii | n_oi | = n_xi
     ------ ------
~w2 | n_io | n_oo |
     ------ ------
     = n_ix        TOTAL = n_xx
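The relationship between the marginals and the four table cells can be sketched in plain Python. This is a minimal illustration, not NLTK's own code: the helper names `bigram_marginals` and `contingency` and the toy sentence are assumptions made for this example, and the counts are taken directly from adjacent word pairs, which may differ slightly from how a collocation finder tallies them. The suffix letter o is read here as "any word other than the one in question", as the table implies.

```python
def bigram_marginals(tokens, w1, w2):
    """Return (n_ii, (n_ix, n_xi), n_xx) for the bigram (w1, w2).

    Hypothetical helper: counts are taken over adjacent token pairs.
    """
    bigrams = list(zip(tokens, tokens[1:]))  # all adjacent word pairs
    n_ii = sum(1 for a, b in bigrams if a == w1 and b == w2)  # (w1, w2)
    n_ix = sum(1 for a, _ in bigrams if a == w1)              # (w1, *)
    n_xi = sum(1 for _, b in bigrams if b == w2)              # (*, w2)
    n_xx = len(bigrams)                                       # (*, *)
    return n_ii, (n_ix, n_xi), n_xx


def contingency(n_ii, n_ix_xi, n_xx):
    """Recover the four cells of the 2x2 table from the marginals."""
    n_ix, n_xi = n_ix_xi
    n_oi = n_xi - n_ii                # w2 preceded by a word other than w1
    n_io = n_ix - n_ii                # w1 followed by a word other than w2
    n_oo = n_xx - n_ii - n_oi - n_io  # bigrams involving neither position
    return n_ii, n_oi, n_io, n_oo


tokens = "the cat sat on the mat and the cat slept".split()
marginals = bigram_marginals(tokens, "the", "cat")
cells = contingency(*marginals)
print(marginals)  # (2, (3, 2), 9)
print(cells)      # (2, 0, 1, 6)
```

The four cells always sum to n_xx, which is a quick sanity check on hand-computed marginals.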
Methods
chi_sq(n_ii, n_ix_xi_tuple, n_xx) | Scores bigrams using chi-square, i.e. phi-square multiplied by the number of bigrams.
dice(n_ii, n_ix_xi_tuple, n_xx) | Scores bigrams using Dice’s coefficient.
fisher(*marginals) | Scores bigrams using Fisher’s Exact Test (Pedersen 1996).
jaccard(*marginals) | Scores ngrams using the Jaccard index.
likelihood_ratio(*marginals) | Scores ngrams using likelihood ratios as in Manning and Schutze 5.3.4.
mi_like(*marginals, **kwargs) | Scores ngrams using a variant of mutual information.
phi_sq(*marginals) | Scores bigrams using phi-square, the square of the Pearson correlation coefficient.
pmi(*marginals) | Scores ngrams by pointwise mutual information, as in Manning and Schutze 5.4.
poisson_stirling(*marginals) | Scores ngrams using the Poisson-Stirling measure.
raw_freq(*marginals) | Scores ngrams by their frequency.
student_t(*marginals) | Scores ngrams using Student’s t test with an independence hypothesis for unigrams, as in Manning and Schutze 5.3.1.
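To make the calling convention concrete, a few of these measures can be written out from their textbook definitions. The sketch below is an illustrative re-implementation under that assumption, not NLTK's source: the functions are standalone, share the `(n_ii, (n_ix, n_xi), n_xx)` argument shape described above, and the marginals used are invented for the example.

```python
import math


def raw_freq(n_ii, n_ix_xi, n_xx):
    """Relative frequency of the bigram among all bigrams."""
    return n_ii / n_xx


def pmi(n_ii, n_ix_xi, n_xx):
    """Pointwise mutual information: log2(P(w1, w2) / (P(w1) * P(w2)))."""
    n_ix, n_xi = n_ix_xi
    return math.log2(n_ii * n_xx) - math.log2(n_ix * n_xi)


def dice(n_ii, n_ix_xi, n_xx):
    """Dice's coefficient: 2 * n_ii / (n_ix + n_xi)."""
    n_ix, n_xi = n_ix_xi
    return 2 * n_ii / (n_ix + n_xi)


def phi_sq(n_ii, n_ix_xi, n_xx):
    """Square of the Pearson correlation on the 2x2 contingency table."""
    n_ix, n_xi = n_ix_xi
    n_oi = n_xi - n_ii
    n_io = n_ix - n_ii
    n_oo = n_xx - n_ii - n_oi - n_io
    numerator = (n_ii * n_oo - n_io * n_oi) ** 2
    denominator = (n_ii + n_io) * (n_ii + n_oi) * (n_io + n_oo) * (n_oi + n_oo)
    return numerator / denominator


def chi_sq(n_ii, n_ix_xi, n_xx):
    """Chi-square as phi-square scaled by the total number of bigrams."""
    return n_xx * phi_sq(n_ii, n_ix_xi, n_xx)


# Invented marginals: the bigram occurs 8 times, each word 16 times,
# out of 1024 bigrams in total.
example = (8, (16, 16), 1024)
print(pmi(*example))   # 5.0
print(dice(*example))  # 0.5
print(chi_sq(*example))
```

With NLTK installed, the same marginals can be passed to the class's own methods in the form shown at the top of this page, for example `nltk.BigramAssocMeasures.pmi(n_ii, (n_ix, n_xi), n_xx)`.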