nltk.HiddenMarkovModelTagger

class nltk.HiddenMarkovModelTagger(symbols, states, transitions, outputs, priors, transform=<function _identity>)[source]

Hidden Markov model class, a generative model for labelling sequence data. These models define the joint probability of a sequence of symbols and their labels (state transitions) as the product of the starting state probability, the probability of each state transition, and the probability of each observation being generated from each state. This is described in more detail in the module documentation.
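
Concretely, for an observation sequence o_1, ..., o_T with label sequence s_1, ..., s_T, this product is the standard HMM factorisation (written here in the Pr notation used in the parameter descriptions below):

    Pr(o_1, \ldots, o_T, s_1, \ldots, s_T) = Pr(s_1)\,Pr(o_1 \mid s_1) \prod_{t=2}^{T} Pr(s_t \mid s_{t-1})\,Pr(o_t \mid s_t)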

This implementation is based on the HMM description in chapter 8 of Huang, Acero, and Hon, Spoken Language Processing, and includes an extension for training shallow HMM parsers or specialized HMMs, as in Molina et al., 2002. A specialized HMM modifies the training data by applying a specialization function to create a new training set that is more appropriate for sequential tagging with an HMM. A typical use case is chunking.
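
For illustration, a minimal sketch of such a specialization function. The trigger-word set and label scheme are hypothetical, not part of NLTK, and this assumes the transform maps a tagged sentence (a list of (token, label) pairs) to a transformed sentence; consult the module documentation for the exact contract of the transform parameter:

    # Hypothetical specialization in the spirit of Molina et al., 2002:
    # append selected trigger words to their chunk label so the HMM can
    # learn sharper transition statistics around them.
    TRIGGER_WORDS = {"to", "of", "that"}  # illustrative closed class

    def specialize(tagged_sent):
        """Map [(token, label), ...] to a specialized [(token, label), ...]."""
        return [
            (tok, f"{label}+{tok.lower()}" if tok.lower() in TRIGGER_WORDS else label)
            for tok, label in tagged_sent
        ]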

Parameters:
  • symbols (seq of any) – the set of output symbols (alphabet)
  • states (seq of any) – the set of states representing the state space
  • transitions (ConditionalProbDistI) – transition probabilities; Pr(s_i | s_j) is the probability of a transition to state i given that the model is in state j
  • outputs (ConditionalProbDistI) – output probabilities; Pr(o_k | s_i) is the probability of emitting symbol k when entering state i
  • priors (ProbDistI) – initial state distribution; Pr(s_i) is the probability of starting in state i
  • transform (callable) – an optional function for transforming training instances, defaults to the identity function.
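
A minimal sketch of building these distributions by hand from a toy corpus, using MLE estimates from nltk.probability. The tag set, alphabet, and corpus below are invented for illustration; in practice most users call the train() class method instead (see the example after the method list):

    from nltk.probability import (
        ConditionalFreqDist, ConditionalProbDist, FreqDist, MLEProbDist,
    )
    from nltk.tag import HiddenMarkovModelTagger

    states = ["D", "N"]               # toy tag set
    symbols = ["the", "dog", "cat"]   # toy output alphabet

    # Count starts, emissions, and transitions from a two-sentence toy corpus.
    corpus = [[("the", "D"), ("dog", "N")], [("the", "D"), ("cat", "N")]]
    starts, emits, trans = FreqDist(), ConditionalFreqDist(), ConditionalFreqDist()
    for sent in corpus:
        starts[sent[0][1]] += 1
        for word, tag in sent:
            emits[tag][word] += 1
        for (_, t1), (_, t2) in zip(sent, sent[1:]):
            trans[t1][t2] += 1

    tagger = HiddenMarkovModelTagger(
        symbols,
        states,
        ConditionalProbDist(trans, MLEProbDist),
        ConditionalProbDist(emits, MLEProbDist),
        MLEProbDist(starts),
    )
    print(tagger.tag(["the", "cat"]))  # expected: [('the', 'D'), ('cat', 'N')]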

Methods

__init__(symbols, states, transitions, ...)

best_path(unlabeled_sequence)
    Returns the state sequence of the optimal (most probable) path through the HMM, computed by dynamic programming (the Viterbi algorithm).

best_path_simple(unlabeled_sequence)
    Returns the state sequence of the optimal (most probable) path through the HMM, using a simple, direct method.

entropy(unlabeled_sequence)
    Returns the entropy over labellings of the given sequence.

evaluate(gold)
    Score the accuracy of the tagger against the gold standard.

log_probability(sequence)
    Returns the log-probability of the given symbol sequence.

point_entropy(unlabeled_sequence)
    Returns the pointwise entropy over the possible states at each position in the chain, given the observation sequence.

probability(sequence)
    Returns the probability of the given symbol sequence.

random_sample(rng, length)
    Randomly sample the HMM to generate a sentence of a given length.

reset_cache()

tag(unlabeled_sequence)
    Tags the sequence with the highest-probability state sequence.

tag_sents(sentences)
    Apply self.tag() to each element of sentences.

test(test_sequence[, verbose])
    Tests the HiddenMarkovModelTagger instance.

train(labeled_sequence[, test_sequence, ...])
    Train a new HiddenMarkovModelTagger using the given labeled and unlabeled training instances; see the example session below.

unicode_repr()
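
A sketch of a typical session exercising these methods through the train() entry point. It assumes the Penn Treebank sample corpus is installed (e.g. via nltk.download('treebank')); outputs are omitted:

    >>> import random
    >>> from nltk.corpus import treebank
    >>> from nltk.tag import HiddenMarkovModelTagger
    >>> train_sents = treebank.tagged_sents()[:3000]
    >>> tagger = HiddenMarkovModelTagger.train(train_sents)
    >>> sent = "Today is a good day .".split()
    >>> tagger.tag(sent)            # (word, tag) pairs along the best path
    >>> tagger.best_path(sent)      # just the tag sequence (Viterbi)
    >>> tagger.log_probability([(w, None) for w in sent])  # unlabeled sequence
    >>> tagger.random_sample(random.Random(0), 10)  # sample a 10-word sentence
    >>> tagger.evaluate(treebank.tagged_sents()[3000:3100])  # held-out accuracy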