classmethod TypedMaxentFeatureEncoding.train(train_toks, count_cutoff=0, labels=None, **options)

Construct and return a new feature encoding based on the given training corpus train_toks. See the class description of TypedMaxentFeatureEncoding for the joint-features that will be included in this encoding.

Note: the recognized numeric feature value types are int and float. Values of any other type are interpreted as regular binary features.
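To illustrate the distinction in the note above, here is a minimal sketch of how a typed encoding might treat a single feature value. The helpers `feature_key` and `encode_value` are hypothetical names introduced for this example, not part of the library. The assumption is that a numeric value is grouped under its type, so every value of that feature shares one joint-feature, and then contributes its own magnitude. Any other value acts as a binary feature keyed by the value itself and contributes 1.

```python
def feature_key(fname, fval):
    """Hypothetical helper: the (name, value) key a joint-feature is built on.

    Numeric values (exactly int or float) are collapsed to their type, so
    e.g. length=3 and length=7 share one joint-feature. Any other value is
    a binary feature keyed by the value itself.
    """
    # Exact type check: bool is a subclass of int, but True/False are
    # treated here as ordinary binary values (an assumption of this sketch).
    if type(fval) in (int, float):
        return (fname, type(fval))
    return (fname, fval)


def encode_value(fval):
    """Hypothetical helper: the amount a matching joint-feature contributes."""
    if type(fval) in (int, float):
        return fval  # numeric features contribute their actual value
    return 1  # binary features are simply "on"
```

With these helpers, `feature_key("length", 7)` and `feature_key("length", 3)` both yield `("length", int)`, while `feature_key("word", "cat")` keeps the string as part of the key.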

  • train_toks (list(tuple(dict, str))) – Training data, represented as a list of pairs, the first member of which is a feature dictionary, and the second of which is a classification label.
  • count_cutoff (int) – A cutoff value that is used to discard rare joint-features. If a joint-feature's value occurs fewer than count_cutoff times in the training corpus, then that joint-feature is not included in the generated encoding.
  • labels (list) – A list of labels that should be used by the classifier. If not specified, then the set of labels attested in train_toks will be used.
  • options – Extra parameters for the constructor, such as unseen_features and alwayson_features.
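The interaction of train_toks, count_cutoff and labels can be sketched in plain Python. The function `collect_joint_features` below is hypothetical and is not the library's implementation. It assumes that a joint-feature is a (feature name, value key, label) triple, with numeric values collapsed to their type as described in the note, and that the cutoff is applied to the count of each triple. How the library counts occurrences exactly may differ. Raising ValueError for a label outside labels is also a choice made for this sketch.

```python
from collections import Counter


def collect_joint_features(train_toks, count_cutoff=0, labels=None):
    """Hypothetical sketch: map (fname, value_key, label) -> feature id."""
    counts = Counter()  # insertion-ordered, so ids follow first appearance
    seen_labels = []
    for featureset, label in train_toks:
        if labels is not None and label not in labels:
            raise ValueError(f"Unexpected label: {label!r}")
        if label not in seen_labels:
            seen_labels.append(label)
        for fname, fval in featureset.items():
            # Numeric values are grouped by type; others are binary keys.
            key = type(fval) if type(fval) in (int, float) else fval
            counts[fname, key, label] += 1

    # Keep only joint-features seen at least count_cutoff times.
    mapping = {}
    for joint in counts:
        if counts[joint] >= count_cutoff:
            mapping[joint] = len(mapping)

    # Without an explicit labels list, fall back to the attested labels.
    final_labels = list(labels) if labels is not None else seen_labels
    return mapping, final_labels


# Usage: each token is a (feature dict, label) pair.
train_toks = [
    ({"word": "cat", "length": 3}, "noun"),
    ({"word": "run", "length": 3}, "verb"),
    ({"word": "cat", "length": 3.0}, "noun"),
]
# Only ("word", "cat", "noun") occurs twice, so it alone survives a cutoff of 2.
mapping, label_list = collect_joint_features(train_toks, count_cutoff=2)
```

Collapsing numeric values before counting means that a numeric feature such as length accumulates counts across all of its values. Under the sketch's assumptions, that keeps it from being discarded merely because each individual number is rare.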