nltk.BinaryMaxentFeatureEncoding

class nltk.BinaryMaxentFeatureEncoding(labels, mapping, unseen_features=False, alwayson_features=False)

A feature encoding that generates vectors containing binary joint-features of the form:

joint_feat(fs, l) = { 1 if (fs[fname] == fval) and (l == label)
                    {
                    { 0 otherwise

Where fname is the name of an input-feature, fval is a value for that input-feature, and label is a label.
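The definition above can be written directly as a small predicate. This is a minimal sketch of the formula only, not NLTK's implementation; the featureset and the feature names in it are illustrative assumptions.

```python
def joint_feat(fs, l, fname, fval, label):
    """Binary joint-feature for the triple (fname, fval, label).

    Returns 1 when the featureset fs has fs[fname] == fval and the
    candidate label l equals label; returns 0 otherwise.
    """
    return 1 if (fname in fs and fs[fname] == fval and l == label) else 0


# Hypothetical featureset used only for illustration.
fs = {"outlook": "sunny", "windy": False}

fires = joint_feat(fs, "play", "outlook", "sunny", "play")        # value and label match
wrong_label = joint_feat(fs, "stay", "outlook", "sunny", "play")  # label differs
wrong_value = joint_feat(fs, "play", "outlook", "rainy", "play")  # value differs
```

Only the first call returns 1, because both the input-feature value and the label must match.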

Typically, these features are constructed based on a training corpus, using the train() method. This method will create one feature for each combination of fname, fval, and label that occurs at least once in the training corpus.
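To make "one feature for each combination" concrete, the sketch below assigns a consecutive index to every (fname, fval, label) triple seen in a toy corpus. The helper name build_mapping and the corpus are hypothetical, and the sketch ignores options such as count_cutoff.

```python
def build_mapping(train_toks):
    """Map each observed (fname, fval, label) triple to a feature index.

    train_toks is a list of (featureset, label) pairs. Indices are
    assigned in order of first occurrence, starting at 0.
    """
    mapping = {}
    for featureset, label in train_toks:
        for fname, fval in featureset.items():
            key = (fname, fval, label)
            if key not in mapping:
                mapping[key] = len(mapping)
    return mapping


# Hypothetical training corpus.
train_toks = [
    ({"outlook": "sunny", "windy": False}, "play"),
    ({"outlook": "rainy", "windy": True}, "stay"),
    ({"outlook": "sunny", "windy": True}, "play"),
]

mapping = build_mapping(train_toks)
for key, index in mapping.items():
    print(index, key)
```

The triple ("outlook", "rainy", "play") never occurs in this corpus, so it gets no feature even though both the value and the label appear separately.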

The unseen_features parameter can be used to add “unseen-value features”, which are used whenever an input feature has a value that was not encountered in the training corpus. These features have the form:

joint_feat(fs, l) = { 1 if is_unseen(fname, fs[fname])
                    {      and l == label
                    {
                    { 0 otherwise

Where is_unseen(fname, fval) is true if the encoding does not contain any joint features that are true when fs[fname]==fval.
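The is_unseen test can be sketched as a lookup over all known labels. The mapping layout used here, a dict keyed by (fname, fval, label), is an assumption made for illustration, as are the sample values.

```python
def is_unseen(mapping, labels, fname, fval):
    """True if no joint feature in mapping can fire for fs[fname] == fval.

    A value counts as unseen when the pair (fname, fval) is not combined
    with any known label in the mapping.
    """
    return not any((fname, fval, label) in mapping for label in labels)


# Hypothetical encoding contents.
labels = ["play", "stay"]
mapping = {
    ("outlook", "sunny", "play"): 0,
    ("outlook", "rainy", "stay"): 1,
}

known_value = is_unseen(mapping, labels, "outlook", "sunny")  # seen with "play"
new_value = is_unseen(mapping, labels, "outlook", "foggy")    # never seen
```

Here "sunny" is not unseen, since it occurs with the label "play", while "foggy" is unseen for every label.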

The alwayson_features parameter can be used to add “always-on features”, which have the form:

joint_feat(fs, l) = { 1 if (l == label)
                    {
                    { 0 otherwise

These always-on features allow the maxent model to directly model the prior probabilities of each label.
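One way to see why always-on features capture label priors: in a conditional exponential model, where p(l | fs) is proportional to exp of the summed weights of the active joint-features, an empty featureset leaves only the always-on feature for each label active. The sketch below assumes that model form, with made-up prior values and weights set to their logarithms.

```python
import math

# Hypothetical label priors; the weights are chosen as log(prior).
priors = {"play": 0.75, "stay": 0.25}
weights = {label: math.log(p) for label, p in priors.items()}


def always_on(l, label):
    """Always-on joint-feature: 1 if the candidate label l equals label."""
    return 1 if l == label else 0


def score(l):
    """Sum of weight * feature value over all always-on features."""
    return sum(weights[label] * always_on(l, label) for label in weights)


# Normalize exp(score) over the labels, as a maxent model would.
z = sum(math.exp(score(l)) for l in priors)
probs = {l: math.exp(score(l)) / z for l in priors}
```

With no other features active, the resulting distribution over labels reproduces the priors, up to floating-point error.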

Methods

__init__(labels, mapping[, unseen_features, ...]) Parameter labels: a list of the “known labels” for this encoding.
describe(f_id)
encode(featureset, label)
labels()
length()
train(train_toks[, count_cutoff, labels]) Construct and return a new feature encoding, based on a given training corpus train_toks.
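A short usage sketch of the methods listed above, assuming NLTK is installed and that the class is importable from nltk.classify.maxent. The two-token corpus is made up for illustration.

```python
from nltk.classify.maxent import BinaryMaxentFeatureEncoding

# Hypothetical training corpus: a list of (featureset, label) pairs.
train_toks = [
    ({"outlook": "sunny"}, "play"),
    ({"outlook": "rainy"}, "stay"),
]

# train() is a class method that builds the encoding from the corpus.
encoding = BinaryMaxentFeatureEncoding.train(train_toks, labels=["play", "stay"])

n_features = encoding.length()

# encode() returns a sparse vector as a list of (feature_index, value) pairs.
seen_pair = encoding.encode({"outlook": "sunny"}, "play")
unseen_pair = encoding.encode({"outlook": "sunny"}, "stay")

# describe() gives a human-readable description of each joint-feature.
for f_id in range(n_features):
    print(f_id, encoding.describe(f_id))
```

Encoding the same featureset with a label it never co-occurred with yields an empty list, since no joint-feature matches that combination.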