nltk.classify.TypedMaxentFeatureEncoding

class nltk.classify.TypedMaxentFeatureEncoding(labels, mapping, unseen_features=False, alwayson_features=False)

A feature encoding that generates vectors containing integer, float and binary joint-features of the form:

Binary (for string and boolean features):

joint_feat(fs, l) = { 1 if (fs[fname] == fval) and (l == label)
                    {
                    { 0 otherwise

Value (for integer and float features):

joint_feat(fs, l) = { fs[fname] if (type(fs[fname]) == type(fval))
                    {           and (l == label)
                    {
                    { not encoded otherwise

Where fname is the name of an input-feature, fval is a value for that input-feature, and label is a label.
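
The two definitions above can be sketched in plain Python. This is an illustration only, not NLTK's implementation: binary_joint_feat and value_joint_feat are hypothetical helper names, None stands in for "not encoded", and the value variant assumes the feature fires when the type of fs[fname] matches the numeric type recorded for the feature.

```python
def binary_joint_feat(fs, l, fname, fval, label):
    """Binary joint-feature: 1 when the input value and the label both match."""
    return 1 if fs.get(fname) == fval and l == label else 0


def value_joint_feat(fs, l, fname, ftype, label):
    """Value joint-feature: the raw number when its type and the label match.

    Returns None to stand for "not encoded" (an assumption of this sketch).
    """
    if fname in fs and type(fs[fname]) is ftype and l == label:
        return fs[fname]
    return None


fs = {"suffix": "ing", "length": 7}
print(binary_joint_feat(fs, "VERB", "suffix", "ing", "VERB"))  # 1
print(value_joint_feat(fs, "VERB", "length", int, "VERB"))     # 7
print(value_joint_feat(fs, "NOUN", "length", int, "VERB"))     # None
```

Note that a value feature contributes the number itself to the vector, whereas a binary feature only ever contributes 1.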

Typically, these features are constructed based on a training corpus, using the train() method.

For string and boolean features [type(fval) not in (int, float)] this method will create one feature for each combination of fname, fval, and label that occurs at least once in the training corpus.

For integer and float features [type(fval) in (int, float)] this method will create one feature for each combination of fname and label that occurs at least once in the training corpus.
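
To make the difference between the two cases concrete, here is a minimal sketch of how such a feature mapping might be built from a training corpus. It is an assumption-laden illustration rather than the library's code: build_mapping is a hypothetical name, the corpus is invented, and count_cutoff handling is left out.

```python
def build_mapping(train_toks):
    """Assign one index per joint-feature seen in (featureset, label) pairs."""
    mapping = {}
    for featureset, label in train_toks:
        for fname, fval in featureset.items():
            # Numeric values are collapsed to their type, so every int (or
            # float) value of fname shares a single feature per label.
            key_val = type(fval) if type(fval) in (int, float) else fval
            key = (fname, key_val, label)
            if key not in mapping:
                mapping[key] = len(mapping)
    return mapping


# Invented toy corpus of (featureset, label) pairs.
train_toks = [
    ({"suffix": "ing", "length": 7}, "VERB"),
    ({"suffix": "ing", "length": 4}, "VERB"),
    ({"suffix": "ed", "length": 6}, "VERB"),
    ({"suffix": "ing", "length": 8}, "NOUN"),
]
mapping = build_mapping(train_toks)
for key, index in mapping.items():
    print(index, key)
```

In this toy corpus the string feature suffix yields one entry per distinct (value, label) pair, while length yields one entry per label no matter how many different integers appear.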

For binary features the unseen_features parameter can be used to add “unseen-value features”, which are used whenever an input feature has a value that was not encountered in the training corpus. These features have the form:

joint_feat(fs, l) = { 1 if is_unseen(fname, fs[fname])
                    {      and l == label
                    {
                    { 0 otherwise

Where is_unseen(fname, fval) is true if the encoding does not contain any joint features that are true when fs[fname]==fval.
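
The unseen-value definition can be sketched as follows. This follows the formula as written above and makes no claim about how the library indexes these features internally; unseen_joint_feat, toy_mapping and toy_labels are hypothetical names, and the mapping is assumed to be keyed by (fname, fval, label) tuples.

```python
def is_unseen(mapping, labels, fname, fval):
    """True if no joint-feature in `mapping` fires when fs[fname] == fval."""
    return not any((fname, fval, label) in mapping for label in labels)


def unseen_joint_feat(fs, l, fname, label, mapping, labels):
    """Unseen-value joint-feature, following the formula above."""
    if fname in fs and is_unseen(mapping, labels, fname, fs[fname]) and l == label:
        return 1
    return 0


# Invented toy mapping from (fname, fval, label) to feature index.
toy_mapping = {
    ("suffix", "ing", "VERB"): 0,
    ("suffix", "ed", "VERB"): 1,
    ("suffix", "ing", "NOUN"): 2,
}
toy_labels = ["VERB", "NOUN"]

# "ly" never occurred in training, so the unseen-value feature fires.
print(unseen_joint_feat({"suffix": "ly"}, "VERB", "suffix", "VERB", toy_mapping, toy_labels))  # 1
# "ed" occurred (with VERB only), so it is not unseen, even for NOUN.
print(unseen_joint_feat({"suffix": "ed"}, "NOUN", "suffix", "NOUN", toy_mapping, toy_labels))  # 0
```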

The alwayson_features parameter can be used to add “always-on features”, which have the form:

joint_feat(fs, l) = { 1 if (l == label)
                    {
                    { 0 otherwise

These always-on features allow the maxent model to directly model the prior probabilities of each label.
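
Why always-on features capture label priors can be seen in a small sketch. It assumes the standard log-linear form of a maxent model, P(l) proportional to exp(sum of weight * feature), with only always-on features active. The names alwayson_joint_feat and label_distribution and the prior values are invented for illustration.

```python
import math


def alwayson_joint_feat(l, label):
    """Always-on joint-feature: fires whenever the candidate label matches."""
    return 1 if l == label else 0


def label_distribution(weights):
    """P(l) proportional to exp(sum of weight * feature) over always-on features."""
    scores = {}
    for l in weights:
        total = sum(w * alwayson_joint_feat(l, label) for label, w in weights.items())
        scores[l] = math.exp(total)
    norm = sum(scores.values())
    return {l: s / norm for l, s in scores.items()}


# Invented prior probabilities for two labels.
priors = {"VERB": 0.75, "NOUN": 0.25}
# Setting each always-on weight to the log prior reproduces the priors.
weights = {label: math.log(p) for label, p in priors.items()}
dist = label_distribution(weights)
print({label: round(p, 2) for label, p in dist.items()})
```

Because exactly one always-on feature fires per candidate label, its weight acts as a per-label bias term, independent of the input featureset.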

Methods

__init__(labels, mapping[, unseen_features, ...]) Parameter labels: a list of the “known labels” for this encoding.
describe(f_id)
encode(featureset, label)
labels()
length()
train(train_toks[, count_cutoff, labels]) Construct and return a new feature encoding, based on a given training corpus train_toks.