nltk.BinaryMaxentFeatureEncoding
class nltk.BinaryMaxentFeatureEncoding(labels, mapping, unseen_features=False, alwayson_features=False)

A feature encoding that generates vectors containing binary joint-features of the form:

    joint_feat(fs, l) = { 1 if (fs[fname] == fval) and (l == label)
                        {
                        { 0 otherwise

Where fname is the name of an input-feature, fval is a value for that input-feature, and label is a label.

Typically, these features are constructed based on a training corpus, using the train() method. This method will create one feature for each combination of fname, fval, and label that occurs at least once in the training corpus.

The unseen_features parameter can be used to add “unseen-value features”, which are used whenever an input feature has a value that was not encountered in the training corpus. These features have the form:

    joint_feat(fs, l) = { 1 if is_unseen(fname, fs[fname])
                        {      and l == label
                        {
                        { 0 otherwise

Where is_unseen(fname, fval) is true if the encoding does not contain any joint features that are true when fs[fname]==fval.

The alwayson_features parameter can be used to add “always-on features”, which have the form:

    joint_feat(fs, l) = { 1 if (l == label)
                        {
                        { 0 otherwise

These always-on features allow the maxent model to directly model the prior probabilities of each label.
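The three feature forms above can be illustrated with a small plain-Python sketch. It is not NLTK's implementation: the helper names (basic_feature, unseen_value_feature, always_on_feature), the seen_values argument, and the toy corpus are all hypothetical and exist only to make the definitions concrete.

```python
# Illustrative sketch only: plain-Python versions of the three
# joint-feature forms. None of these helpers are part of NLTK.

def basic_feature(fname, fval, label):
    """1 if fs[fname] == fval and l == label, else 0."""
    def joint_feat(fs, l):
        return 1 if fname in fs and fs[fname] == fval and l == label else 0
    return joint_feat

def unseen_value_feature(fname, label, seen_values):
    """1 if fs[fname] was never seen in training and l == label, else 0.

    seen_values (an assumption of this sketch) is the set of values
    observed for fname anywhere in the training corpus.
    """
    def joint_feat(fs, l):
        is_unseen = fname in fs and fs[fname] not in seen_values
        return 1 if is_unseen and l == label else 0
    return joint_feat

def always_on_feature(label):
    """1 if l == label, regardless of the featureset, else 0."""
    def joint_feat(fs, l):
        return 1 if l == label else 0
    return joint_feat

# Hypothetical toy corpus of (featureset, label) pairs.
train_toks = [
    ({"shape": "round", "color": "red"}, "apple"),
    ({"shape": "long", "color": "yellow"}, "banana"),
]

# One basic feature per (fname, fval, label) combination seen in training.
combos = sorted(
    {(fname, fval, label) for fs, label in train_toks for fname, fval in fs.items()}
)
basic = [basic_feature(*combo) for combo in combos]

# Values observed for the "color" input-feature in the corpus.
seen_colors = {fs["color"] for fs, _ in train_toks}

fs = {"shape": "round", "color": "green"}  # "green" never occurs in training
basic_vector = [feat(fs, "apple") for feat in basic]
unseen_color = unseen_value_feature("color", "apple", seen_colors)(fs, "apple")
prior_apple = always_on_feature("apple")(fs, "apple")
prior_banana = always_on_feature("banana")(fs, "apple")
```

Here only the (shape, round, apple) basic feature fires, the unseen-value feature for color fires because "green" was not in the corpus, and exactly one always-on feature fires for the chosen label.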
Methods
__init__(labels, mapping[, unseen_features, ...]) |
describe(f_id) |
encode(featureset, label) |
labels() |
length() |
train(train_toks[, count_cutoff, labels]) | Construct and return new feature encoding, based on a given training corpus train_toks.
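A minimal usage sketch of these methods follows. It assumes an installed NLTK in which the class is importable from nltk.classify.maxent and in which train() forwards extra keyword arguments such as alwayson_features to the constructor; the toy corpus is hypothetical. The order of labels() and the indices assigned to always-on features may vary between runs, so the sketch sorts the labels and avoids relying on those indices.

```python
from nltk.classify.maxent import BinaryMaxentFeatureEncoding

# Hypothetical toy corpus: a list of (featureset, label) pairs.
train_toks = [
    ({"shape": "round", "color": "red"}, "apple"),
    ({"shape": "long", "color": "yellow"}, "banana"),
]

# Build the encoding from the corpus. alwayson_features is assumed to be
# passed through to the constructor by train().
encoding = BinaryMaxentFeatureEncoding.train(train_toks, alwayson_features=True)

labels = sorted(encoding.labels())  # label order is not guaranteed, so sort
n_features = encoding.length()      # 4 basic features + 2 always-on features

# encode() returns a sparse vector: a list of (feature_id, value) pairs
# for the joint-features that fire for this featureset and label.
vector = encoding.encode({"shape": "round", "color": "green"}, "apple")

# describe() gives a human-readable description of one feature id.
descriptions = [encoding.describe(f_id) for f_id, _ in vector]
for line in descriptions:
    print(line)
```

Because unseen_features is left at its default of False, the unseen value "green" contributes nothing to the vector; only the (shape, round, apple) feature and the always-on feature for "apple" fire.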