nltk.BinaryMaxentFeatureEncoding
class nltk.BinaryMaxentFeatureEncoding(labels, mapping, unseen_features=False, alwayson_features=False)

A feature encoding that generates vectors containing binary joint-features of the form:

    joint_feat(fs, l) = { 1 if (fs[fname] == fval) and (l == label)
                        {
                        { 0 otherwise

where fname is the name of an input-feature, fval is a value for that input-feature, and label is a label.

Typically, these features are constructed based on a training corpus, using the train() method. This method will create one feature for each combination of fname, fval, and label that occurs at least once in the training corpus.

The unseen_features parameter can be used to add “unseen-value features”, which are used whenever an input feature has a value that was not encountered in the training corpus. These features have the form:

    joint_feat(fs, l) = { 1 if is_unseen(fname, fs[fname])
                        {      and l == label
                        {
                        { 0 otherwise

where is_unseen(fname, fval) is true if the encoding does not contain any joint features that are true when fs[fname]==fval.

The alwayson_features parameter can be used to add “always-on features”, which have the form:

    joint_feat(fs, l) = { 1 if (l == label)
                        {
                        { 0 otherwise

These always-on features allow the maxent model to directly model the prior probabilities of each label.
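The three kinds of joint feature described above can be illustrated with a small pure-Python sketch. It does not use NLTK and is not NLTK's implementation: the helper names build_mapping and joint_features are hypothetical, and the sketch simply follows the definitions as written, treating both optional feature kinds as enabled.

```python
def build_mapping(train_toks):
    """Assign one index to each (fname, fval, label) triple seen in training.

    Hypothetical helper: mirrors the documented behaviour of creating one
    feature per combination that occurs at least once in the corpus.
    """
    mapping = {}
    for featureset, label in train_toks:
        for fname, fval in featureset.items():
            mapping.setdefault((fname, fval, label), len(mapping))
    return mapping


def joint_features(featureset, label, mapping, labels):
    """Return descriptions of the joint features whose value is 1.

    Hypothetical helper: assumes unseen-value and always-on features
    are both switched on.
    """
    active = []
    for fname, fval in featureset.items():
        if (fname, fval, label) in mapping:
            # Basic feature: fs[fname] == fval and l == label.
            active.append(("basic", fname, fval, label))
        elif not any((fname, fval, other) in mapping for other in labels):
            # Unseen-value feature: no joint feature in the encoding
            # is true when fs[fname] == fval.
            active.append(("unseen", fname, label))
    # Always-on feature: depends only on the label.
    active.append(("always-on", label))
    return active


train_toks = [({"color": "red"}, "apple"), ({"color": "yellow"}, "banana")]
labels = ["apple", "banana"]
mapping = build_mapping(train_toks)

seen = joint_features({"color": "red"}, "apple", mapping, labels)
other_label = joint_features({"color": "red"}, "banana", mapping, labels)
unseen = joint_features({"color": "green"}, "apple", mapping, labels)
```

In this sketch, a value seen in training with a different label (red with banana) fires neither a basic nor an unseen-value feature, so only the always-on feature for that label remains active.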
Methods
__init__(labels, mapping[, unseen_features, ...]) |
describe(f_id) |
encode(featureset, label) |
labels() |
length() |
train(train_toks[, count_cutoff, labels]) | Construct and return new feature encoding, based on a given training corpus train_toks.
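A minimal usage sketch of the methods listed above, assuming NLTK is installed and that the class is importable from nltk.classify.maxent. The toy training tokens are invented for illustration, and the optional unseen-value and always-on features are left at their defaults (off).

```python
from nltk.classify.maxent import BinaryMaxentFeatureEncoding

# Toy corpus (made up for this example): (featureset, label) pairs.
train_toks = [
    ({"shape": "round", "color": "red"}, "apple"),
    ({"shape": "long", "color": "yellow"}, "banana"),
]

# Build the encoding from the corpus; labels are passed explicitly
# so their order is fixed.
encoding = BinaryMaxentFeatureEncoding.train(
    train_toks, labels=["apple", "banana"]
)

# Number of joint features: one per (fname, fval, label) seen in training.
n_features = encoding.length()

# encode() returns a sparse vector as a list of (feature index, value) pairs.
vec_apple = encoding.encode({"shape": "round", "color": "red"}, "apple")
vec_banana = encoding.encode({"shape": "round", "color": "red"}, "banana")

# describe() gives a human-readable description of one joint feature.
description = encoding.describe(0)
```

Because the encoding is sparse, pairing a featureset with a label it never co-occurred with in training (here, a round red item labelled banana) yields no active joint features unless the optional feature kinds are enabled.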