nltk.classify.TypedMaxentFeatureEncoding

class nltk.classify.TypedMaxentFeatureEncoding(labels, mapping, unseen_features=False, alwayson_features=False)

A feature encoding that generates vectors containing integer, float and binary joint-features of the form:
Binary (for string and boolean features):

    joint_feat(fs, l) = { 1 if (fs[fname] == fval) and (l == label)
                        {
                        { 0 otherwise

Value (for integer and float features):

    joint_feat(fs, l) = { fval if     (fs[fname] == type(fval))
                        {         and (l == label)
                        {
                        { not encoded otherwise

Where fname is the name of an input-feature, fval is a value for that input-feature, and label is a label.

Typically, these features are constructed based on a training corpus, using the train() method.

For string and boolean features [type(fval) not in (int, float)] this method will create one feature for each combination of fname, fval, and label that occurs at least once in the training corpus.

For integer and float features [type(fval) in (int, float)] this method will create one feature for each combination of fname and label that occurs at least once in the training corpus.

For binary features the unseen_features parameter can be used to add “unseen-value features”, which are used whenever an input feature has a value that was not encountered in the training corpus. These features have the form:

    joint_feat(fs, l) = { 1 if is_unseen(fname, fs[fname])
                        {      and l == label
                        {
                        { 0 otherwise

Where is_unseen(fname, fval) is true if the encoding does not contain any joint features that are true when fs[fname]==fval.

The alwayson_features parameter can be used to add “always-on features”, which have the form:

    joint_feat(fs, l) = { 1 if (l == label)
                        {
                        { 0 otherwise

These always-on features allow the maxent model to directly model the prior probabilities of each label.
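The four joint-feature forms above can be illustrated with a small pure-Python sketch. It does not call NLTK and is not NLTK's implementation: the function name `sketch_encode` and the layout of `mapping`, `unseen_ids` and `alwayson_ids` are assumptions made for illustration, chosen to follow the formulas as written here. The sketch returns a sparse list of (index, value) pairs, one for each joint-feature that fires.

```python
def sketch_encode(featureset, label, labels, mapping,
                  unseen_ids=None, alwayson_ids=None):
    """Hypothetical helper: sparse (index, value) pairs for (featureset, label).

    Assumed layouts (not taken from NLTK):
      mapping      -- (fname, fval, label) -> index for string/boolean features,
                      (fname, type(fval), label) -> index for int/float features
      unseen_ids   -- (fname, label) -> index of an unseen-value feature
      alwayson_ids -- label -> index of an always-on feature
    """
    unseen_ids = unseen_ids or {}
    alwayson_ids = alwayson_ids or {}
    encoding = []

    for fname, fval in featureset.items():
        if type(fval) in (int, float):
            # Value feature: the joint-feature carries the number itself.
            key = (fname, type(fval), label)
            if key in mapping:
                encoding.append((mapping[key], fval))
        else:
            key = (fname, fval, label)
            if key in mapping:
                # Binary feature: fires with value 1.
                encoding.append((mapping[key], 1))
            elif (fname, label) in unseen_ids:
                # Unseen-value feature: fires only if no joint-feature
                # exists for this (fname, fval) under any label.
                seen = any((fname, fval, other) in mapping for other in labels)
                if not seen:
                    encoding.append((unseen_ids[fname, label], 1))

    # Always-on feature: depends on the label alone.
    if label in alwayson_ids:
        encoding.append((alwayson_ids[label], 1))
    return encoding


labels = ["spam", "ham"]
mapping = {
    ("subject", "offer", "spam"): 0,
    ("subject", "hello", "ham"): 1,
    ("length", int, "spam"): 2,
    ("length", int, "ham"): 3,
}
unseen_ids = {("subject", "spam"): 4, ("subject", "ham"): 5}
alwayson_ids = {"spam": 6, "ham": 7}

print(sketch_encode({"subject": "offer", "length": 12}, "spam",
                    labels, mapping, unseen_ids, alwayson_ids))
# [(0, 1), (2, 12), (6, 1)]
print(sketch_encode({"subject": "bye"}, "ham",
                    labels, mapping, unseen_ids, alwayson_ids))
# [(5, 1), (7, 1)]
```

In the first call the string feature contributes a 1 and the integer feature contributes its own value, 12. In the second call the value "bye" has no joint-feature under either label, so the unseen-value feature fires instead.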
Methods

__init__(labels, mapping[, unseen_features, ...]) |
describe(f_id) |
encode(featureset, label) |
labels() |
length() |
train(train_toks[, count_cutoff, labels]) | Construct and return a new feature encoding, based on a given training corpus train_toks.
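As a rough illustration of the kind of mapping that train has to produce, the sketch below builds one from a corpus of (featureset, label) pairs, following the rules in the class description: one entry per (fname, fval, label) for string and boolean features, and one per (fname, type(fval), label) for integer and float features. It is pure Python and not NLTK's code; `sketch_build_mapping` is a hypothetical name, and `count_cutoff` is left out because its behaviour is not described on this page.

```python
def sketch_build_mapping(train_toks):
    """Hypothetical helper: index every joint-feature seen in train_toks.

    train_toks is assumed to be a list of (featureset, label) pairs,
    where each featureset is a dict from feature name to feature value.
    """
    mapping = {}
    labels = []
    for featureset, label in train_toks:
        if label not in labels:
            labels.append(label)
        for fname, fval in featureset.items():
            # Numeric features are keyed by their type, so all values of
            # one numeric feature share a single entry per (fname, label).
            if type(fval) in (int, float):
                fval = type(fval)
            key = (fname, fval, label)
            if key not in mapping:
                mapping[key] = len(mapping)
    return labels, mapping


train_toks = [
    ({"subject": "offer", "length": 12}, "spam"),
    ({"subject": "offer", "length": 40}, "spam"),
    ({"subject": "hello", "length": 7.5}, "ham"),
]
labels, mapping = sketch_build_mapping(train_toks)
print(labels)        # ['spam', 'ham']
print(len(mapping))  # 4
```

The two "spam" tokens have different length values but collapse into a single ("length", int, "spam") entry, whereas each distinct string value gets an entry of its own.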