nltk.classify.MaxentClassifier.train

classmethod MaxentClassifier.train(train_toks, algorithm=None, trace=3, encoding=None, labels=None, gaussian_prior_sigma=0, **cutoffs)
Train a new maxent classifier based on the given corpus of training samples. This classifier will have its weights chosen to maximize entropy while remaining empirically consistent with the training corpus.
Returns: The new maxent classifier
Parameters:
- train_toks (list) – Training data, represented as a list of pairs, the first member of which is a featureset, and the second of which is a classification label.
- algorithm (str) – A case-insensitive string specifying which algorithm should be used to train the classifier. The following algorithms are currently available:
  - Iterative Scaling Methods: Generalized Iterative Scaling ('GIS'), Improved Iterative Scaling ('IIS')
  - External Libraries (requiring megam): LM-BFGS algorithm, with training performed by Megam ('megam')
  The default algorithm is 'IIS'.
- trace (int) – The level of diagnostic tracing output to produce. Higher values produce more verbose output.
- encoding (MaxentFeatureEncodingI) – A feature encoding, used to convert featuresets into feature vectors. If none is specified, then a BinaryMaxentFeatureEncoding will be built based on the features that are attested in the training corpus.
- labels (list(str)) – The set of possible labels. If none is given, then the set of all labels attested in the training data will be used instead.
- gaussian_prior_sigma – The sigma value for a Gaussian prior on model weights. Currently, this is supported by megam. For other algorithms, its value is ignored.
- cutoffs – Arguments specifying various conditions under which the training should be halted. (Some of the cutoff conditions are not supported by some algorithms.)
  - max_iter=v: Terminate after v iterations.
  - min_ll=v: Terminate after the negative average log-likelihood drops under v.
  - min_lldelta=v: Terminate if a single iteration improves log likelihood by less than v.
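
The cutoff conditions can be read as a stopping test evaluated once per training iteration. The following is a minimal sketch of that reading, not NLTK's own implementation. The helper name should_stop, its arguments, and the log-likelihood values are hypothetical and chosen only for illustration. It assumes ll is an average log-likelihood, that is, a negative number that rises toward 0 as the fit improves.

```python
def should_stop(iteration, ll, prev_ll=None, **cutoffs):
    """Return True if any supplied cutoff condition is met.

    Hypothetical helper for illustration; not part of NLTK.
    iteration: number of training iterations completed so far.
    ll: average log-likelihood after the latest iteration.
    prev_ll: average log-likelihood after the previous iteration, if any.
    """
    # max_iter=v: terminate after v iterations.
    if "max_iter" in cutoffs and iteration >= cutoffs["max_iter"]:
        return True
    # min_ll=v: terminate once the negative average log-likelihood drops under v.
    if "min_ll" in cutoffs and -ll < cutoffs["min_ll"]:
        return True
    # min_lldelta=v: terminate if the latest iteration improved ll by less than v.
    if "min_lldelta" in cutoffs and prev_ll is not None:
        if ll - prev_ll < cutoffs["min_lldelta"]:
            return True
    return False


# Made-up log-likelihood trace, one value per completed iteration.
ll_trace = [-0.693, -0.450, -0.310, -0.305]
stops = [
    should_stop(i + 1, ll, ll_trace[i - 1] if i > 0 else None,
                max_iter=10, min_lldelta=0.01)
    for i, ll in enumerate(ll_trace)
]
print(stops)  # [False, False, False, True]
```

In this trace the fourth iteration improves the log-likelihood by only about 0.005, which is below min_lldelta=0.01, so training would halt there. In an actual call the cutoffs are passed as keyword arguments to train, for example max_iter=10 after the named parameters.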