nltk.CRFTagger.__init__
CRFTagger.__init__(feature_func=None, verbose=False, training_opt={})

Initialize the CRFSuite tagger.

:param feature_func: The function that extracts features for each token of a sentence. It should take two parameters, tokens and index, and return the features for the token at position index in the tokens list. See the built-in _get_features function for more detail.
:param verbose: Output debugging messages during training.
:type verbose: boolean
:param training_opt: python-crfsuite training options.
:type training_opt: dictionary
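
As an illustration of the feature_func contract described above (two parameters, tokens and index), here is a minimal sketch of a custom extractor. The name simple_features and the feature strings it emits are hypothetical, and returning a list of strings is an assumption about the expected return type; check the built-in _get_features for the exact convention.

```python
def simple_features(tokens, index):
    """Hypothetical extractor: return string features for tokens[index]."""
    token = tokens[index]
    features = ["WORD_" + token.lower()]

    # Shape features of the token itself.
    if token[:1].isupper():
        features.append("CAPITALIZED")
    if any(ch.isdigit() for ch in token):
        features.append("HAS_DIGIT")
    if len(token) > 2:
        features.append("SUF_" + token[-2:])

    # Context feature: the previous token, or a sentence-start marker.
    if index > 0:
        features.append("PREV_" + tokens[index - 1].lower())
    else:
        features.append("BOS")
    return features


sentence = ["Dog", "runs", "42"]
for i in range(len(sentence)):
    print(sentence[i], simple_features(sentence, i))
```

A function like this would be passed as CRFTagger(feature_func=simple_features); leaving feature_func=None keeps the built-in extractor.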
Set of possible training options (using the L-BFGS training algorithm):

- 'feature.minfreq': The minimum frequency of features.
- 'feature.possible_states': Force to generate possible state features.
- 'feature.possible_transitions': Force to generate possible transition features.
- 'c1': Coefficient for L1 regularization.
- 'c2': Coefficient for L2 regularization.
- 'max_iterations': The maximum number of iterations for L-BFGS optimization.
- 'num_memories': The number of limited memories for approximating the inverse Hessian matrix.
- 'epsilon': Epsilon for testing the convergence of the objective.
- 'period': The duration of iterations to test the stopping criterion.
- 'delta': The threshold for the stopping criterion; an L-BFGS iteration stops when the improvement of the log likelihood over the last ${period} iterations is no greater than this threshold.
- 'linesearch': The line search algorithm used in L-BFGS updates:
  - 'MoreThuente': More and Thuente's method
  - 'Backtracking': Backtracking method with regular Wolfe condition
  - 'StrongBacktracking': Backtracking method with strong Wolfe condition
- 'max_linesearch': The maximum number of trials for the line search algorithm.
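
The options above are passed as a plain dictionary. The sketch below builds one such dictionary and checks its keys against the documented names. The helper unknown_options is hypothetical (it is not part of NLTK or python-crfsuite), and the numeric values are illustrative only, not recommended defaults.

```python
# Option names as listed in the documentation above.
DOCUMENTED_OPTIONS = {
    "feature.minfreq", "feature.possible_states", "feature.possible_transitions",
    "c1", "c2", "max_iterations", "num_memories", "epsilon",
    "period", "delta", "linesearch", "max_linesearch",
}
LINESEARCH_CHOICES = {"MoreThuente", "Backtracking", "StrongBacktracking"}


def unknown_options(opts):
    """Hypothetical helper: return option names not in the documented list."""
    return sorted(set(opts) - DOCUMENTED_OPTIONS)


# Illustrative values only; tune them for your own data.
training_opt = {
    "feature.minfreq": 2,         # ignore features seen fewer than 2 times
    "c1": 0.1,                    # L1 regularization coefficient
    "c2": 0.01,                   # L2 regularization coefficient
    "max_iterations": 100,        # cap on L-BFGS iterations
    "linesearch": "MoreThuente",  # one of LINESEARCH_CHOICES
}

print(unknown_options(training_opt))
print(training_opt["linesearch"] in LINESEARCH_CHOICES)
```

The resulting dictionary would then be supplied as CRFTagger(training_opt=training_opt). Checking key names up front is a design convenience here, since a misspelled option name is easy to miss in a plain dictionary.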