nltk.CRFTagger.__init__

CRFTagger.__init__(feature_func=None, verbose=False, training_opt={})

Initialize the CRFSuite tagger.

:param feature_func: The function that extracts features for each token of a sentence. It should take two parameters, tokens and index, and extract the features at position index of the tokens list. See the built-in _get_features function for more detail.
:param verbose: Output debugging messages during training.
:type verbose: boolean
:param training_opt: python-crfsuite training options.
:type training_opt: dictionary
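As an illustration of the feature_func contract described above, here is a minimal sketch of a custom extractor. The name simple_features and the specific features are hypothetical, and returning a list of feature strings is an assumption modeled on the built-in _get_features function.

```python
def simple_features(tokens, index):
    """Return a list of string features for tokens[index] (illustrative only)."""
    token = tokens[index]
    features = ["WORD_" + token.lower()]

    # Shape features of the current token
    if token[:1].isupper():
        features.append("CAPITALIZED")
    if any(ch.isdigit() for ch in token):
        features.append("HAS_DIGIT")

    # Two-character suffix, only for tokens longer than two characters
    if len(token) > 2:
        features.append("SUF_" + token[-2:])

    # Context feature: the previous token, or a sentence-start marker
    previous = tokens[index - 1].lower() if index > 0 else "<s>"
    features.append("PREV_" + previous)
    return features


sentence = ["Dogs", "bark", "at", "7pm"]
print(simple_features(sentence, 0))
# ['WORD_dogs', 'CAPITALIZED', 'SUF_gs', 'PREV_<s>']
```

Such a function would be passed as CRFTagger(feature_func=simple_features), so that it is called once per token position.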

Set of possible training options (using the L-BFGS training algorithm):

'feature.minfreq' : The minimum frequency of features.
'feature.possible_states' : Force to generate possible state features.
'feature.possible_transitions' : Force to generate possible transition features.
'c1' : Coefficient for L1 regularization.
'c2' : Coefficient for L2 regularization.
'max_iterations' : The maximum number of iterations for L-BFGS optimization.
'num_memories' : The number of limited memories for approximating the inverse Hessian matrix.
'epsilon' : Epsilon for testing the convergence of the objective.
'period' : The duration of iterations to test the stopping criterion.
'delta' : The threshold for the stopping criterion; an L-BFGS iteration stops when the improvement of the log likelihood over the last ${period} iterations is no greater than this threshold.
'linesearch' : The line search algorithm used in L-BFGS updates:
    {
        'MoreThuente': More and Thuente's method,
        'Backtracking': Backtracking method with regular Wolfe condition,
        'StrongBacktracking': Backtracking method with strong Wolfe condition
    }
'max_linesearch' : The maximum number of trials for the line search algorithm.
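The options above are passed as a plain dictionary. The following sketch shows one way to do that. The option values are arbitrary and chosen only for illustration, the helper names build_tagger and train_demo are hypothetical, and the code assumes the python-crfsuite package is installed alongside NLTK.

```python
# Illustrative values only; they are not recommended settings.
training_opt = {
    "c1": 0.1,                     # L1 regularization coefficient
    "c2": 0.01,                    # L2 regularization coefficient
    "max_iterations": 100,         # cap on L-BFGS iterations
    "linesearch": "MoreThuente",   # one of the three line search methods listed
}


def build_tagger(options):
    """Create a CRFTagger configured with the given training options."""
    # Imported lazily because CRFTagger depends on python-crfsuite.
    from nltk.tag import CRFTagger

    return CRFTagger(verbose=False, training_opt=options)


def train_demo(model_path):
    """Train on a tiny toy corpus and tag one sentence (writes model_path)."""
    tagger = build_tagger(training_opt)
    train_data = [
        [("University", "Noun"), ("is", "Verb"), ("a", "Det"),
         ("good", "Adj"), ("place", "Noun")],
        [("dog", "Noun"), ("eat", "Verb"), ("meat", "Noun")],
    ]
    tagger.train(train_data, model_path)
    return tagger.tag_sents([["dog", "is", "good"]])
```

Calling train_demo("model.crf.tagger") would train with these options and save the model to that file. Options left out of the dictionary keep their python-crfsuite defaults.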