nltk.MWETokenizer.__init__

MWETokenizer.__init__(mwes=None, separator='_')

Initialize the multi-word expression (MWE) tokenizer with a list of expressions and a separator.

Parameters:
  • mwes (list(list(str))) – A sequence of multi-word expressions to be merged, where each MWE is a sequence of strings.
  • separator (str) – String inserted between the words of a multi-word expression when they are merged into a single token. (Default is '_'.)
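
Example:

The following is a minimal sketch of how the two parameters might be used. It assumes NLTK is installed and that the class is imported from nltk.tokenize. The expressions and sentences are made up for illustration and do not come from the library's documentation.

```python
from nltk.tokenize import MWETokenizer

# Illustrative expressions only; each MWE is given as a sequence of strings.
tokenizer = MWETokenizer([("New", "York"), ("machine", "learning")], separator="_")

# tokenize() expects an already-tokenized list of strings, not a raw sentence.
tokens = "I study machine learning in New York".split()
merged = tokenizer.tokenize(tokens)
print(merged)  # ['I', 'study', 'machine_learning', 'in', 'New_York']

# A different separator changes only how the merged token is joined.
plus_tokenizer = MWETokenizer([("New", "York")], separator="+")
plus_merged = plus_tokenizer.tokenize(["New", "York", "City"])
print(plus_merged)  # ['New+York', 'City']
```

Tokens that do not form a registered expression pass through unchanged, so the output list has one entry fewer for each extra word absorbed into a merged token.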