nltk.tokenize.MWETokenizer.tokenize

MWETokenizer.tokenize(text)
Parameters: text (list(str)) – A list of tokens, i.e. text that has already been tokenized
Returns: The list of tokens with each multi-word expression merged into a single token
Return type: list(str)
Example:
>>> from nltk.tokenize import MWETokenizer
>>> tokenizer = MWETokenizer([('hors', "d'oeuvre")], separator='+')
>>> tokenizer.tokenize("An hors d'oeuvre tonight, sir?".split())
['An', "hors+d'oeuvre", 'tonight,', 'sir?']
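
To make the merging step concrete, the following is a minimal pure-Python sketch of the kind of pass that tokenize performs. It is not NLTK's implementation: the function name merge_mwes is hypothetical, and the greedy longest-match strategy is an assumption made for illustration. It only shows how adjacent tokens matching a registered expression can be joined with the separator.

```python
def merge_mwes(tokens, mwes, separator="_"):
    """Merge multi-word expressions in a token list (illustrative sketch only)."""
    # Assumption: longer expressions take priority, so ('a', 'little', 'bit')
    # is tried before ('a', 'little').
    ordered = sorted((tuple(m) for m in mwes), key=len, reverse=True)
    merged = []
    i = 0
    while i < len(tokens):
        for mwe in ordered:
            if mwe and tuple(tokens[i:i + len(mwe)]) == mwe:
                # The tokens starting at position i spell out this expression.
                merged.append(separator.join(mwe))
                i += len(mwe)
                break
        else:
            # No expression starts here; keep the token unchanged.
            merged.append(tokens[i])
            i += 1
    return merged


result = merge_mwes(
    "An hors d'oeuvre tonight, sir?".split(),
    [("hors", "d'oeuvre")],
    separator="+",
)
print(result)
```

Note that matching is on exact tokens, so 'tonight,' and 'sir?' keep their punctuation because the input was split on whitespace only, just as in the example above.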