nltk.tokenize.MWETokenizer.tokenize

MWETokenizer.tokenize(text)

Parameters:	text (list(str)) – A list of tokens, i.e. text that has already been tokenized
Returns:	The list of tokens, with each registered multi-word expression merged into a single token joined by the tokenizer's separator
Return type:	list(str)
Example:

>>> from nltk.tokenize import MWETokenizer
>>> tokenizer = MWETokenizer([('hors', "d'oeuvre")], separator='+')
>>> tokenizer.tokenize("An hors d'oeuvre tonight, sir?".split())
['An', "hors+d'oeuvre", 'tonight,', 'sir?']
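
To make the merging step concrete, the following is a minimal pure-Python sketch of the behaviour described above. It is not NLTK's implementation: the function name `merge_mwes` and its structure are hypothetical and chosen only for illustration, and it assumes that the longest matching expression wins when several expressions start with the same token.

```python
def merge_mwes(tokens, mwes, separator="_"):
    """Merge known multi-word expressions in an already tokenized list.

    Illustrative sketch only; `merge_mwes` is a hypothetical helper,
    not part of NLTK.
    """
    # Index the expressions by their first token so lookup is cheap.
    by_first = {}
    for mwe in mwes:
        if mwe:  # ignore empty expressions
            by_first.setdefault(mwe[0], []).append(tuple(mwe))
    # Assumption: prefer the longest expression when several could match.
    for candidates in by_first.values():
        candidates.sort(key=len, reverse=True)

    merged = []
    i = 0
    while i < len(tokens):
        for mwe in by_first.get(tokens[i], []):
            if tuple(tokens[i:i + len(mwe)]) == mwe:
                # Join the matched tokens into one token and skip past them.
                merged.append(separator.join(mwe))
                i += len(mwe)
                break
        else:
            # No expression starts here; keep the token unchanged.
            merged.append(tokens[i])
            i += 1
    return merged


tokens = "An hors d'oeuvre tonight, sir?".split()
print(merge_mwes(tokens, [("hors", "d'oeuvre")], separator="+"))
```

As in the example above, the input must already be a list of tokens. Passing a raw string would be iterated character by character, so nothing would be merged as intended.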