nltk.MWETokenizer.tokenize
MWETokenizer.tokenize(text)

    Parameters: text (list(str)) – A list containing tokenized text
    Returns: The tokens of the input list, with each known multi-word expression merged into a single token
    Return type: list(str)
    Example:
        >>> from nltk.tokenize import MWETokenizer
        >>> tokenizer = MWETokenizer([('hors', "d'oeuvre")], separator='+')
        >>> tokenizer.tokenize("An hors d'oeuvre tonight, sir?".split())
        ['An', "hors+d'oeuvre", 'tonight,', 'sir?']
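To make the merging behaviour concrete, here is a minimal pure-Python sketch of the same idea. It is not NLTK's implementation: the function name `merge_mwes` and its internal lookup table are hypothetical, introduced only for illustration, and it assumes that when several expressions could start at the same token, the longest one should win.

```python
def merge_mwes(tokens, mwes, separator="_"):
    """Merge known multi-word expressions in a token list.

    Illustrative sketch only; `merge_mwes` is a hypothetical helper,
    not part of NLTK.
    """
    # Index expressions by their first token, longest first,
    # so that the longest matching expression is tried first.
    by_first = {}
    for mwe in mwes:
        if mwe:  # skip empty expressions
            by_first.setdefault(mwe[0], []).append(tuple(mwe))
    for candidates in by_first.values():
        candidates.sort(key=len, reverse=True)

    merged = []
    i = 0
    while i < len(tokens):
        for mwe in by_first.get(tokens[i], []):
            if tuple(tokens[i:i + len(mwe)]) == mwe:
                # Join the matched tokens into one token and skip past them.
                merged.append(separator.join(mwe))
                i += len(mwe)
                break
        else:
            # No expression starts at this position; keep the token unchanged.
            merged.append(tokens[i])
            i += 1
    return merged


result = merge_mwes(
    "An hors d'oeuvre tonight, sir?".split(),
    [("hors", "d'oeuvre")],
    separator="+",
)
print(result)
```

Note that, as in the documented example, the input must already be a list of tokens: the sketch matches whole tokens, so 'tonight,' keeps its trailing comma because `str.split()` does not separate punctuation.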