nltk.tokenize.MWETokenizer.add_mwe
MWETokenizer.add_mwe(mwe)

Add a multi-word expression (MWE) to the lexicon, which is stored as a word trie.

We use util.Trie to represent the trie. Its form is a dict of dicts: each key is a token, and each value is the sub-trie of tokens that may follow it. The key True marks the end of a valid MWE.

Parameters: mwe (tuple(str) or list(str)) – The multi-word expression to add to the word trie

Example:

>>> from nltk.tokenize import MWETokenizer
>>> tokenizer = MWETokenizer()
>>> tokenizer.add_mwe(('a', 'b'))
>>> tokenizer.add_mwe(('a', 'b', 'c'))
>>> tokenizer.add_mwe(('a', 'x'))
>>> expected = {'a': {'x': {True: None}, 'b': {True: None, 'c': {True: None}}}}
>>> tokenizer._mwes.as_dict() == expected
True
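
The dict-of-dicts layout described above can be illustrated with plain dictionaries. The following is a minimal sketch, not the util.Trie implementation: the helper names insert_mwe and is_mwe are hypothetical and exist only for this illustration. It assumes that each token maps to a nested dict and that the key True (with value None) marks the end of a complete MWE.

```python
def insert_mwe(trie, mwe):
    """Insert a sequence of tokens into a dict-of-dicts trie (hypothetical helper)."""
    node = trie
    for token in mwe:
        # Descend into the sub-trie for this token, creating it if needed.
        node = node.setdefault(token, {})
    # The key True marks the end of a valid MWE.
    node[True] = None
    return trie


def is_mwe(trie, tokens):
    """Return True if tokens form a complete MWE in the trie (hypothetical helper)."""
    node = trie
    for token in tokens:
        if token not in node:
            return False
        node = node[token]
    return True in node


trie = {}
for mwe in [("a", "b"), ("a", "b", "c"), ("a", "x")]:
    insert_mwe(trie, mwe)

print(trie)
print(is_mwe(trie, ("a", "b")))  # a complete expression
print(is_mwe(trie, ("a",)))      # only a prefix, not an expression
```

Because ('a', 'b') and ('a', 'b', 'c') share a prefix, the node under 'b' carries both the end marker True and a child 'c'. This is how a single trie can hold an expression together with a longer expression that extends it.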