__init__ ([mwes, separator]) |
Initialize the multi-word tokenizer with a list of expressions and a separator. |
add_mwe (mwe) |
Add a multi-word expression to the lexicon (stored as a word trie) |
span_tokenize (s) |
Identify the tokens using integer offsets (start_i, end_i), where s[start_i:end_i] is the corresponding token. |
span_tokenize_sents (strings) |
Apply self.span_tokenize() to each element of strings. |
tokenize (text) |
Merge multi-word expressions in already tokenized text. Parameter text: a list containing tokenized text. |
tokenize_sents (strings) |
Apply self.tokenize() to each element of strings. |
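
The methods above can be illustrated with a minimal pure-Python sketch of a trie-backed multi-word tokenizer. This is not the library's implementation: the class name TinyMWETokenizer, the use of None as an end-of-expression marker, the default separator "_", and the longest-match strategy are all assumptions made for illustration. It only mirrors the documented behaviour of __init__, add_mwe, tokenize, and tokenize_sents.

```python
class TinyMWETokenizer:
    """Hypothetical stand-in that merges known multi-word expressions."""

    def __init__(self, mwes=None, separator="_"):
        # The lexicon is a word trie built from nested dicts (assumed layout).
        self._trie = {}
        self._separator = separator
        for mwe in mwes or []:
            self.add_mwe(mwe)

    def add_mwe(self, mwe):
        """Add one expression, given as a sequence of words, to the trie."""
        node = self._trie
        for word in mwe:
            node = node.setdefault(word, {})
        node[None] = True  # marks the end of a complete expression

    def tokenize(self, text):
        """Merge expressions found in `text`, a list of already split tokens."""
        merged = []
        i = 0
        while i < len(text):
            node = self._trie
            j = i
            last_match = -1  # end index of the longest expression starting at i
            while j < len(text) and text[j] in node:
                node = node[text[j]]
                j += 1
                if None in node:
                    last_match = j
            if last_match != -1:
                merged.append(self._separator.join(text[i:last_match]))
                i = last_match
            else:
                merged.append(text[i])
                i += 1
        return merged

    def tokenize_sents(self, strings):
        """Apply tokenize() to each element of `strings`."""
        return [self.tokenize(s) for s in strings]


tokenizer = TinyMWETokenizer([("a", "little"), ("a", "little", "bit")])
print(tokenizer.tokenize("In a little or a little bit".split()))
# prints ['In', 'a_little', 'or', 'a_little_bit']
```

Note that tokenize expects a list of tokens rather than a raw string, so the input is split before it is passed in. Taking the longest match at each position is a design choice of this sketch; it lets "a little bit" win over its prefix "a little".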