nltk.tokenize.word_tokenize()

nltk.tokenize.word_tokenize(text, language='english')

    Return a tokenized copy of text, using NLTK's recommended word tokenizer (currently TreebankWordTokenizer, along with PunktSentenceTokenizer for the specified language).

    Parameters:
        - text – text to split into words
        - language – the model name in the Punkt corpus