nltk.word_tokenize()
nltk.word_tokenize(text, language='english')

Return a tokenized copy of text, using NLTK's recommended word tokenizer (currently TreebankWordTokenizer along with PunktSentenceTokenizer for the specified language).

Parameters:
- text – text to split into words
- language – the model name in the Punkt corpus