nltk.word_tokenize()

nltk.word_tokenize(text, language='english')

Return a tokenized copy of text, using NLTK’s recommended word tokenizer (currently TreebankWordTokenizer, along with PunktSentenceTokenizer for the specified language). The text is first split into sentences with the Punkt sentence tokenizer, and each sentence is then tokenized into words.

Parameters:
  • text – text to split into words
  • language – the name of the model in the Punkt corpus, used for sentence splitting (default: 'english')
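
A minimal usage sketch (not taken from the official documentation): it assumes the Punkt sentence models have already been downloaded, e.g. via nltk.download('punkt') (in newer NLTK releases the resource may be named 'punkt_tab' instead), and the exact token output may vary slightly between NLTK versions.

    >>> from nltk import word_tokenize
    >>> # Punctuation and currency symbols are split off as separate tokens
    >>> word_tokenize("Good muffins cost $3.88 in New York. Please buy me two of them.")
    ['Good', 'muffins', 'cost', '$', '3.88', 'in', 'New', 'York', '.',
     'Please', 'buy', 'me', 'two', 'of', 'them', '.']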