nltk.tokenize.WhitespaceTokenizer

class nltk.tokenize.WhitespaceTokenizer[source]

Tokenize a string on whitespace (space, tab, newline). In general, users should use the string split() method instead.

>>> from nltk.tokenize import WhitespaceTokenizer
>>> s = "Good muffins cost $3.88\nin New York.  Please buy me\ntwo of them.\n\nThanks."
>>> WhitespaceTokenizer().tokenize(s)
['Good', 'muffins', 'cost', '$3.88', 'in', 'New', 'York.',
'Please', 'buy', 'me', 'two', 'of', 'them.', 'Thanks.']

Methods

__init__()
span_tokenize(text)
span_tokenize_sents(strings) Apply self.span_tokenize() to each element of strings.
tokenize(text)
tokenize_sents(strings) Apply self.tokenize() to each element of strings.
unicode_repr()