nltk.WhitespaceTokenizer
class nltk.WhitespaceTokenizer

Tokenize a string on whitespace (space, tab, newline). In general, users should use the string split() method instead.

>>> from nltk.tokenize import WhitespaceTokenizer
>>> s = "Good muffins cost $3.88\nin New York. Please buy me\ntwo of them.\n\nThanks."
>>> WhitespaceTokenizer().tokenize(s)
['Good', 'muffins', 'cost', '$3.88', 'in', 'New', 'York.', 'Please', 'buy', 'me', 'two', 'of', 'them.', 'Thanks.']
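Since the description recommends the string split() method, the following minimal sketch shows that plain str.split(), called with no arguments, yields the same token list for the sample string above. It uses only the Python standard library; the variable names are illustrative.

```python
# Same sample string as in the doctest above.
s = "Good muffins cost $3.88\nin New York. Please buy me\ntwo of them.\n\nThanks."

# With no arguments, str.split() splits on runs of whitespace
# (spaces, tabs, newlines) and drops empty strings.
tokens = s.split()

print(tokens)
# ['Good', 'muffins', 'cost', '$3.88', 'in', 'New', 'York.', 'Please',
#  'buy', 'me', 'two', 'of', 'them.', 'Thanks.']
```

Note that the blank line ("\n\n") before "Thanks." produces no empty token, because consecutive whitespace characters are treated as a single separator.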
Methods
__init__() |
span_tokenize(text) |
span_tokenize_sents(strings) | Apply self.span_tokenize() to each element of strings.
tokenize(text) |
tokenize_sents(strings) | Apply self.tokenize() to each element of strings.
unicode_repr() |
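To illustrate what span-based whitespace tokenization computes, and how a *_sents variant applies it to each element of a list, here is a minimal standard-library sketch. It is not NLTK's implementation: whitespace_spans and whitespace_spans_sents are hypothetical helper names chosen for this example, and the sketch assumes a span is a (start, end) pair of character offsets such that text[start:end] is the token.

```python
import re


def whitespace_spans(text):
    """Yield (start, end) offsets for each maximal run of non-whitespace.

    Hypothetical helper for illustration; not part of NLTK.
    """
    for match in re.finditer(r"\S+", text):
        yield match.span()


def whitespace_spans_sents(strings):
    """Apply whitespace_spans() to each element of strings (hypothetical helper)."""
    return [list(whitespace_spans(s)) for s in strings]


text = "Good muffins cost $3.88\nin New York."
spans = list(whitespace_spans(text))

# Slicing the original text with each span recovers the tokens.
tokens = [text[start:end] for start, end in spans]

print(spans[:3])   # [(0, 4), (5, 12), (13, 17)]
print(tokens)      # ['Good', 'muffins', 'cost', '$3.88', 'in', 'New', 'York.']
print(whitespace_spans_sents(["a b", "c"]))  # [[(0, 1), (2, 3)], [(0, 1)]]
```

Offsets are useful when tokens must be mapped back to positions in the original string, for example to highlight matches or align annotations, which a plain list of token strings cannot do.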