nltk.tokenize.util.string_span_tokenize()

nltk.tokenize.util.string_span_tokenize(s, sep)

Return the offsets of the tokens in s, as a sequence of (start, end) tuples, obtained by splitting the string at each occurrence of sep. Note that the offsets are into the original string, so slicing s with each span recovers the corresponding token.

>>> from nltk.tokenize.util import string_span_tokenize
>>> s = '''Good muffins cost $3.88\nin New York.  Please buy me
... two of them.\n\nThanks.'''
>>> list(string_span_tokenize(s, " "))
[(0, 4), (5, 12), (13, 17), (18, 26), (27, 30), (31, 36), (37, 37),
(38, 44), (45, 48), (49, 55), (56, 58), (59, 73)]
Parameters:
  • s (str) – the string to be tokenized
  • sep (str) – the token separator
Return type:

iter(tuple(int, int))
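Because the function returns offsets rather than substrings, tokenization is non-destructive: the tokens can always be recovered by slicing the original string. The sketch below is a hypothetical plain-Python re-implementation of this behavior (it is not NLTK's actual code), shown here to illustrate how the spans are produced and consumed:

```python
def simple_string_span_tokenize(s, sep):
    # Hypothetical sketch of the span-tokenizing behavior, not NLTK's code:
    # yield (start, end) offsets of the substrings between occurrences of sep.
    left = 0
    while True:
        right = s.find(sep, left)
        if right == -1:
            # No further separator: emit the trailing token, if any.
            if left != len(s):
                yield (left, len(s))
            return
        if right != 0:
            yield (left, right)
        left = right + len(sep)

s = "Good muffins cost $3.88"
spans = list(simple_string_span_tokenize(s, " "))
# Slicing the original string with each span recovers the tokens.
tokens = [s[start:end] for start, end in spans]
```

As in the doctest above, adjacent separators (e.g. the double space after "New York.") produce an empty span such as (37, 37), which slices to an empty string; callers that want only non-empty tokens should filter for start < end.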