`nltk.SExprTokenizer`

class nltk.SExprTokenizer(parens='()', strict=True)

A tokenizer that divides strings into s-expressions. An s-expression can be either:

a parenthesized expression, including any nested parenthesized expressions, or

a sequence of non-whitespace non-parenthesis characters.

For example, the string (a (b c)) d e (f) consists of four s-expressions: (a (b c)), d, e, and (f).
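The example above can be reproduced with a short script. This is a minimal usage sketch; the variable names `tokenizer` and `tokens` are illustrative only.

```python
from nltk.tokenize import SExprTokenizer

# Default configuration: "(" and ")" delimit s-expressions, strict checking is on.
tokenizer = SExprTokenizer()

# Nested parentheses stay inside their enclosing s-expression;
# bare runs of non-whitespace characters become their own tokens.
tokens = tokenizer.tokenize("(a (b c)) d e (f)")
print(tokens)  # ['(a (b c))', 'd', 'e', '(f)']
```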

By default, the characters ( and ) are treated as open and close parentheses, but alternative strings may be specified.
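As a sketch of how alternative delimiters might be supplied, the snippet below passes `parens` first as a two-character string and then as a list of two multi-character strings. The delimiter choices (`{}` and `<<`, `>>`) and the input strings are made up for illustration.

```python
from nltk.tokenize import SExprTokenizer

# Two-character string: first character opens, second closes.
brace_tokens = SExprTokenizer(parens="{}").tokenize("{a b {c d}} e f {g}")
print(brace_tokens)  # ['{a b {c d}}', 'e', 'f', '{g}']

# List of two strings: each delimiter may be longer than one character.
angle_tokens = SExprTokenizer(parens=["<<", ">>"]).tokenize("<<a <<b>> c>> d <<e>>")
print(angle_tokens)  # ['<<a <<b>> c>>', 'd', '<<e>>']
```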

Parameters:
parens (str or list) – A two-element sequence specifying the open and close parentheses used to find s-expressions. This is typically either a two-character string or a list of two strings.
strict (bool) – If true, raise an exception when tokenizing an ill-formed s-expression.
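The effect of `strict` can be sketched by tokenizing the same ill-formed input in both modes. The text above only says that an exception is raised; this sketch assumes it is a `ValueError`, which is what recent NLTK releases raise. The names `ill_formed`, `strict_error`, and `lenient_tokens` are illustrative.

```python
from nltk.tokenize import SExprTokenizer

# Two unmatched close parens followed by two unmatched open parens.
ill_formed = "c) d) e (f (g"

# strict=True (the default): an unbalanced parenthesis raises an exception.
# Assumption: the exception type is ValueError.
try:
    SExprTokenizer(strict=True).tokenize(ill_formed)
    strict_error = None
except ValueError as err:
    strict_error = str(err)
print(strict_error)

# strict=False: unmatched close parens become single tokens, and an
# unclosed open paren swallows the rest of the string.
lenient_tokens = SExprTokenizer(strict=False).tokenize(ill_formed)
print(lenient_tokens)  # ['c', ')', 'd', ')', 'e', '(f (g']
```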

Methods

`__init__`([parens, strict])
`span_tokenize`(s)	Identify the tokens using integer offsets `(start_i, end_i)`, where `s[start_i:end_i]` is the corresponding token.
`span_tokenize_sents`(strings)	Apply `self.span_tokenize()` to each element of `strings`.
`tokenize`(text)	Return a list of s-expressions extracted from text.
`tokenize_sents`(strings)	Apply `self.tokenize()` to each element of `strings`.
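
A brief sketch of `tokenize_sents`, which applies `tokenize()` to each string in turn and returns one token list per input string. The input strings here are invented for illustration.

```python
from nltk.tokenize import SExprTokenizer

tokenizer = SExprTokenizer()

# Each element of the input list is tokenized independently.
batches = tokenizer.tokenize_sents(["(a b) c (d)", "(e (f))"])
print(batches)  # [['(a b)', 'c', '(d)'], ['(e (f))']]
```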