nltk.chunk.RegexpParser

class nltk.chunk.RegexpParser(grammar, root_label=u'S', loop=1, trace=0)

A grammar-based chunk parser. chunk.RegexpParser uses a set of regular expression patterns to specify the behavior of the parser. The chunking of the text is encoded using a ChunkString, and each rule acts by modifying the chunking in the ChunkString. The rules are all implemented using regular expression matching and substitution.

A grammar contains one or more clauses in the following form:

NP:
  {<DT|JJ>}          # chunk determiners and adjectives
  }<[\.VI].*>+{      # chink any tag beginning with V, I, or .
  <.*>}{<DT>         # split a chunk at a determiner
  <DT|JJ>{}<NN.*>    # merge chunk ending with det/adj
                     # with one starting with a noun
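
As a minimal sketch of how a clause in this form is used, a one-rule NP grammar can be passed to the constructor and applied to a list of (word, tag) pairs. The grammar, the sample sentence, and the variable names below are illustrative assumptions, not part of the original docstring.

```python
from nltk.chunk import RegexpParser

# Hypothetical one-clause grammar with a single chunk rule for noun phrases.
grammar = r"""
NP:
  {<DT>?<JJ>*<NN.*>+}   # chunk optional determiner, adjectives, nouns
"""

parser = RegexpParser(grammar)

# Input is a POS-tagged sentence, given as a list of (word, tag) tuples.
tagged = [("the", "DT"), ("little", "JJ"), ("dog", "NN"), ("barked", "VBD")]
tree = parser.parse(tagged)

# Collect the leaves of every NP chunk found under the root node.
np_chunks = [sub.leaves() for sub in tree.subtrees() if sub.label() == "NP"]
print(tree)
print(np_chunks)
```

The root node carries the default root_label ('S'); tokens not covered by any chunk stay as direct children of the root.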

The patterns of a clause are executed in order. An earlier pattern may introduce a chunk boundary that prevents a later pattern from executing. Sometimes an individual pattern will match on multiple, overlapping extents of the input. As with regular expression substitution more generally, the chunker will identify the first match possible, then continue looking for matches after this one has ended.
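
The two behaviours described above can be sketched as follows. The first parser chunks everything and then chinks verbs and prepositions, so the result depends on the order of the two patterns. The second shows a single pattern with overlapping candidate matches. The sentences and variable names are assumed for illustration.

```python
from nltk.chunk import RegexpParser
from nltk.tree import Tree

tagged = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
          ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]

# Patterns run top to bottom. The first chunks every token, the second
# then removes (chinks) verbs and prepositions from that chunk.
ordered = RegexpParser(r"""
NP:
  {<.*>+}        # chunk every token
  }<VBD|IN>+{    # chink sequences of VBD and IN
""")
ordered_tree = ordered.parse(tagged)

# One pattern with overlapping candidates. The leftmost match is taken,
# and matching resumes after it, so "market fund" is never considered.
nouns = [("money", "NN"), ("market", "NN"), ("fund", "NN")]
pairs = RegexpParser("NP: {<NN><NN>}  # chunk two consecutive nouns")
pairs_tree = pairs.parse(nouns)

print(ordered_tree)
print(pairs_tree)  # (S (NP money/NN market/NN) fund/NN)
```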

The clauses of a grammar are also executed in order. A cascaded chunk parser is one whose grammar has more than one clause. With the default loop=1, the maximum depth of a parse tree created by this chunk parser is the same as the number of clauses in the grammar.
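
A cascaded grammar might look like the sketch below, in which the second clause refers to the NP chunks produced by the first. The two clauses and the sample sentence are assumptions chosen for illustration.

```python
from nltk.chunk import RegexpParser

# Hypothetical two-clause (cascaded) grammar. The PP clause matches
# against the NP chunks that the first clause has already built.
cascade = RegexpParser(r"""
NP: {<DT>?<JJ>*<NN.*>+}   # noun phrases
PP: {<IN><NP>}            # preposition followed by an NP chunk
""")

tagged = [("the", "DT"), ("cat", "NN"), ("sat", "VBD"),
          ("on", "IN"), ("the", "DT"), ("mat", "NN")]
tree = cascade.parse(tagged)

# Pre-order labels show one NP nested inside the PP, which gives
# two levels of chunks for a grammar with two clauses.
labels = [sub.label() for sub in tree.subtrees()]
print(tree)
print(labels)  # ['S', 'NP', 'PP', 'NP']
```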

When tracing is turned on, the comment portion of a line is displayed each time the corresponding pattern is applied.
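
A small sketch of tracing, assuming the trace is written to standard output. The output is captured here so that it can be inspected as a string. The grammar and sentence are illustrative, and the exact layout of the trace may differ between NLTK versions.

```python
import contextlib
import io

from nltk.chunk import RegexpParser

grammar = r"""
NP:
  {<DT>?<JJ>*<NN.*>+}   # chunk determiner, adjectives and nouns
"""

# trace=1 asks the parser to report each pattern as it is applied.
parser = RegexpParser(grammar, trace=1)

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    parser.parse([("the", "DT"), ("dog", "NN"), ("barked", "VBD")])

# The captured trace should include the comment attached to the rule.
trace_output = buffer.getvalue()
print(trace_output)
```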

Variables:
  • _start – The start symbol of the grammar (the root node of resulting trees)
  • _stages – The list of parsing stages corresponding to the grammar

Methods

  • __init__(grammar[, root_label, loop, trace]) – Create a new chunk parser, from the given start state and set of chunk patterns.
  • evaluate(gold) – Score the accuracy of the chunker against the gold standard.
  • grammar() – Returns the grammar used by this parser.
  • parse(chunk_struct[, trace]) – Apply the chunk parser to this input.
  • parse_all(sent, *args, **kwargs) – Return type: list(Tree).
  • parse_one(sent, *args, **kwargs) – Return type: Tree or None.
  • parse_sents(sents, *args, **kwargs) – Apply self.parse() to each element of sents.
  • unicode_repr() – Returns a concise string representation of this chunk.RegexpParser.
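
Two of the methods listed above, parse_sents and evaluate, can be sketched together as follows. The sentences and the hand-built gold tree are hypothetical. Recent NLTK releases may flag evaluate as deprecated in favour of an accuracy method, so treat this as an illustration of the documented interface rather than a recommendation.

```python
from nltk.chunk import RegexpParser
from nltk.tree import Tree

parser = RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}  # simple noun phrases")

sents = [
    [("the", "DT"), ("dog", "NN"), ("barked", "VBD")],
    [("a", "DT"), ("black", "JJ"), ("cat", "NN"), ("slept", "VBD")],
]

# parse_sents applies parse() to each sentence. Wrapping the result in
# list() guards against versions that return a lazy iterable.
trees = list(parser.parse_sents(sents))

# Hypothetical gold standard, a hand-built chunk tree for the first sentence.
gold = [Tree("S", [Tree("NP", [("the", "DT"), ("dog", "NN")]),
                   ("barked", "VBD")])]

# evaluate re-chunks the leaves of each gold tree and compares the chunks.
score = parser.evaluate(gold)
print(score.precision(), score.recall())
```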