nltk.PCFG

class nltk.PCFG(start, productions, calculate_leftcorners=True)[source]

A probabilistic context-free grammar. A PCFG consists of a start state and a set of productions with probabilities. The set of terminals and nonterminals is implicitly specified by the productions.

PCFG productions use the ProbabilisticProduction class. PCFGs impose the constraint that the set of productions with any given left-hand-side must have probabilities that sum to 1 (allowing for a small margin of error).
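The sum-to-1 constraint can be seen by building a grammar directly from ProbabilisticProduction objects. The following is a minimal sketch, assuming an installed NLTK 3.x; the toy symbols and probabilities are invented for illustration, and the `ValueError` on rejection reflects the behaviour of the NLTK releases I am aware of rather than anything stated on this page.

```python
from nltk.grammar import PCFG, Nonterminal, ProbabilisticProduction

S, A = Nonterminal("S"), Nonterminal("A")

# Probabilities for each left-hand side sum to 1, so this is accepted.
good = [
    ProbabilisticProduction(S, [A, A], prob=1.0),
    ProbabilisticProduction(A, ["a"], prob=0.5),
    ProbabilisticProduction(A, ["b"], prob=0.5),
]
grammar = PCFG(S, good)

# The productions for A now sum to 0.7, well outside the margin of error.
bad = [
    ProbabilisticProduction(S, [A, A], prob=1.0),
    ProbabilisticProduction(A, ["a"], prob=0.5),
    ProbabilisticProduction(A, ["b"], prob=0.2),
]
try:
    PCFG(S, bad)
    rejected = False
except ValueError as err:  # assumed exception type; see the note above
    rejected = True
    print("rejected:", err)
```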

If you need efficient key-based access to productions, you can use a subclass to implement it.

Variables: EPSILON – The acceptable margin of error for checking that productions with a given left-hand side have probabilities that sum to 1.
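The role of the margin of error can be sketched in plain Python. This is not NLTK's implementation: the helper `unnormalized_lhs`, the `(lhs, rhs, prob)` tuple layout, and the default `epsilon=0.01` are all assumptions made for illustration.

```python
from collections import defaultdict


def unnormalized_lhs(productions, epsilon=0.01):
    """Return the left-hand sides whose probabilities miss 1 by epsilon or more.

    `productions` is an iterable of (lhs, rhs, prob) tuples. Both this helper
    and the default epsilon are hypothetical, not part of NLTK.
    """
    totals = defaultdict(float)
    for lhs, _rhs, prob in productions:
        totals[lhs] += prob
    return sorted(lhs for lhs, total in totals.items() if abs(total - 1.0) >= epsilon)


rules = [
    # NP sums to 0.999: rounding error that falls inside the margin.
    ("NP", ("Det", "N"), 0.333),
    ("NP", ("NP", "PP"), 0.333),
    ("NP", ("Mary",), 0.333),
    # VP sums to 0.9: a genuine modelling error.
    ("VP", ("V", "NP"), 0.5),
    ("VP", ("V",), 0.4),
]
print(unnormalized_lhs(rules))
```

The margin exists so that probabilities rounded when written down (thirds, for example) do not cause an otherwise valid grammar to be rejected.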

Methods

__init__(start, productions[, ...]) Create a new probabilistic context-free grammar from the given start state and set of ProbabilisticProductions.
check_coverage(tokens) Check whether the grammar rules cover the given list of tokens.
fromstring(input[, encoding]) Return a PCFG corresponding to the input string(s).
is_binarised() Return True if all productions are at most binary.
is_chomsky_normal_form() Return True if the grammar is in Chomsky Normal Form, i.e. all productions are of the form A -> B C or A -> “s”.
is_flexible_chomsky_normal_form() Return True if all productions are of the forms A -> B C, A -> B, or A -> “s”.
is_leftcorner(cat, left) True if left is a leftcorner of cat, where left can be a terminal or a nonterminal.
is_lexical() Return True if all productions are lexicalised.
is_nonempty() Return True if there are no empty productions.
is_nonlexical() Return True if all lexical rules are “preterminals”, that is, unary rules which can be separated in a preprocessing step.
leftcorner_parents(cat) Return the set of all nonterminals for which the given category is a left corner.
leftcorners(cat) Return the set of all nonterminals that the given nonterminal can start with, including itself.
max_len() Return the right-hand side length of the longest grammar production.
min_len() Return the right-hand side length of the shortest grammar production.
productions([lhs, rhs, empty]) Return the grammar productions, filtered by the left-hand side or the first item in the right-hand side.
start() Return the start symbol of the grammar.
unicode_repr()
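
Several of the methods above can be exercised on a small grammar. The following is a minimal sketch, assuming an installed NLTK 3.x; the toy grammar and its probabilities are invented for illustration, and the bracketed `[p]` suffix is the string notation that `fromstring` reads for production probabilities.

```python
from nltk.grammar import PCFG, Nonterminal

# A toy grammar; each bracketed number is the probability of that production.
grammar = PCFG.fromstring("""
    S -> NP VP [1.0]
    NP -> Det N [0.7] | 'Mary' [0.3]
    VP -> V NP [1.0]
    Det -> 'the' [0.6] | 'a' [0.4]
    N -> 'dog' [0.5] | 'cat' [0.5]
    V -> 'saw' [1.0]
""")

S, NP = Nonterminal("S"), Nonterminal("NP")

print(grammar.start())                      # the start symbol, S

# Filter productions by left-hand side and inspect their probabilities.
np_rules = grammar.productions(lhs=NP)
for rule in np_rules:
    print(rule, rule.prob())
np_total = sum(rule.prob() for rule in np_rules)

# Shape checks: every rule here is either A -> B C or A -> "s".
print(grammar.min_len(), grammar.max_len())
in_cnf = grammar.is_chomsky_normal_form()

# NP is the first symbol of the only S production, so it is a left corner of S.
np_starts_s = grammar.is_leftcorner(S, NP)

# check_coverage raises an exception if any token is unknown to the grammar.
grammar.check_coverage("the dog saw Mary".split())
```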

Attributes

EPSILON