`nltk.tokenize.load()`

nltk.tokenize.load(resource_url, format=u'auto', cache=True, verbose=False, logic_parser=None, fstruct_reader=None, encoding=None)

Load a given resource from the NLTK data package. The following resource formats are currently supported:

pickle

json

yaml

cfg (context free grammars)

pcfg (probabilistic CFGs)

fcfg (feature-based CFGs)

fol (formulas of First Order Logic)

logic (logical formulas to be parsed by the given logic_parser)

val (valuation of First Order Logic model)

text (the file contents as a unicode string)

raw (the raw file contents as a byte string)
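As a rough illustration of how a format name selects a parser, the sketch below dispatches raw resource bytes to a handler for four of the formats listed above (raw, text, json, pickle). The function `parse_resource` and its dispatch table are hypothetical, not NLTK's internal code; the grammar, logic and valuation formats are left out because they depend on NLTK's own parsers.

```python
import json
import pickle
from typing import Any, Callable, Dict

# Hypothetical dispatch table: format name -> function turning raw bytes into an object.
# Only formats that the standard library can handle are included here.
_PARSERS: Dict[str, Callable[[bytes], Any]] = {
    "raw": lambda data: data,                   # byte string, returned unchanged
    "text": lambda data: data.decode("utf-8"),  # unicode string (UTF-8 only, for brevity)
    "json": lambda data: json.loads(data.decode("utf-8")),
    "pickle": pickle.loads,                     # never unpickle untrusted data
}


def parse_resource(data: bytes, fmt: str) -> Any:
    """Convert raw resource bytes into an object according to ``fmt``."""
    try:
        parser = _PARSERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None
    return parser(data)


if __name__ == "__main__":
    print(parse_resource(b'{"a": 1}', "json"))  # {'a': 1}
    print(parse_resource(b"hello", "raw"))      # b'hello'
```

The point of the table is that the same bytes yield different objects depending on the requested format: `raw` keeps bytes, `text` produces a string, and the structured formats produce parsed objects.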

If no format is specified (format defaults to 'auto'), load() attempts to determine the format from the resource name’s file extension. If that fails, load() raises a ValueError.
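Extension-based detection can be sketched as a lookup from file extension to format name. The helper `infer_format` and its mapping are hypothetical: the extensions mirror the format names listed above, and treating `.txt` as `text` is an assumption, so NLTK's actual table may differ.

```python
import posixpath

# Hypothetical extension -> format mapping (an assumption, not NLTK's own table).
_EXTENSION_FORMATS = {
    "pickle": "pickle",
    "json": "json",
    "yaml": "yaml",
    "cfg": "cfg",
    "pcfg": "pcfg",
    "fcfg": "fcfg",
    "fol": "fol",
    "logic": "logic",
    "val": "val",
    "txt": "text",
    "text": "text",
}


def infer_format(resource_url: str) -> str:
    """Guess a resource's format from its file extension, or raise ValueError."""
    # Drop a leading protocol such as "nltk:" or "file:" (assumed URL shape).
    path = resource_url.split(":", 1)[-1]
    ext = posixpath.splitext(path)[1].lstrip(".").lower()
    try:
        return _EXTENSION_FORMATS[ext]
    except KeyError:
        raise ValueError(
            f"could not determine format for {resource_url!r}; pass format explicitly"
        ) from None
```

For example, `infer_format("nltk:grammars/example.fcfg")` returns `"fcfg"`, while a name with no recognized extension raises `ValueError`, in which case the caller has to pass the format explicitly.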

For all text formats (everything except pickle, json, yaml and raw), load() decodes the raw contents as UTF-8 and, if that fails, falls back to ISO-8859-1 (Latin-1). If encoding is given, that encoding is used instead.
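A minimal sketch of that decoding rule, assuming the contents are already available as bytes; `decode_text` is a hypothetical helper name, not part of NLTK.

```python
from typing import Optional


def decode_text(data: bytes, encoding: Optional[str] = None) -> str:
    """Decode resource bytes: use ``encoding`` if given, else UTF-8, then Latin-1."""
    if encoding is not None:
        # An explicit encoding is used as-is; decoding errors propagate to the caller.
        return data.decode(encoding)
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte value to a code point, so this step cannot fail.
        return data.decode("latin-1")
```

Because Latin-1 accepts any byte sequence, the fallback never raises: a file in some other 8-bit encoding is silently read as Latin-1, which is one reason to pass encoding explicitly when it is known.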

Parameters:

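resource_url (str) – A URL specifying where the resource should be loaded from. The default protocol is “nltk:”, which searches for the file in the NLTK data package.
format (str) – The resource format, one of the formats listed above. The default, “auto”, infers the format from the file extension.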
cache (bool) – If true, add this resource to a cache. If load() finds a resource in its cache, it returns the cached copy instead of loading it again. The cache uses weak references, so a resource will automatically be expunged from the cache when no more objects are using it.
verbose (bool) – If true, print a message when loading a resource. Messages are not displayed when a resource is retrieved from the cache.
logic_parser (LogicParser) – The parser that will be used to parse logical expressions.
fstruct_reader (FeatStructReader) – The parser that will be used to parse the feature structure of an fcfg.
encoding (str) – The encoding of the input; only used for text formats.
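The interaction of cache and verbose can be sketched with the standard library's `weakref.WeakValueDictionary`. The class `CachedLoader`, its `load_fn` argument and the printed message are hypothetical and only illustrate the behaviour described above: cached objects vanish once nothing else references them, and no message is printed on a cache hit.

```python
import weakref
from typing import Any, Callable


class CachedLoader:
    """Hypothetical loader that caches results through weak references."""

    def __init__(self, load_fn: Callable[[str], Any]) -> None:
        self._load_fn = load_fn
        # Entries are dropped automatically once no strong reference remains.
        self._cache = weakref.WeakValueDictionary()

    def load(self, url: str, cache: bool = True, verbose: bool = False) -> Any:
        if cache:
            hit = self._cache.get(url)
            if hit is not None:
                return hit  # cache hit: returned silently, even when verbose is set
        if verbose:
            print(f"<<Loading {url}>>")
        value = self._load_fn(url)
        if cache:
            try:
                self._cache[url] = value
            except TypeError:
                # Objects such as str and bytes cannot be weakly referenced; skip caching.
                pass
        return value
```

This sketch keys the cache by URL alone; a real loader would likely need the format in the key too, since the same file can be loaded both as text and as raw bytes.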
