nltk.tokenize.load()

nltk.tokenize.load(resource_url, format=u'auto', cache=True, verbose=False, logic_parser=None, fstruct_reader=None, encoding=None)

Load a given resource from the NLTK data package. The following resource formats are currently supported:

  • pickle
  • json
  • yaml
  • cfg (context free grammars)
  • pcfg (probabilistic CFGs)
  • fcfg (feature-based CFGs)
  • fol (formulas of First Order Logic)
  • logic (logical formulas to be parsed by the given logic_parser)
  • val (valuation of First Order Logic model)
  • text (the file contents as a unicode string)
  • raw (the raw file contents as a byte string)

If no format is specified, load() will attempt to determine a format based on the resource name’s file extension. If that fails, load() will raise a ValueError exception.
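The extension-based lookup described above can be illustrated with a minimal sketch. This is not NLTK's implementation: the helper name guess_format and the contents of EXTENSION_TO_FORMAT are assumptions made for illustration, and the real mapping inside NLTK may differ.

```python
import os

# Hypothetical extension-to-format table, for illustration only.
# The mapping NLTK actually uses may contain different entries.
EXTENSION_TO_FORMAT = {
    "pickle": "pickle",
    "json": "json",
    "yaml": "yaml",
    "cfg": "cfg",
    "pcfg": "pcfg",
    "fcfg": "fcfg",
    "fol": "fol",
    "logic": "logic",
    "val": "val",
    "txt": "text",
    "text": "text",
}


def guess_format(resource_url, format="auto"):
    """Return the explicit format, or infer one from the file extension."""
    if format != "auto":
        # An explicit format always wins over the extension.
        return format
    # splitext returns e.g. ".cfg"; strip the dot and normalise the case.
    extension = os.path.splitext(resource_url)[1].lstrip(".").lower()
    try:
        return EXTENSION_TO_FORMAT[extension]
    except KeyError:
        raise ValueError(
            "Could not determine format for %r; specify one explicitly"
            % resource_url
        )
```

Under these assumptions, a name ending in .cfg resolves to the cfg format, while a name with an unrecognised extension raises ValueError unless a format is passed explicitly.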

For all text formats (everything except pickle, json, yaml and raw), load() first tries to decode the raw contents as UTF-8 and, if that fails, falls back to ISO-8859-1 (Latin-1), unless an encoding is specified.
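The decoding order for text formats can be sketched as follows. The helper name decode_text is hypothetical and the code only mirrors the behaviour described above; it is not taken from NLTK's source.

```python
def decode_text(raw, encoding=None):
    """Decode bytes the way the text above describes (illustrative sketch)."""
    if encoding is not None:
        # An explicit encoding disables the fallback chain entirely.
        return raw.decode(encoding)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte value to a character, so this
        # second attempt cannot fail, although it may yield wrong
        # characters if the data is really in some other encoding.
        return raw.decode("iso-8859-1")
```

Because the Latin-1 fallback never raises, input in another encoding will decode silently to the wrong characters; passing the encoding explicitly avoids this.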

Parameters:
  • resource_url (str) – A URL specifying where the resource should be loaded from. The default protocol is “nltk:”, which searches for the file in the NLTK data package.
  • cache (bool) – If true, add this resource to a cache. If load() finds a resource in its cache, it returns the cached object rather than loading it again. The cache uses weak references, so a resource will automatically be expunged from the cache when no more objects are using it.
  • verbose (bool) – If true, print a message when loading a resource. Messages are not displayed when a resource is retrieved from the cache.
  • logic_parser (LogicParser) – The parser that will be used to parse logical expressions.
  • fstruct_reader (FeatStructReader) – The parser that will be used to parse the feature structure of an fcfg.
  • encoding (str) – The encoding of the input; only used for text formats.