1.1.11. patsy.incr_dbuilder

patsy.incr_dbuilder(formula_like, data_iter_maker, eval_env=0, NA_action='drop')

Construct a design matrix builder incrementally from a large data set.

Parameters:
  • formula_like – Similar to dmatrix(), except that explicit matrices are not allowed. Must be a formula string, a ModelDesc, a DesignInfo, or an object with a __patsy_get_model_desc__ method.
  • data_iter_maker – A zero-argument callable which returns an iterator over dict-like data objects. This must be a callable rather than a simple iterator because sufficiently complex formulas may require multiple passes over the data (e.g. if there are nested stateful transforms).
  • eval_env – Either an EvalEnvironment, which will be used to look up any variables referenced in formula_like that cannot be found in data, or else a depth, given as an integer, which will be passed to EvalEnvironment.capture(). eval_env=0 means to use the context of the function calling incr_dbuilder() for lookups. If calling this function from a library, you probably want eval_env=1, which means that variables should be resolved in your caller’s namespace.
  • NA_action – An NAAction object or string, used to determine what values count as ‘missing’ for purposes of determining the levels of categorical factors.
Returns:

A DesignInfo
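As a minimal sketch of how the returned DesignInfo might be used, the example below builds it from two small in-memory chunks and then converts each chunk into a design matrix with build_design_matrices(). The chunks list, the column names x and group, and the formula are illustrative assumptions, not part of the patsy documentation.

```python
import numpy as np
from patsy import incr_dbuilder, build_design_matrices

# Hypothetical stand-in for a data set too large to load at once:
# each chunk is a dict-like object mapping variable names to values.
chunks = [
    {"x": np.array([1.0, 2.0, 3.0]), "group": ["lo", "hi", "lo"]},
    {"x": np.array([4.0, 5.0]), "group": ["hi", "mid"]},
]

def iter_maker():
    # Returns a fresh iterator over the chunks on every call.
    for chunk in chunks:
        yield chunk

# center() is a stateful transform, so patsy has to scan the data
# to learn the mean of x and the levels of group.
design_info = incr_dbuilder("center(x) + group", iter_maker)

# Build the design matrix one chunk at a time using the learned state.
matrices = [
    build_design_matrices([design_info], chunk)[0]
    for chunk in iter_maker()
]
full = np.vstack([np.asarray(m) for m in matrices])
```

Because the mean and the categorical levels are learned across all chunks, each per-chunk matrix has the same columns even when a chunk does not contain every level.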

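Tip: for data_iter_maker, write a generator function such as:

def iter_maker():
    # my_data_store is any iterable of dict-like data chunks
    for data_chunk in my_data_store:
        yield data_chunk

and pass the function itself (iter_maker), not the result of calling it (iter_maker()).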
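To illustrate why a callable is needed, here is a small pure-Python sketch of a computation that takes two passes over chunked data. It is not patsy's implementation; center_in_two_passes and the sample my_data_store are hypothetical names used only for this example.

```python
# Hypothetical chunked data store.
my_data_store = [{"x": [1.0, 2.0, 3.0]}, {"x": [4.0, 5.0]}]

def iter_maker():
    for data_chunk in my_data_store:
        yield data_chunk

def center_in_two_passes(data_iter_maker, key):
    # Pass 1: accumulate the overall mean across all chunks.
    total = 0.0
    count = 0
    for chunk in data_iter_maker():
        total += sum(chunk[key])
        count += len(chunk[key])
    mean = total / count
    # Pass 2: calling the maker again yields a fresh iterator.
    return [[v - mean for v in chunk[key]] for chunk in data_iter_maker()]

centered = center_in_two_passes(iter_maker, "x")

# By contrast, a single iterator is exhausted after one pass.
one_shot = iter_maker()
first_pass = list(one_shot)
second_pass = list(one_shot)  # empty: nothing is left to iterate over
```

Had iter_maker() been passed instead of iter_maker, the second pass would see no data, which is the situation the callable requirement avoids.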

New in version 0.2.0: The NA_action argument.
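The eval_env depth described under Parameters can be sketched as follows. Here library_builder is a hypothetical library wrapper that passes eval_env=1, so the variable scale is looked up in the namespace of the function that called the wrapper. The function names, the formula, and the data are assumptions made for illustration.

```python
import numpy as np
from patsy import incr_dbuilder, build_design_matrices

def iter_maker():
    yield {"x": np.array([1.0, 2.0])}
    yield {"x": np.array([3.0])}

def library_builder(formula, data_iter_maker):
    # Hypothetical library wrapper: eval_env=1 asks patsy to resolve
    # formula variables in the namespace of this function's caller.
    return incr_dbuilder(formula, data_iter_maker, eval_env=1)

def user_code():
    scale = 10.0  # exists only in this function's local namespace
    return library_builder("I(x * scale)", iter_maker)

design_info = user_code()

# Apply the learned design to a chunk of data.
(matrix,) = build_design_matrices([design_info], {"x": np.array([1.0, 2.0])})
values = np.asarray(matrix)
```

With the default eval_env=0 inside library_builder, the lookup would start in the wrapper's own namespace, where scale is not defined.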