1.1.11. patsy.incr_dbuilder

patsy.incr_dbuilder(formula_like, data_iter_maker, eval_env=0, NA_action='drop')

Construct a design matrix builder incrementally from a large data set.
Parameters:
- formula_like – Similar to dmatrix(), except that explicit matrices are not allowed. Must be a formula string, a ModelDesc, a DesignInfo, or an object with a __patsy_get_model_desc__ method.
- data_iter_maker – A zero-argument callable which returns an iterator over dict-like data objects. This must be a callable rather than a simple iterator because sufficiently complex formulas may require multiple passes over the data (e.g. if there are nested stateful transforms).
- eval_env – Either an EvalEnvironment, which will be used to look up any variables referenced in formula_like that cannot be found in data, or else a depth represented as an integer, which will be passed to EvalEnvironment.capture(). eval_env=0 means to use the context of the function calling incr_dbuilder() for lookups. If calling this function from a library, you probably want eval_env=1, which means that variables should be resolved in your caller’s namespace.
- NA_action – An NAAction object or string, used to determine what values count as ‘missing’ for purposes of determining the levels of categorical factors.
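The integer depth accepted by eval_env can be pictured with a small plain-Python sketch. It does not use patsy: the helper lookup() and the functions library_function() and user_code() are hypothetical names invented for illustration, and patsy's actual capture logic may differ in detail. The sketch relies on sys._getframe(), which is specific to CPython.

```python
import sys


def lookup(name, depth=0):
    """Hypothetical helper (not patsy's API): resolve `name` in the
    namespace `depth` frames above the function that called lookup()."""
    # Frame 0 is lookup() itself, so depth=0 maps to the direct caller.
    frame = sys._getframe(depth + 1)
    if name in frame.f_locals:
        return frame.f_locals[name]
    return frame.f_globals[name]


def library_function(depth):
    # Stands in for a library routine that forwards a depth argument.
    scale = "library value"
    return lookup("scale", depth)


def user_code(depth):
    # Stands in for the library user's own function.
    scale = "user value"
    return library_function(depth)


# depth=0: the name is resolved inside library_function().
from_library = user_code(0)
# depth=1: the name is resolved one level up, inside user_code().
from_user = user_code(1)
```

In this sketch, depth 0 finds the library's own variable while depth 1 finds the user's, which mirrors the advice to use eval_env=1 when calling from a library.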
Returns:

Tip: for data_iter_maker, write a generator like:

    def iter_maker():
        for data_chunk in my_data_store:
            yield data_chunk
and pass iter_maker (not iter_maker()).
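To see why a callable is needed rather than a single iterator, here is a minimal sketch in plain Python that does not use patsy. The consumer two_pass_center() is hypothetical and only mimics a stateful transform that needs two passes over the data; the contents of my_data_store are stand-in values chosen for illustration.

```python
# Stand-in data store: a list of dict-like chunks (illustrative values).
my_data_store = [{"x": [1.0, 2.0]}, {"x": [3.0, 4.0]}]


def iter_maker():
    for data_chunk in my_data_store:
        yield data_chunk


def two_pass_center(data_iter_maker):
    """Hypothetical two-pass consumer; not part of patsy."""
    # Pass 1: accumulate the sum and count to obtain the overall mean.
    total = 0.0
    count = 0
    for chunk in data_iter_maker():
        total += sum(chunk["x"])
        count += len(chunk["x"])
    mean = total / count

    # Pass 2: request a fresh iterator and subtract the mean.
    centered = []
    for chunk in data_iter_maker():
        centered.extend(value - mean for value in chunk["x"])
    return centered


# Passing the function itself lets the consumer restart iteration.
centered = two_pass_center(iter_maker)

# By contrast, a single iterator is exhausted after one pass.
exhausted = iter_maker()
first_pass = list(exhausted)
second_pass = list(exhausted)  # empty: nothing left to yield
```

Because the consumer calls data_iter_maker() once per pass, each pass sees the full data set. An iterator created up front would have nothing left for the second pass.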
New in version 0.2.0: The NA_action argument.