1.1.10. patsy.dmatrix¶
-
patsy.
dmatrix
(formula_like, data={}, eval_env=0, NA_action='drop', return_type='matrix')[source]¶ Construct a single design matrix given a formula_like and data.
Parameters: - formula_like – An object that can be used to construct a design matrix. See below.
- data – A dict-like object that can be used to look up variables referenced in formula_like.
- eval_env – Either a
EvalEnvironment
which will be used to look up any variables referenced in formula_like that cannot be found in data, or else a depth represented as an integer which will be passed toEvalEnvironment.capture()
.eval_env=0
means to use the context of the function callingdmatrix()
for lookups. If calling this function from a library, you probably wanteval_env=1
, which means that variables should be resolved in your caller’s namespace. - NA_action – What to do with rows that contain missing values. You can
"drop"
them,"raise"
an error, or for customization, pass anNAAction
object. SeeNAAction
for details on what values count as ‘missing’ (and how to alter this). - return_type – Either
"matrix"
or"dataframe"
. See below.
The formula_like can take a variety of forms. You can use any of the following:
- (The most common option) A formula string like
"x1 + x2"
(fordmatrix()
) or"y ~ x1 + x2"
(fordmatrices()
). For details see formulas. - A
ModelDesc
, which is a Python object representation of a formula. See formulas and expert-model-specification for details. - A
DesignInfo
. - An object that has a method called
__patsy_get_model_desc__()
. For details see expert-model-specification. - A numpy array_like (for
dmatrix()
) or a tuple (array_like, array_like) (fordmatrices()
). These will have metadata added, representation normalized, and then be returned directly. In this case data and eval_env are ignored. There is special handling for two cases:DesignMatrix
objects will have theirDesignInfo
preserved. This allows you to set up custom column names and term information even if you aren’t using the rest of the patsy machinery.pandas.DataFrame
orpandas.Series
objects will have their (row) indexes checked. If two are passed in, their indexes must be aligned. Ifreturn_type="dataframe"
, then their indexes will be preserved on the output.
Regardless of the input, the return type is always either:
- A
DesignMatrix
, ifreturn_type="matrix"
(the default) - A
pandas.DataFrame
, ifreturn_type="dataframe"
.
The actual contents of the design matrix is identical in both cases, and in both cases a
DesignInfo
object will be available in a.design_info
attribute on the return value. However, forreturn_type="dataframe"
, any pandas indexes on the input (either in data or directly passed through formula_like) will be preserved, which may be useful for e.g. time-series models.New in version 0.2.0: The
NA_action
argument.