3.1.1. patsy.categorical.C

patsy.categorical.C(data, contrast=None, levels=None)

Marks some data as being categorical, and specifies how to interpret it.

It is used for three purposes:

  • To explicitly mark some data as categorical. For instance, integer data is treated as numerical by default. If your data is stored using an integer type but you want patsy to treat each distinct value as a separate level of a categorical factor, wrap it in a call to C. For example, compare:

    dmatrix("a", {"a": [1, 2, 3]})
    dmatrix("C(a)", {"a": [1, 2, 3]})
    
  • To explicitly set the levels or override the default level ordering for categorical data, e.g.:

    dmatrix("C(a, levels=['a2', 'a1'])", balanced(a=2))
    
  • To override the default coding scheme for categorical data. The contrast argument can be any of:

    • A ContrastMatrix object
    • A simple 2d ndarray (which is treated the same as a ContrastMatrix object except that you can’t specify column names)
    • An object with methods called code_with_intercept and code_without_intercept, like the built-in contrasts (Treatment, Diff, Poly, etc.). See the categorical-coding section for more details.
    • A callable that returns one of the above.
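The first two uses can be sketched as a short script, assuming patsy and NumPy are installed. The variable names are illustrative only. The levels= example uses single quotes inside the formula string so that it remains a valid Python string literal.

```python
import numpy as np
from patsy import balanced, dmatrix

# Integer data is treated as numerical by default: a single column "a".
numeric = dmatrix("a", {"a": [1, 2, 3]})
print(numeric.design_info.column_names)  # ['Intercept', 'a']

# Wrapping the same data in C() makes each distinct value its own level.
# With the default Treatment coding, the first level (1) is the reference
# level and gets no column of its own, so we expect an intercept plus
# indicator columns for levels 2 and 3.
categorical = dmatrix("C(a)", {"a": [1, 2, 3]})
print(categorical.design_info.column_names)
print(np.asarray(categorical))

# balanced(a=2) builds the data {"a": ["a1", "a2"]}. Passing levels=
# puts "a2" first, so "a2" becomes the reference level and only "a1"
# gets an indicator column.
reordered = dmatrix("C(a, levels=['a2', 'a1'])", balanced(a=2))
print(reordered.design_info.column_names)
print(np.asarray(reordered))
```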
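A built-in contrast can be passed as a configured instance, as the bare class, or through any callable that returns a contrast. The sketch below assumes patsy and NumPy are installed. It uses the reference argument of Treatment and the omit argument of Sum. The helper sum_omitting_first is hypothetical and exists only for illustration. Names used inside the formula string are looked up in the scope that calls dmatrix.

```python
import numpy as np
from patsy import Sum, Treatment, balanced, dmatrix

data = balanced(a=3)  # {"a": ["a1", "a2", "a3"]}

# A configured instance: Treatment coding with "a3" as the reference
# level instead of the default first level.
treat = dmatrix("C(a, Treatment(reference='a3'))", data)

# The class itself is callable, so patsy instantiates it with no
# arguments. Sum (deviation) coding omits the last level by default.
sum_coded = dmatrix("C(a, Sum)", data)

# Any callable returning a contrast also works. Hypothetical helper:
def sum_omitting_first():
    # Sum coding that omits "a1" instead of the last level.
    return Sum(omit="a1")

sum_first = dmatrix("C(a, sum_omitting_first)", data)

for m in (treat, sum_coded, sum_first):
    print(m.design_info.column_names)
    print(np.asarray(m))
```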
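The other contrast forms can be sketched as well: a raw 2-d array, a ContrastMatrix, and a user-defined object with code_with_intercept and code_without_intercept methods. This assumes patsy and NumPy are installed. The variable names and the LastLevelReference class are hypothetical. Each array has one row per level, in level order, and one column per coded column. Names used inside the formula string are looked up in the scope that calls dmatrix.

```python
import numpy as np
from patsy import ContrastMatrix, balanced, dmatrix

data = balanced(a=3)  # {"a": ["a1", "a2", "a3"]}

# A plain 2-d array: patsy gives the columns generic suffixes.
contr_array = np.array([[-1.0, -1.0],   # a1
                        [1.0, 0.0],     # a2
                        [0.0, 1.0]])    # a3
from_array = dmatrix("C(a, contr_array)", data)

# The same matrix wrapped in a ContrastMatrix, which lets us choose the
# column-name suffixes ourselves.
contr_named = ContrastMatrix(contr_array, ["[dev.a2]", "[dev.a3]"])
from_named = dmatrix("C(a, contr_named)", data)

class LastLevelReference:
    """Hypothetical contrast: treatment coding with the LAST level as reference."""

    def code_with_intercept(self, levels):
        # Full-rank coding (e.g. when the model has no intercept):
        # one indicator column per level.
        return ContrastMatrix(np.eye(len(levels)),
                              ["[%s]" % (lvl,) for lvl in levels])

    def code_without_intercept(self, levels):
        # Reduced-rank coding (e.g. when an intercept is present):
        # drop the indicator column belonging to the last level.
        n = len(levels)
        return ContrastMatrix(np.eye(n)[:, :-1],
                              ["[last.%s]" % (lvl,) for lvl in levels[:-1]])

from_custom = dmatrix("C(a, LastLevelReference())", data)
no_intercept = dmatrix("0 + C(a, LastLevelReference())", data)

for m in (from_array, from_named, from_custom, no_intercept):
    print(m.design_info.column_names)
```

A raw array is convenient for a quick, one-off coding. A ContrastMatrix adds readable column names. A custom object is the option to use when the coding must adapt to whatever levels the data turns out to have.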