5.7 Operations

Apart from Series.min(), Series.max() and Series.mode(), the following operations are possible with categorical data:

Series methods like Series.value_counts() will use all categories, even if some categories are not present in the data:

In [1]: s = pd.Series(pd.Categorical(["a", "b", "c", "c"], categories=["c", "a", "b", "d"]))

In [2]: s.value_counts()
Out[2]: 
c    2
b    1
a    1
d    0
dtype: int64
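
The same behaviour can be checked programmatically. The following is a minimal sketch, not part of the original session. The variable names counts and observed_only are illustrative, and dropping unused categories with Series.cat.remove_unused_categories() before counting is only one possible way to suppress the zero rows:

```python
import pandas as pd

s = pd.Series(pd.Categorical(["a", "b", "c", "c"], categories=["c", "a", "b", "d"]))
counts = s.value_counts()

# The unused category "d" is present in the result with a count of zero.
print(counts["d"])

# Every declared category appears in the index of the result.
print(set(counts.index) == set(s.cat.categories))

# Assumption: if zero-count rows are unwanted, remove unused categories first.
observed_only = s.cat.remove_unused_categories().value_counts()
print(sorted(observed_only.index))
```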

Groupby will also show “unused” categories:

In [3]: cats = pd.Categorical(["a", "b", "b", "b", "c", "c", "c"], categories=["a", "b", "c", "d"])

In [4]: df = pd.DataFrame({"cats": cats, "values": [1, 2, 2, 2, 3, 4, 5]})

In [5]: df.groupby("cats").mean()
Out[5]: 
      values
cats        
a        1.0
b        2.0
c        4.0
d        NaN

In [6]: cats2 = pd.Categorical(["a", "a", "b", "b"], categories=["a", "b", "c"])

In [7]: df2 = pd.DataFrame({"cats": cats2, "B": ["c", "d", "c", "d"], "values": [1, 2, 3, 4]})

In [8]: df2.groupby(["cats","B"]).mean()
Out[8]: 
        values
cats B        
a    c     1.0
     d     2.0
b    c     3.0
     d     4.0
c    c     NaN
     d     NaN

Pivot tables also include unused categories:

In [9]: raw_cat = pd.Categorical(["a", "a", "b", "b"], categories=["a", "b", "c"])

In [10]: df = pd.DataFrame({"A": raw_cat, "B": ["c", "d", "c", "d"], "values": [1, 2, 3, 4]})

In [11]: pd.pivot_table(df, values="values", index=["A", "B"])
Out[11]: 
A  B
a  c    1.0
   d    2.0
b  c    3.0
   d    4.0
c  c    NaN
   d    NaN
Name: values, dtype: float64
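
The opening sentence of this section mentions Series.min(), Series.max() and Series.mode() without showing them. As a minimal sketch, with made-up data and the illustrative names sizes, smallest, largest and most_common: min() and max() need an ordered categorical and follow the category order rather than the lexical order of the values, while mode() returns the most frequent value or values.

```python
import pandas as pd

# Hypothetical example data; ordered=True is required for min() and max().
sizes = pd.Series(
    pd.Categorical(
        ["medium", "small", "large", "small"],
        categories=["small", "medium", "large"],
        ordered=True,
    )
)

# Comparison follows the category order small < medium < large,
# not alphabetical order (where "small" would sort last).
smallest = sizes.min()
largest = sizes.max()

# mode() returns a Series, since several values can tie for most frequent.
most_common = sizes.mode()

print(smallest, largest, most_common.tolist())
```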