>>> columns = pd.MultiIndex.from_tuples([('A', 'cat'), ('B', 'dog'),
... ('B', 'cat'), ('A', 'dog')],
... names=['exp', 'animal'])
>>> index = pd.MultiIndex.from_product([('bar', 'baz', 'foo', 'qux'),
... ('one', 'two')],
... names=['first', 'second'])
>>> df = pd.DataFrame(np.random.randn(8, 4), index=index, columns=columns)
7.4 Combining with stats and GroupBy
It should be no shock that combining pivot
/ stack
/ unstack
with
GroupBy and the basic Series and DataFrame statistical functions can produce
some very expressive and fast data manipulations.
In [1]: df
Out[1]:
exp A B A
animal cat dog cat dog
first second
bar one 0.469112 -0.282863 -1.509059 -1.135632
two 1.212112 -0.173215 0.119209 -1.044236
baz one -0.861849 -2.104569 -0.494929 1.071804
two 0.721555 -0.706771 -1.039575 0.271860
foo one -0.424972 0.567020 0.276232 -1.087401
two -0.673690 0.113648 -1.478427 0.524988
qux one 0.404705 0.577046 -1.715002 -1.039268
two -0.370647 -1.157892 -1.344312 0.844885
In [2]: df.stack().mean(1).unstack()
Out[2]:
animal cat dog
first second
bar one -0.519973 -0.709248
two 0.665660 -0.608725
baz one -0.678389 -0.516383
two -0.159010 -0.217456
foo one -0.074370 -0.260190
two -1.076058 0.319318
qux one -0.655148 -0.231111
two -0.857479 -0.156504
# same result, another way
In [3]: df.groupby(level=1, axis=1).mean()
Out[3]:
animal cat dog
first second
bar one -0.519973 -0.709248
two 0.665660 -0.608725
baz one -0.678389 -0.516383
two -0.159010 -0.217456
foo one -0.074370 -0.260190
two -1.076058 0.319318
qux one -0.655148 -0.231111
two -0.857479 -0.156504
In [4]: df.stack().groupby(level=1).mean()
Out[4]:
exp A B
second
one -0.325438 -0.585766
two 0.185853 -0.708417
In [5]: df.mean().unstack(0)
Out[5]:
exp A B
animal
cat 0.059541 -0.898233
dog -0.199125 -0.395949