8 Reshaping
See the sections on Hierarchical Indexing and Reshaping.
8.1 Stack
In [1]: tuples = list(zip(*[['bar', 'bar', 'baz', 'baz',
...: 'foo', 'foo', 'qux', 'qux'],
...: ['one', 'two', 'one', 'two',
...: 'one', 'two', 'one', 'two']]))
...:
In [2]: index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
In [3]: df = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['A', 'B'])
In [4]: df2 = df[:4]
In [5]: df2
Out[5]:
A B
first second
bar one 0.469112 -0.282863
two -1.509059 -1.135632
baz one 1.212112 -0.173215
two 0.119209 -1.044236
The stack()
method “compresses” a level in the DataFrame’s
columns.
In [6]: stacked = df2.stack()
In [7]: stacked
Out[7]:
first second
bar one A 0.469112
B -0.282863
two A -1.509059
B -1.135632
baz one A 1.212112
B -0.173215
two A 0.119209
B -1.044236
dtype: float64
With a “stacked” DataFrame or Series (having a MultiIndex
as the
index
), the inverse operation of stack()
is
unstack()
, which by default unstacks the last level:
In [8]: stacked.unstack()
Out[8]:
A B
first second
bar one 0.469112 -0.282863
two -1.509059 -1.135632
baz one 1.212112 -0.173215
two 0.119209 -1.044236
In [9]: stacked.unstack(1)
Out[9]:
second one two
first
bar A 0.469112 -1.509059
B -0.282863 -1.135632
baz A 1.212112 0.119209
B -0.173215 -1.044236
In [10]: stacked.unstack(0)
Out[10]:
first bar baz
second
one A 0.469112 1.212112
B -0.282863 -0.173215
two A -1.509059 0.119209
B -1.135632 -1.044236
8.2 Pivot Tables
See the section on Pivot Tables.
In [11]: df = pd.DataFrame({'A' : ['one', 'one', 'two', 'three'] * 3,
....: 'B' : ['A', 'B', 'C'] * 4,
....: 'C' : ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 2,
....: 'D' : np.random.randn(12),
....: 'E' : np.random.randn(12)})
....:
In [12]: df
Out[12]:
A B C D E
0 one A foo -0.424972 -0.370647
1 one B foo 0.567020 -1.157892
2 two C foo 0.276232 -1.344312
3 three A bar -1.087401 0.844885
.. ... .. ... ... ...
8 one C foo 0.404705 0.357021
9 one A bar 0.577046 -0.674600
10 two B bar -1.715002 -1.776904
11 three C bar -1.039268 -0.968914
[12 rows x 5 columns]
We can produce pivot tables from this data very easily:
In [13]: pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'])
Out[13]:
C bar foo
A B
one A 0.577046 -0.424972
B -0.673690 0.567020
C 0.113648 0.404705
three A -1.087401 NaN
... ... ...
C -1.039268 NaN
two A NaN -1.478427
B -1.715002 NaN
C NaN 0.276232
[9 rows x 2 columns]