8 Reshaping

See the sections on Hierarchical Indexing and Reshaping.

8.1 Stack

In [1]: tuples = list(zip(*[['bar', 'bar', 'baz', 'baz',
   ...:                      'foo', 'foo', 'qux', 'qux'],
   ...:                     ['one', 'two', 'one', 'two',
   ...:                      'one', 'two', 'one', 'two']]))
   ...: 

In [2]: index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])

In [3]: df = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['A', 'B'])

In [4]: df2 = df[:4]

In [5]: df2
Out[5]: 
                     A         B
first second                    
bar   one     0.469112 -0.282863
      two    -1.509059 -1.135632
baz   one     1.212112 -0.173215
      two     0.119209 -1.044236

The stack() method “compresses” a level in the DataFrame’s columns.

In [6]: stacked = df2.stack()

In [7]: stacked
Out[7]: 
first  second   
bar    one     A    0.469112
               B   -0.282863
       two     A   -1.509059
               B   -1.135632
baz    one     A    1.212112
               B   -0.173215
       two     A    0.119209
               B   -1.044236
dtype: float64

With a “stacked” DataFrame or Series (having a MultiIndex as the index), the inverse operation of stack() is unstack(), which by default unstacks the last level:

In [8]: stacked.unstack()
Out[8]: 
                     A         B
first second                    
bar   one     0.469112 -0.282863
      two    -1.509059 -1.135632
baz   one     1.212112 -0.173215
      two     0.119209 -1.044236

In [9]: stacked.unstack(1)
Out[9]: 
second        one       two
first                      
bar   A  0.469112 -1.509059
      B -0.282863 -1.135632
baz   A  1.212112  0.119209
      B -0.173215 -1.044236

In [10]: stacked.unstack(0)
Out[10]: 
first          bar       baz
second                      
one    A  0.469112  1.212112
       B -0.282863 -0.173215
two    A -1.509059  0.119209
       B -1.135632 -1.044236

8.2 Pivot Tables

See the section on Pivot Tables.

In [11]: df = pd.DataFrame({'A' : ['one', 'one', 'two', 'three'] * 3,
   ....:                    'B' : ['A', 'B', 'C'] * 4,
   ....:                    'C' : ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 2,
   ....:                    'D' : np.random.randn(12),
   ....:                    'E' : np.random.randn(12)})
   ....: 

In [12]: df
Out[12]: 
        A  B    C         D         E
0     one  A  foo -0.424972 -0.370647
1     one  B  foo  0.567020 -1.157892
2     two  C  foo  0.276232 -1.344312
3   three  A  bar -1.087401  0.844885
..    ... ..  ...       ...       ...
8     one  C  foo  0.404705  0.357021
9     one  A  bar  0.577046 -0.674600
10    two  B  bar -1.715002 -1.776904
11  three  C  bar -1.039268 -0.968914

[12 rows x 5 columns]

We can produce pivot tables from this data very easily:

In [13]: pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'])
Out[13]: 
C             bar       foo
A     B                    
one   A  0.577046 -0.424972
      B -0.673690  0.567020
      C  0.113648  0.404705
three A -1.087401       NaN
...           ...       ...
      C -1.039268       NaN
two   A       NaN -1.478427
      B -1.715002       NaN
      C       NaN  0.276232

[9 rows x 2 columns]