7.2 Reshaping by stacking and unstacking

Closely related to the pivot function are the related stack and unstack functions currently available on Series and DataFrame. These functions are designed to work together with MultiIndex objects (see the section on hierarchical indexing). Here are essentially what these functions do:

  • stack: “pivot” a level of the (possibly hierarchical) column labels, returning a DataFrame with an index with a new inner-most level of row labels.
  • unstack: inverse operation from stack: “pivot” a level of the (possibly hierarchical) row index to the column axis, producing a reshaped DataFrame with a new inner-most level of column labels.

The clearest way to explain is by example. Let’s take a prior example data set from the hierarchical indexing section:

In [1]: tuples = list(zip(*[['bar', 'bar', 'baz', 'baz',
   ...:                      'foo', 'foo', 'qux', 'qux'],
   ...:                     ['one', 'two', 'one', 'two',
   ...:                      'one', 'two', 'one', 'two']]))
   ...: 

In [2]: index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])

In [3]: df = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['A', 'B'])

In [4]: df
Out[4]: 
                     A         B
first second                    
bar   one     0.469112 -0.282863
      two    -1.509059 -1.135632
baz   one     1.212112 -0.173215
      two     0.119209 -1.044236
foo   one    -0.861849 -2.104569
      two    -0.494929  1.071804
qux   one     0.721555 -0.706771
      two    -1.039575  0.271860

In [5]: df2 = df[:4]

In [6]: df2
Out[6]: 
                     A         B
first second                    
bar   one     0.469112 -0.282863
      two    -1.509059 -1.135632
baz   one     1.212112 -0.173215
      two     0.119209 -1.044236

The stack function “compresses” a level in the DataFrame’s columns to produce either:

  • A Series, in the case of a simple column Index
  • A DataFrame, in the case of a MultiIndex in the columns

If the columns have a MultiIndex, you can choose which level to stack. The stacked level becomes the new lowest level in a MultiIndex on the columns:

In [7]: stacked = df2.stack()

In [8]: stacked
Out[8]: 
first  second   
bar    one     A    0.469112
               B   -0.282863
       two     A   -1.509059
               B   -1.135632
baz    one     A    1.212112
               B   -0.173215
       two     A    0.119209
               B   -1.044236
dtype: float64

With a “stacked” DataFrame or Series (having a MultiIndex as the index), the inverse operation of stack is unstack, which by default unstacks the last level:

In [9]: stacked.unstack()  # last level (default)
Out[9]: 
                     A         B
first second                    
bar   one     0.469112 -0.282863
      two    -1.509059 -1.135632
baz   one     1.212112 -0.173215
      two     0.119209 -1.044236

In [10]: stacked.unstack(1) # 2nd level
Out[10]: 
second        one       two
first                      
bar   A  0.469112 -1.509059
      B -0.282863 -1.135632
baz   A  1.212112  0.119209
      B -0.173215 -1.044236

In [11]: stacked.unstack(0) # 1st level
Out[11]: 
first          bar       baz
second                      
one    A  0.469112  1.212112
       B -0.282863 -0.173215
two    A -1.509059  0.119209
       B -1.135632 -1.044236

If the indexes have names, you can use the level names instead of specifying the level numbers:

In [12]: stacked
Out[12]: 
first  second   
bar    one     A    0.469112
               B   -0.282863
       two     A   -1.509059
               B   -1.135632
baz    one     A    1.212112
               B   -0.173215
       two     A    0.119209
               B   -1.044236
dtype: float64

In [13]: stacked.unstack('second')
Out[13]: 
second        one       two
first                      
bar   A  0.469112 -1.509059
      B -0.282863 -1.135632
baz   A  1.212112  0.119209
      B -0.173215 -1.044236

Notice that the stack and unstack methods implicitly sort the index levels involved. Hence a call to stack and then unstack, or viceversa, will result in a sorted copy of the original DataFrame or Series:

In [14]: index = pd.MultiIndex.from_product([[2,1], ['a', 'b']])

In [15]: df = pd.DataFrame(np.random.randn(4), index=index, columns=['A'])

In [16]: df
Out[16]: 
            A
2 a -0.424972
  b  0.567020
1 a  0.276232
  b -1.087401

In [17]: all(df.unstack().stack() == df.sort_index())
Out[17]: True

while the above code will raise a TypeError if the call to sort_index is removed.

7.2.1 Multiple Levels

You may also stack or unstack more than one level at a time by passing a list of levels, in which case the end result is as if each level in the list were processed individually.

In [18]: columns = pd.MultiIndex.from_tuples([
   ....:         ('A', 'cat', 'long'), ('B', 'cat', 'long'),
   ....:         ('A', 'dog', 'short'), ('B', 'dog', 'short')
   ....:     ],
   ....:     names=['exp', 'animal', 'hair_length']
   ....: )
   ....: 

In [19]: df = pd.DataFrame(np.random.randn(4, 4), columns=columns)

In [20]: df
Out[20]: 
exp                 A         B         A         B
animal            cat       cat       dog       dog
hair_length      long      long     short     short
0           -0.673690  0.113648 -1.478427  0.524988
1            0.404705  0.577046 -1.715002 -1.039268
2           -0.370647 -1.157892 -1.344312  0.844885
3            1.075770 -0.109050  1.643563 -1.469388

In [21]: df.stack(level=['animal', 'hair_length'])
Out[21]: 
exp                          A         B
  animal hair_length                    
0 cat    long        -0.673690  0.113648
  dog    short       -1.478427  0.524988
1 cat    long         0.404705  0.577046
  dog    short       -1.715002 -1.039268
2 cat    long        -0.370647 -1.157892
  dog    short       -1.344312  0.844885
3 cat    long         1.075770 -0.109050
  dog    short        1.643563 -1.469388

The list of levels can contain either level names or level numbers (but not a mixture of the two).

# df.stack(level=['animal', 'hair_length'])
# from above is equivalent to:
In [22]: df.stack(level=[1, 2])
Out[22]: 
exp                          A         B
  animal hair_length                    
0 cat    long        -0.673690  0.113648
  dog    short       -1.478427  0.524988
1 cat    long         0.404705  0.577046
  dog    short       -1.715002 -1.039268
2 cat    long        -0.370647 -1.157892
  dog    short       -1.344312  0.844885
3 cat    long         1.075770 -0.109050
  dog    short        1.643563 -1.469388

7.2.2 Missing Data

These functions are intelligent about handling missing data and do not expect each subgroup within the hierarchical index to have the same set of labels. They also can handle the index being unsorted (but you can make it sorted by calling sort_index, of course). Here is a more complex example:

In [23]: columns = pd.MultiIndex.from_tuples([('A', 'cat'), ('B', 'dog'),
   ....:                                      ('B', 'cat'), ('A', 'dog')],
   ....:                                     names=['exp', 'animal'])
   ....: 

In [24]: index = pd.MultiIndex.from_product([('bar', 'baz', 'foo', 'qux'),
   ....:                                     ('one', 'two')],
   ....:                                    names=['first', 'second'])
   ....: 

In [25]: df = pd.DataFrame(np.random.randn(8, 4), index=index, columns=columns)

In [26]: df2 = df.ix[[0, 1, 2, 4, 5, 7]]

In [27]: df2
Out[27]: 
exp                  A         B                   A
animal             cat       dog       cat       dog
first second                                        
bar   one     0.357021 -0.674600 -1.776904 -0.968914
      two    -1.294524  0.413738  0.276662 -0.472035
baz   one    -0.013960 -0.362543 -0.006154 -0.923061
foo   one     1.431256  1.340309 -1.170299 -0.226169
      two     0.410835  0.813850  0.132003 -0.827317
qux   two    -1.413681  1.607920  1.024180  0.569605

As mentioned above, stack can be called with a level argument to select which level in the columns to stack:

In [28]: df2.stack('exp')
Out[28]: 
animal                 cat       dog
first second exp                    
bar   one    A    0.357021 -0.968914
             B   -1.776904 -0.674600
      two    A   -1.294524 -0.472035
             B    0.276662  0.413738
...                    ...       ...
foo   two    A    0.410835 -0.827317
             B    0.132003  0.813850
qux   two    A   -1.413681  0.569605
             B    1.024180  1.607920

[12 rows x 2 columns]

In [29]: df2.stack('animal')
Out[29]: 
exp                         A         B
first second animal                    
bar   one    cat     0.357021 -1.776904
             dog    -0.968914 -0.674600
      two    cat    -1.294524  0.276662
             dog    -0.472035  0.413738
...                       ...       ...
foo   two    cat     0.410835  0.132003
             dog    -0.827317  0.813850
qux   two    cat    -1.413681  1.024180
             dog     0.569605  1.607920

[12 rows x 2 columns]

Unstacking can result in missing values if subgroups do not have the same set of labels. By default, missing values will be replaced with the default fill value for that data type, NaN for float, NaT for datetimelike, etc. For integer types, by default data will converted to float and missing values will be set to NaN.

In [30]: df3 = df.iloc[[0, 1, 4, 7], [1, 2]]

In [31]: df3
Out[31]: 
exp                  B          
animal             dog       cat
first second                    
bar   one    -0.674600 -1.776904
      two     0.413738  0.276662
foo   one     1.340309 -1.170299
qux   two     1.607920  1.024180

In [32]: df3.unstack()
Out[32]: 
exp            B                              
animal       dog                 cat          
second       one       two       one       two
first                                         
bar    -0.674600  0.413738 -1.776904  0.276662
foo     1.340309       NaN -1.170299       NaN
qux          NaN  1.607920       NaN  1.024180

Alternatively, unstack takes an optional fill_value argument, for specifying the value of missing data.

In [33]: df3.unstack(fill_value=-1e9)
Out[33]: 
exp                B                                          
animal           dog                         cat              
second           one           two           one           two
first                                                         
bar    -6.746001e-01  4.137381e-01 -1.776904e+00  2.766617e-01
foo     1.340309e+00 -1.000000e+09 -1.170299e+00 -1.000000e+09
qux    -1.000000e+09  1.607920e+00 -1.000000e+09  1.024180e+00

7.2.3 With a MultiIndex

Unstacking when the columns are a MultiIndex is also careful about doing the right thing:

In [34]: df
Out[34]: 
exp                  A         B                   A
animal             cat       dog       cat       dog
first second                                        
bar   one     0.357021 -0.674600 -1.776904 -0.968914
      two    -1.294524  0.413738  0.276662 -0.472035
baz   one    -0.013960 -0.362543 -0.006154 -0.923061
      two     0.895717  0.805244 -1.206412  2.565646
foo   one     1.431256  1.340309 -1.170299 -0.226169
      two     0.410835  0.813850  0.132003 -0.827317
qux   one    -0.076467 -1.187678  1.130127 -1.436737
      two    -1.413681  1.607920  1.024180  0.569605

In [35]: df[:3].unstack(0)
Out[35]: 
exp            A                  B                                       A  \
animal       cat                dog                 cat                 dog   
first        bar      baz       bar       baz       bar       baz       bar   
second                                                                        
one     0.357021 -0.01396 -0.674600 -0.362543 -1.776904 -0.006154 -0.968914   
two    -1.294524      NaN  0.413738       NaN  0.276662       NaN -0.472035   

exp               
animal            
first        baz  
second            
one    -0.923061  
two          NaN  
In [36]: df2
Out[36]: 
exp                  A         B                   A
animal             cat       dog       cat       dog
first second                                        
bar   one     0.357021 -0.674600 -1.776904 -0.968914
      two    -1.294524  0.413738  0.276662 -0.472035
baz   one    -0.013960 -0.362543 -0.006154 -0.923061
foo   one     1.431256  1.340309 -1.170299 -0.226169
      two     0.410835  0.813850  0.132003 -0.827317
qux   two    -1.413681  1.607920  1.024180  0.569605

In [37]: df2.unstack(1)
Out[37]: 
exp            A                   B                                       A  \
animal       cat                 dog                 cat                 dog   
second       one       two       one       two       one       two       one   
first                                                                          
bar     0.357021 -1.294524 -0.674600  0.413738 -1.776904  0.276662 -0.968914   
baz    -0.013960       NaN -0.362543       NaN -0.006154       NaN -0.923061   
foo     1.431256  0.410835  1.340309  0.813850 -1.170299  0.132003 -0.226169   
qux          NaN -1.413681       NaN  1.607920       NaN  1.024180       NaN   

exp               
animal            
second       two  
first             
bar    -0.472035  
baz          NaN  
foo    -0.827317  
qux     0.569605