7.1 Reshaping by pivoting DataFrame objects

import pandas.util.testing as tm; tm.N = 3
def unpivot(frame):
    N, K = frame.shape
    data = {'value' : frame.values.ravel('F'),
            'variable' : np.asarray(frame.columns).repeat(N),
            'date' : np.tile(np.asarray(frame.index), K)}
    return pd.DataFrame(data, columns=['date', 'variable', 'value'])
df = unpivot(tm.makeTimeDataFrame())

Data is often stored in CSV files or databases in so-called “stacked” or “record” format:

In [1]: df
Out[1]: 
         date variable     value
0  2000-01-03        A  0.469112
1  2000-01-04        A -0.282863
2  2000-01-05        A -1.509059
3  2000-01-03        B -1.135632
..        ...      ...       ...
8  2000-01-05        C -0.861849
9  2000-01-03        D -2.104569
10 2000-01-04        D -0.494929
11 2000-01-05        D  1.071804

[12 rows x 3 columns]

To select out everything for variable A we could do:

In [2]: df[df['variable'] == 'A']
Out[2]: 
        date variable     value
0 2000-01-03        A  0.469112
1 2000-01-04        A -0.282863
2 2000-01-05        A -1.509059

But suppose we wish to do time series operations with the variables. A better representation would be where the columns are the unique variables and an index of dates identifies individual observations. To reshape the data into this form, use the pivot function:

In [3]: df
Out[3]: 
         date variable     value
0  2000-01-03        A  0.469112
1  2000-01-04        A -0.282863
2  2000-01-05        A -1.509059
3  2000-01-03        B -1.135632
..        ...      ...       ...
8  2000-01-05        C -0.861849
9  2000-01-03        D -2.104569
10 2000-01-04        D -0.494929
11 2000-01-05        D  1.071804

[12 rows x 3 columns]

In [4]: df.pivot(index='date', columns='variable', values='value')
Out[4]: 
variable           A         B         C         D
date                                              
2000-01-03  0.469112 -1.135632  0.119209 -2.104569
2000-01-04 -0.282863  1.212112 -1.044236 -0.494929
2000-01-05 -1.509059 -0.173215 -0.861849  1.071804

If the values argument is omitted, and the input DataFrame has more than one column of values which are not used as column or index inputs to pivot, then the resulting “pivoted” DataFrame will have hierarchical columns whose topmost level indicates the respective value column:

In [5]: df['value2'] = df['value'] * 2

In [6]: df
Out[6]: 
         date variable     value    value2
0  2000-01-03        A  0.469112  0.938225
1  2000-01-04        A -0.282863 -0.565727
2  2000-01-05        A -1.509059 -3.018117
3  2000-01-03        B -1.135632 -2.271265
..        ...      ...       ...       ...
8  2000-01-05        C -0.861849 -1.723698
9  2000-01-03        D -2.104569 -4.209138
10 2000-01-04        D -0.494929 -0.989859
11 2000-01-05        D  1.071804  2.143608

[12 rows x 4 columns]

In [7]: pivoted = df.pivot('date', 'variable')

In [8]: pivoted
Out[8]: 
               value                                  value2            \
variable           A         B         C         D         A         B   
date                                                                     
2000-01-03  0.469112 -1.135632  0.119209 -2.104569  0.938225 -2.271265   
2000-01-04 -0.282863  1.212112 -1.044236 -0.494929 -0.565727  2.424224   
2000-01-05 -1.509059 -0.173215 -0.861849  1.071804 -3.018117 -0.346429   

                                
variable           C         D  
date                            
2000-01-03  0.238417 -4.209138  
2000-01-04 -2.088472 -0.989859  
2000-01-05 -1.723698  2.143608  

You of course can then select subsets from the pivoted DataFrame:

In [9]: pivoted['value2']
Out[9]: 
variable           A         B         C         D
date                                              
2000-01-03  0.938225 -2.271265  0.238417 -4.209138
2000-01-04 -0.565727  2.424224 -2.088472 -0.989859
2000-01-05 -3.018117 -0.346429 -1.723698  2.143608

Note that this returns a view on the underlying data in the case where the data are homogeneously-typed.