7.5 Pivot tables

The function pandas.pivot_table can be used to create spreadsheet-style pivot tables. See the cookbook for some advanced strategies

It takes a number of arguments

  • data: A DataFrame object
  • values: a column or a list of columns to aggregate
  • index: a column, Grouper, array which has the same length as data, or list of them. Keys to group by on the pivot table index. If an array is passed, it is being used as the same manner as column values.
  • columns: a column, Grouper, array which has the same length as data, or list of them. Keys to group by on the pivot table column. If an array is passed, it is being used as the same manner as column values.
  • aggfunc: function to use for aggregation, defaulting to numpy.mean

Consider a data set like this:

In [1]: import datetime

In [2]: df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 6,
   ...:                    'B': ['A', 'B', 'C'] * 8,
   ...:                    'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4,
   ...:                    'D': np.random.randn(24),
   ...:                    'E': np.random.randn(24),
   ...:                    'F': [datetime.datetime(2013, i, 1) for i in range(1, 13)] +
   ...:                         [datetime.datetime(2013, i, 15) for i in range(1, 13)]})
   ...: 

In [3]: df
Out[3]: 
        A  B    C         D         E          F
0     one  A  foo  0.469112  0.404705 2013-01-01
1     one  B  foo -0.282863  0.577046 2013-02-01
2     two  C  foo -1.509059 -1.715002 2013-03-01
3   three  A  bar -1.135632 -1.039268 2013-04-01
..    ... ..  ...       ...       ...        ...
20    one  C  foo -0.673690 -0.013960 2013-09-15
21    one  A  bar  0.113648 -0.362543 2013-10-15
22    two  B  bar -1.478427 -0.006154 2013-11-15
23  three  C  bar  0.524988 -0.923061 2013-12-15

[24 rows x 6 columns]

We can produce pivot tables from this data very easily:

In [4]: pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'])
Out[4]: 
C             bar       foo
A     B                    
one   A -0.995460  0.595334
      B  0.393570 -0.494817
      C  0.196903 -0.767769
three A -0.431886       NaN
...           ...       ...
      C  0.798396       NaN
two   A       NaN  0.197720
      B -0.986678       NaN
      C       NaN -1.274317

[9 rows x 2 columns]

In [5]: pd.pivot_table(df, values='D', index=['B'], columns=['A', 'C'], aggfunc=np.sum)
Out[5]: 
A       one               three                 two          
C       bar       foo       bar       foo       bar       foo
B                                                            
A -1.990921  1.190667 -0.863772       NaN       NaN  0.395441
B  0.787140 -0.989634       NaN -2.131637 -1.973356       NaN
C  0.393806 -1.535539  1.596791       NaN       NaN -2.548633

In [6]: pd.pivot_table(df, values=['D','E'], index=['B'], columns=['A', 'C'], aggfunc=np.sum)
Out[6]: 
          D                                                           E  \
A       one               three                 two                 one   
C       bar       foo       bar       foo       bar       foo       bar   
B                                                                         
A -1.990921  1.190667 -0.863772       NaN       NaN  0.395441 -0.471593   
B  0.787140 -0.989634       NaN -2.131637 -1.973356       NaN -1.665170   
C  0.393806 -1.535539  1.596791       NaN       NaN -2.548633 -0.744154   

                                                    
A               three                two            
C       foo       bar       foo      bar       foo  
B                                                   
A  0.761726 -2.008182       NaN      NaN -1.067650  
B -0.097554       NaN  0.372851  1.63741       NaN  
C  1.061810 -2.392449       NaN      NaN -3.491906  

The result object is a DataFrame having potentially hierarchical indexes on the rows and columns. If the values column name is not given, the pivot table will include all of the data that can be aggregated in an additional level of hierarchy in the columns:

In [7]: df.head()
Out[7]: 
       A  B    C         D         E          F
0    one  A  foo  0.469112  0.404705 2013-01-01
1    one  B  foo -0.282863  0.577046 2013-02-01
2    two  C  foo -1.509059 -1.715002 2013-03-01
3  three  A  bar -1.135632 -1.039268 2013-04-01
4    one  B  bar  1.212112 -0.370647 2013-05-01

In [8]: pd.pivot_table(df, index=['A', 'B'], columns=['C'])
Out[8]: 
                D                   E          
C             bar       foo       bar       foo
A     B                                        
one   A -0.995460  0.595334 -0.235796  0.380863
      B  0.393570 -0.494817 -0.832585 -0.048777
      C  0.196903 -0.767769 -0.372077  0.530905
three A -0.431886       NaN -1.004091       NaN
...           ...       ...       ...       ...
      C  0.798396       NaN -1.196224       NaN
two   A       NaN  0.197720       NaN -0.533825
      B -0.986678       NaN  0.818705       NaN
      C       NaN -1.274317       NaN -1.745953

[9 rows x 4 columns]

Also, you can use Grouper for index and columns keywords. For detail of Grouper, see Grouping with a Grouper specification.

In [9]: df.head()
Out[9]: 
       A  B    C         D         E          F
0    one  A  foo  0.469112  0.404705 2013-01-01
1    one  B  foo -0.282863  0.577046 2013-02-01
2    two  C  foo -1.509059 -1.715002 2013-03-01
3  three  A  bar -1.135632 -1.039268 2013-04-01
4    one  B  bar  1.212112 -0.370647 2013-05-01

In [10]: pd.pivot_table(df, values='D', index=pd.Grouper(freq='M', key='F'), columns='C')
Out[10]: 
C                bar       foo
F                             
2013-01-31       NaN  0.595334
2013-02-28       NaN -0.494817
2013-03-31       NaN -1.274317
2013-04-30 -0.431886       NaN
...              ...       ...
2013-09-30       NaN -0.767769
2013-10-31 -0.995460       NaN
2013-11-30 -0.986678       NaN
2013-12-31  0.798396       NaN

[12 rows x 2 columns]

You can render a nice output of the table omitting the missing values by calling to_string if you wish:

In [11]: df.head()
Out[11]: 
       A  B    C         D         E          F
0    one  A  foo  0.469112  0.404705 2013-01-01
1    one  B  foo -0.282863  0.577046 2013-02-01
2    two  C  foo -1.509059 -1.715002 2013-03-01
3  three  A  bar -1.135632 -1.039268 2013-04-01
4    one  B  bar  1.212112 -0.370647 2013-05-01

In [12]: table = pd.pivot_table(df, index=['A', 'B'], columns=['C'])

In [13]: table
Out[13]: 
                D                   E          
C             bar       foo       bar       foo
A     B                                        
one   A -0.995460  0.595334 -0.235796  0.380863
      B  0.393570 -0.494817 -0.832585 -0.048777
      C  0.196903 -0.767769 -0.372077  0.530905
three A -0.431886       NaN -1.004091       NaN
...           ...       ...       ...       ...
      C  0.798396       NaN -1.196224       NaN
two   A       NaN  0.197720       NaN -0.533825
      B -0.986678       NaN  0.818705       NaN
      C       NaN -1.274317       NaN -1.745953

[9 rows x 4 columns]

In [14]: print(table.to_string(na_rep=''))
                D                   E          
C             bar       foo       bar       foo
A     B                                        
one   A -0.995460  0.595334 -0.235796  0.380863
      B  0.393570 -0.494817 -0.832585 -0.048777
      C  0.196903 -0.767769 -0.372077  0.530905
three A -0.431886           -1.004091          
      B           -1.065818            0.186425
      C  0.798396           -1.196224          
two   A            0.197720           -0.533825
      B -0.986678            0.818705          
      C           -1.274317           -1.745953

Note that pivot_table is also available as an instance method on DataFrame.

7.5.1 Adding margins

If you pass margins=True to pivot_table, special All columns and rows will be added with partial group aggregates across the categories on the rows and columns:

In [15]: df.head()
Out[15]: 
       A  B    C         D         E          F
0    one  A  foo  0.469112  0.404705 2013-01-01
1    one  B  foo -0.282863  0.577046 2013-02-01
2    two  C  foo -1.509059 -1.715002 2013-03-01
3  three  A  bar -1.135632 -1.039268 2013-04-01
4    one  B  bar  1.212112 -0.370647 2013-05-01

In [16]: df.pivot_table(index=['A', 'B'], columns='C', margins=True, aggfunc=np.std)
Out[16]: 
                D                             E                    
C             bar       foo       All       bar       foo       All
A     B                                                            
one   A  1.568517  0.178504  1.293926  0.179247  0.033718  0.371275
      B  1.157593  0.299748  0.860059  0.653280  0.885047  0.779837
      C  0.523425  0.133049  0.638297  1.111310  0.770555  0.938819
three A  0.995247       NaN  0.995247  0.049748       NaN  0.049748
...           ...       ...       ...       ...       ...       ...
two   A       NaN  0.111032  0.111032       NaN  1.146201  1.146201
      B  0.695438       NaN  0.695438  1.166526       NaN  1.166526
      C       NaN  0.331975  0.331975       NaN  0.043771  0.043771
All      1.014073  0.713941  0.852676  0.881376  0.984017  0.904122

[10 rows x 6 columns]