7.5 Pivot tables
The function pandas.pivot_table
can be used to create spreadsheet-style pivot
tables. See the cookbook for some advanced strategies
It takes a number of arguments
data
: A DataFrame objectvalues
: a column or a list of columns to aggregateindex
: a column, Grouper, array which has the same length as data, or list of them. Keys to group by on the pivot table index. If an array is passed, it is being used as the same manner as column values.columns
: a column, Grouper, array which has the same length as data, or list of them. Keys to group by on the pivot table column. If an array is passed, it is being used as the same manner as column values.aggfunc
: function to use for aggregation, defaulting tonumpy.mean
Consider a data set like this:
In [1]: import datetime
In [2]: df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 6,
...: 'B': ['A', 'B', 'C'] * 8,
...: 'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4,
...: 'D': np.random.randn(24),
...: 'E': np.random.randn(24),
...: 'F': [datetime.datetime(2013, i, 1) for i in range(1, 13)] +
...: [datetime.datetime(2013, i, 15) for i in range(1, 13)]})
...:
In [3]: df
Out[3]:
A B C D E F
0 one A foo 0.469112 0.404705 2013-01-01
1 one B foo -0.282863 0.577046 2013-02-01
2 two C foo -1.509059 -1.715002 2013-03-01
3 three A bar -1.135632 -1.039268 2013-04-01
.. ... .. ... ... ... ...
20 one C foo -0.673690 -0.013960 2013-09-15
21 one A bar 0.113648 -0.362543 2013-10-15
22 two B bar -1.478427 -0.006154 2013-11-15
23 three C bar 0.524988 -0.923061 2013-12-15
[24 rows x 6 columns]
We can produce pivot tables from this data very easily:
In [4]: pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'])
Out[4]:
C bar foo
A B
one A -0.995460 0.595334
B 0.393570 -0.494817
C 0.196903 -0.767769
three A -0.431886 NaN
... ... ...
C 0.798396 NaN
two A NaN 0.197720
B -0.986678 NaN
C NaN -1.274317
[9 rows x 2 columns]
In [5]: pd.pivot_table(df, values='D', index=['B'], columns=['A', 'C'], aggfunc=np.sum)
Out[5]:
A one three two
C bar foo bar foo bar foo
B
A -1.990921 1.190667 -0.863772 NaN NaN 0.395441
B 0.787140 -0.989634 NaN -2.131637 -1.973356 NaN
C 0.393806 -1.535539 1.596791 NaN NaN -2.548633
In [6]: pd.pivot_table(df, values=['D','E'], index=['B'], columns=['A', 'C'], aggfunc=np.sum)
Out[6]:
D E \
A one three two one
C bar foo bar foo bar foo bar
B
A -1.990921 1.190667 -0.863772 NaN NaN 0.395441 -0.471593
B 0.787140 -0.989634 NaN -2.131637 -1.973356 NaN -1.665170
C 0.393806 -1.535539 1.596791 NaN NaN -2.548633 -0.744154
A three two
C foo bar foo bar foo
B
A 0.761726 -2.008182 NaN NaN -1.067650
B -0.097554 NaN 0.372851 1.63741 NaN
C 1.061810 -2.392449 NaN NaN -3.491906
The result object is a DataFrame having potentially hierarchical indexes on the
rows and columns. If the values
column name is not given, the pivot table
will include all of the data that can be aggregated in an additional level of
hierarchy in the columns:
In [7]: df.head()
Out[7]:
A B C D E F
0 one A foo 0.469112 0.404705 2013-01-01
1 one B foo -0.282863 0.577046 2013-02-01
2 two C foo -1.509059 -1.715002 2013-03-01
3 three A bar -1.135632 -1.039268 2013-04-01
4 one B bar 1.212112 -0.370647 2013-05-01
In [8]: pd.pivot_table(df, index=['A', 'B'], columns=['C'])
Out[8]:
D E
C bar foo bar foo
A B
one A -0.995460 0.595334 -0.235796 0.380863
B 0.393570 -0.494817 -0.832585 -0.048777
C 0.196903 -0.767769 -0.372077 0.530905
three A -0.431886 NaN -1.004091 NaN
... ... ... ... ...
C 0.798396 NaN -1.196224 NaN
two A NaN 0.197720 NaN -0.533825
B -0.986678 NaN 0.818705 NaN
C NaN -1.274317 NaN -1.745953
[9 rows x 4 columns]
Also, you can use Grouper
for index
and columns
keywords. For detail of Grouper
, see Grouping with a Grouper specification.
In [9]: df.head()
Out[9]:
A B C D E F
0 one A foo 0.469112 0.404705 2013-01-01
1 one B foo -0.282863 0.577046 2013-02-01
2 two C foo -1.509059 -1.715002 2013-03-01
3 three A bar -1.135632 -1.039268 2013-04-01
4 one B bar 1.212112 -0.370647 2013-05-01
In [10]: pd.pivot_table(df, values='D', index=pd.Grouper(freq='M', key='F'), columns='C')
Out[10]:
C bar foo
F
2013-01-31 NaN 0.595334
2013-02-28 NaN -0.494817
2013-03-31 NaN -1.274317
2013-04-30 -0.431886 NaN
... ... ...
2013-09-30 NaN -0.767769
2013-10-31 -0.995460 NaN
2013-11-30 -0.986678 NaN
2013-12-31 0.798396 NaN
[12 rows x 2 columns]
You can render a nice output of the table omitting the missing values by
calling to_string
if you wish:
In [11]: df.head()
Out[11]:
A B C D E F
0 one A foo 0.469112 0.404705 2013-01-01
1 one B foo -0.282863 0.577046 2013-02-01
2 two C foo -1.509059 -1.715002 2013-03-01
3 three A bar -1.135632 -1.039268 2013-04-01
4 one B bar 1.212112 -0.370647 2013-05-01
In [12]: table = pd.pivot_table(df, index=['A', 'B'], columns=['C'])
In [13]: table
Out[13]:
D E
C bar foo bar foo
A B
one A -0.995460 0.595334 -0.235796 0.380863
B 0.393570 -0.494817 -0.832585 -0.048777
C 0.196903 -0.767769 -0.372077 0.530905
three A -0.431886 NaN -1.004091 NaN
... ... ... ... ...
C 0.798396 NaN -1.196224 NaN
two A NaN 0.197720 NaN -0.533825
B -0.986678 NaN 0.818705 NaN
C NaN -1.274317 NaN -1.745953
[9 rows x 4 columns]
In [14]: print(table.to_string(na_rep=''))
D E
C bar foo bar foo
A B
one A -0.995460 0.595334 -0.235796 0.380863
B 0.393570 -0.494817 -0.832585 -0.048777
C 0.196903 -0.767769 -0.372077 0.530905
three A -0.431886 -1.004091
B -1.065818 0.186425
C 0.798396 -1.196224
two A 0.197720 -0.533825
B -0.986678 0.818705
C -1.274317 -1.745953
Note that pivot_table
is also available as an instance method on DataFrame.
7.5.1 Adding margins
If you pass margins=True
to pivot_table
, special All
columns and
rows will be added with partial group aggregates across the categories on the
rows and columns:
In [15]: df.head()
Out[15]:
A B C D E F
0 one A foo 0.469112 0.404705 2013-01-01
1 one B foo -0.282863 0.577046 2013-02-01
2 two C foo -1.509059 -1.715002 2013-03-01
3 three A bar -1.135632 -1.039268 2013-04-01
4 one B bar 1.212112 -0.370647 2013-05-01
In [16]: df.pivot_table(index=['A', 'B'], columns='C', margins=True, aggfunc=np.std)
Out[16]:
D E
C bar foo All bar foo All
A B
one A 1.568517 0.178504 1.293926 0.179247 0.033718 0.371275
B 1.157593 0.299748 0.860059 0.653280 0.885047 0.779837
C 0.523425 0.133049 0.638297 1.111310 0.770555 0.938819
three A 0.995247 NaN 0.995247 0.049748 NaN 0.049748
... ... ... ... ... ... ...
two A NaN 0.111032 0.111032 NaN 1.146201 1.146201
B 0.695438 NaN 0.695438 1.166526 NaN 1.166526
C NaN 0.331975 0.331975 NaN 0.043771 0.043771
All 1.014073 0.713941 0.852676 0.881376 0.984017 0.904122
[10 rows x 6 columns]