5.8 Dispatching to instance methods

When doing an aggregation or transformation, you might just want to call an instance method on each data group. This is pretty easy to do by passing lambda functions:

In [1]: df
Out[1]: 
     A      B       C       D
0  foo    one  0.4691 -0.8618
1  bar    one -0.2829 -2.1046
2  foo    two -1.5091 -0.4949
3  bar  three -1.1356  1.0718
4  foo    two  1.2121  0.7216
5  bar    two -0.1732 -0.7068
6  foo    one  0.1192 -1.0396
7  foo  three -1.0442  0.2719

In [2]: grouped = df.groupby('A')

In [3]: grouped.agg(lambda x: x.std())
Out[3]: 
          C       D
A                  
bar  0.5269  1.5920
foo  1.1133  0.7532

However, this is rather verbose and can become untidy if you need to pass additional arguments. Using a bit of metaprogramming cleverness, GroupBy can “dispatch” method calls to the groups:

In [4]: grouped.std()
Out[4]: 
          C       D
A                  
bar  0.5269  1.5920
foo  1.1133  0.7532
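To make the point about additional arguments concrete, here is a minimal sketch comparing the two spellings when a keyword argument is needed. The small frame and the choice of ddof=0 are made up for illustration and are not the data shown above.

```python
import numpy as np
import pandas as pd

# Toy data, invented for this sketch (not the df shown above).
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "C": [1.0, 2.0, 3.0, 6.0],
    "D": [0.5, 1.5, 2.5, 3.5],
})
grouped = df.groupby("A")

# Lambda form: the extra argument is buried inside the lambda body.
via_lambda = grouped.agg(lambda x: x.std(ddof=0))

# Dispatched form: the argument is passed directly to the method call.
via_dispatch = grouped.std(ddof=0)

# Both spellings should produce the same per-group values.
same = np.allclose(via_lambda.values, via_dispatch.values)
```

Both calls compute the population standard deviation per group; the dispatched form simply reads more directly.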

What actually happens here is that a function wrapper is generated. When invoked, the wrapper takes any arguments passed to it and calls the named method (std in the example above) with those arguments on each group. The results are then combined in the style of agg and transform (it actually uses apply to infer the gluing, documented next). This enables some operations to be carried out rather succinctly:

In [5]: tsdf = pd.DataFrame(np.random.randn(1000, 3),
   ...:                     index=pd.date_range('1/1/2000', periods=1000),
   ...:                     columns=['A', 'B', 'C'])
   ...: 

In [6]: tsdf.iloc[::2] = np.nan

In [7]: grouped = tsdf.groupby(lambda x: x.year)

In [8]: grouped.fillna(method='pad')
Out[8]: 
                 A       B       C
2000-01-01     NaN     NaN     NaN
2000-01-02 -1.0874 -0.6737  0.1136
2000-01-03 -1.0874 -0.6737  0.1136
2000-01-04  0.5770 -1.7150 -1.0393
2000-01-05  0.5770 -1.7150 -1.0393
2000-01-06  0.8449  1.0758 -0.1090
2000-01-07  0.8449  1.0758 -0.1090
...            ...     ...     ...
2002-09-20 -1.0644 -0.4991 -1.9149
2002-09-21 -1.0644 -0.4991 -1.9149
2002-09-22 -0.3688  0.3842 -1.4366
2002-09-23 -0.3688  0.3842 -1.4366
2002-09-24 -3.1013 -0.5137 -1.9194
2002-09-25 -3.1013 -0.5137 -1.9194
2002-09-26  1.8759  0.4622  0.6205

[1000 rows x 3 columns]

In this example, we chopped the collection of time series into yearly chunks, then independently called fillna on each group.
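The wrapper-generation idea described above can be sketched in a few lines of plain Python. This is a simplified illustration and not pandas' actual implementation: the DispatchingGroups class is hypothetical, it handles only a Series, and it glues only scalar (reduction) results, whereas the real GroupBy relies on apply to infer how to combine results.

```python
import pandas as pd


class DispatchingGroups:
    """Hypothetical, simplified stand-in for a GroupBy object."""

    def __init__(self, series, keys):
        # Split the Series into one chunk per distinct key.
        self._chunks = {name: chunk for name, chunk in series.groupby(keys)}

    def __getattr__(self, name):
        # Reached only when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)

        def wrapper(*args, **kwargs):
            # Call the named method on every chunk, forwarding all arguments.
            results = {key: getattr(chunk, name)(*args, **kwargs)
                       for key, chunk in self._chunks.items()}
            # Glue the per-group scalars into one Series (reductions only).
            return pd.Series(results)

        return wrapper


s = pd.Series([9, 8, 7, 5, 19, 1, 4.2, 3.3])
g = pd.Series(list("abababab"))

dispatched = DispatchingGroups(s, g)
manual_std = dispatched.std(ddof=0)      # resolved through __getattr__
builtin_std = s.groupby(g).std(ddof=0)   # pandas' own dispatch
```

Because the wrapper is built from the attribute name at lookup time, any reduction method on the chunks (std, mean, max, and so on) becomes available without being written out explicitly.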

New in version 0.14.1.

The nlargest and nsmallest methods work on Series-style groupbys:

In [9]: s = pd.Series([9, 8, 7, 5, 19, 1, 4.2, 3.3])

In [10]: g = pd.Series(list('abababab'))

In [11]: gb = s.groupby(g)

In [12]: gb.nlargest(3)
Out[12]: 
a  4    19.0
   0     9.0
   2     7.0
b  1     8.0
   3     5.0
   7     3.3
dtype: float64

In [13]: gb.nsmallest(3)
Out[13]: 
a  6    4.2
   2    7.0
   0    9.0
b  5    1.0
   7    3.3
   3    5.0
dtype: float64
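As the output above shows, the result carries a two-level index: the group key on the outside and the original row label on the inside. The following sketch, reusing the same s and g, shows one way to work with that index; the variable names top, top_a, and original_labels are chosen here for illustration.

```python
import pandas as pd

s = pd.Series([9, 8, 7, 5, 19, 1, 4.2, 3.3])
g = pd.Series(list("abababab"))

top = s.groupby(g).nlargest(3)

# Selecting on the outer level returns one group's rows,
# indexed by their original row labels.
top_a = top.loc["a"]

# The inner level holds the original labels for all selected rows.
original_labels = top.index.get_level_values(1)
```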