5.9 Flexible apply

Some operations on the grouped data might not fit into either the aggregate or transform categories. Or, you may simply want GroupBy to infer how to combine the results. For these, use the apply function, which can be substituted for both aggregate and transform in many standard use cases. apply can also handle some exceptional use cases, for example:

In [1]: df
Out[1]: 
     A      B       C       D
0  foo    one  0.4691 -0.8618
1  bar    one -0.2829 -2.1046
2  foo    two -1.5091 -0.4949
3  bar  three -1.1356  1.0718
4  foo    two  1.2121  0.7216
5  bar    two -0.1732 -0.7068
6  foo    one  0.1192 -1.0396
7  foo  three -1.0442  0.2719

In [2]: grouped = df.groupby('A')

# could also just call .describe()
In [3]: grouped['C'].apply(lambda x: x.describe())
Out[3]: 
A         
bar  count    3.0000
     mean    -0.5306
     std      0.5269
     min     -1.1356
     25%     -0.7092
     50%     -0.2829
     75%     -0.2280
               ...  
foo  mean    -0.1506
     std      1.1133
     min     -1.5091
     25%     -1.0442
     50%      0.1192
     75%      0.4691
     max      1.2121
Name: C, dtype: float64
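A self-contained sketch of the example above. It rebuilds `df` from the values displayed (rounded to four decimals, so results match the output only to that precision) and checks that the per-group `describe` obtained through `apply` agrees with calling `describe()` directly on the grouped column, as the comment in the session suggests. The names `via_apply` and `via_describe` are illustrative.

```python
import pandas as pd

# Rebuild the example frame from the values displayed above (rounded to 4 decimals).
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [0.4691, -0.2829, -1.5091, -1.1356, 1.2121, -0.1732, 0.1192, -1.0442],
    'D': [-0.8618, -2.1046, -0.4949, 1.0718, 0.7216, -0.7068, -1.0396, 0.2719],
})

grouped = df.groupby('A')

# apply passes each group's 'C' values to the lambda as a Series; the per-group
# describe() results are combined under a MultiIndex of (group key, statistic).
via_apply = grouped['C'].apply(lambda x: x.describe())

# The built-in method returns the same statistics in wide form:
# one row per group, one column per statistic.
via_describe = grouped['C'].describe()

print(via_apply.loc[('bar', 'count')])  # 3.0
print(via_apply.loc[('foo', 'max')])    # 1.2121
```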

The dimension of the returned result can also change:

In [4]: grouped = df.groupby('A')['C']

In [5]: def f(group):
   ...:     return pd.DataFrame({'original': group,
   ...:                          'demeaned': group - group.mean()})
   ...: 

In [6]: grouped.apply(f)
Out[6]: 
   demeaned  original
0    0.6197    0.4691
1    0.2477   -0.2829
2   -1.3585   -1.5091
3   -0.6051   -1.1356
4    1.3627    1.2121
5    0.3574   -0.1732
6    0.2698    0.1192
7   -0.8937   -1.0442
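A sketch of the same idea that runs on its own, assuming a recent pandas. Here `group_keys=False` (an assumption of this sketch, not part of the original session) keeps the group label out of the result's index, and `sort_index()` restores the original row order. The `demeaned` column is cross-checked against `transform`, which computes the same values without changing the dimension of the result.

```python
import pandas as pd

# Keys 'A' and column 'C' from the example frame above (values rounded to 4 decimals).
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'C': [0.4691, -0.2829, -1.5091, -1.1356, 1.2121, -0.1732, 0.1192, -1.0442],
})

def f(group):
    # Each group arrives as a Series; returning a DataFrame changes the dimension.
    return pd.DataFrame({'original': group,
                         'demeaned': group - group.mean()})

# group_keys=False: rows stay labeled by their original index only.
result = df.groupby('A', group_keys=False)['C'].apply(f).sort_index()

# The same demeaned values, computed with transform instead of apply.
demeaned = df['C'] - df.groupby('A')['C'].transform('mean')

print(result.shape)  # (8, 2)
```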

apply on a Series can operate on a value returned by the applied function that is itself a Series, and possibly upcast the result to a DataFrame:

In [7]: def f(x):
   ...:   return pd.Series([x, x**2], index=['x', 'x^2'])
   ...: 

In [8]: s
Out[8]: 
0     9.0
1     8.0
2     7.0
3     5.0
4    19.0
5     1.0
6     4.2
7     3.3
dtype: float64

In [9]: s.apply(f)
Out[9]: 
      x     x^2
0   9.0   81.00
1   8.0   64.00
2   7.0   49.00
3   5.0   25.00
4  19.0  361.00
5   1.0    1.00
6   4.2   17.64
7   3.3   10.89
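A self-contained version of the session above, rebuilding `s` from the values shown. The type check makes the upcast explicit: the input is a Series, and the output is a DataFrame whose columns come from the index of the Series that `f` returns.

```python
import pandas as pd

def f(x):
    # Return a two-element Series per input value; Series.apply stacks
    # these into the rows of a DataFrame.
    return pd.Series([x, x**2], index=['x', 'x^2'])

s = pd.Series([9.0, 8.0, 7.0, 5.0, 19.0, 1.0, 4.2, 3.3])
out = s.apply(f)

print(type(out).__name__)  # DataFrame
print(list(out.columns))   # ['x', 'x^2']
```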

Note

apply can act as a reducer, transformer, or filter function, depending on exactly what is passed to it. Depending on the path taken and exactly what you are grouping, the grouped column(s) may be included in the output and may also be used to set the indices.
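A minimal sketch of the three roles the note describes, using column `C` from `df` above. The lambdas are illustrative; what selects the path is the shape of what each call returns. `group_keys=False` is set on the transformer and filter cases (an assumption of this sketch) so the result keeps the original row labels rather than gaining the group key as an extra index level.

```python
import pandas as pd

# Keys 'A' and column 'C' from the example frame above (values rounded to 4 decimals).
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'C': [0.4691, -0.2829, -1.5091, -1.1356, 1.2121, -0.1732, 0.1192, -1.0442],
})

# Reducer: a scalar per group gives one row per group key.
reduced = df.groupby('A')['C'].apply(lambda x: x.sum())

# Transformer: a same-length Series per group gives one row per original row.
transformed = df.groupby('A', group_keys=False)['C'].apply(lambda x: x - x.mean())

# Filter: a subset of each group's rows (here, only the positive values).
filtered = df.groupby('A', group_keys=False)['C'].apply(lambda x: x[x > 0])

print(len(reduced), len(transformed), len(filtered))  # 2 8 3
```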

Warning

In the current implementation, apply calls func twice on the first group to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side effects, as they will take effect twice for the first group.

In [10]: d = pd.DataFrame({"a": ["x", "y"], "b": [1, 2]})

In [11]: def identity(df):
   ....:     print(df)
   ....:     return df
   ....: 

In [12]: d.groupby("a").apply(identity)
   a  b
0  x  1
   a  b
0  x  1
   a  b
1  y  2
Out[12]: 
   a  b
0  x  1
1  y  2
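One way to stay safe whether or not a given pandas version repeats the first call: key any side effect on the group label, so a repeated call overwrites rather than duplicates. A sketch, assuming the group label is available as `group.name` inside the function (pandas pins it on each group during `apply`); `calls`, `seen`, and `record` are hypothetical names. It selects column `b` so the function receives a Series and the grouping column stays out of the group.

```python
import pandas as pd

d = pd.DataFrame({"a": ["x", "y"], "b": [1, 2]})

calls = []  # every invocation, including any repeated call on the first group
seen = {}   # side effect keyed by group label: a repeat just overwrites its entry

def record(group):
    calls.append(group.name)
    seen[group.name] = group.sum()
    return group

result = d.groupby("a")["b"].apply(record)
```

`calls` may list the first label twice if the fast/slow probe runs, but `seen` ends up the same either way.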