3.3 Aggregation

Once the Rolling, Expanding or EWM objects have been created, several methods are available to perform multiple computations on the data. This is very similar to a .groupby(...).agg seen here.

In [1]: dfa = pd.DataFrame(np.random.randn(1000, 3),
   ...:                    index=pd.date_range('1/1/2000', periods=1000),
   ...:                    columns=['A', 'B', 'C'])
   ...: 

In [2]: r = dfa.rolling(window=60,min_periods=1)

In [3]: r
Out[3]: Rolling [window=60,min_periods=1,center=False,axis=0]

We can aggregate by passing a function to the entire DataFrame, or select a Series (or multiple Series) via standard getitem.

In [4]: r.aggregate(np.sum)
Out[4]: 
                   A         B         C
2000-01-01  0.469112 -0.282863 -1.509059
2000-01-02 -0.666520  0.929249 -1.682273
2000-01-03 -0.547311 -0.114987 -2.544122
2000-01-04 -2.651881 -0.609917 -1.472318
...              ...       ...       ...
2002-09-23  8.342906 -8.420254 -4.488404
2002-09-24  8.922223 -8.817915 -1.041163
2002-09-25  6.181801 -9.888219 -4.544517
2002-09-26  4.105633 -7.922664 -4.677549

[1000 rows x 3 columns]

In [5]: r['A'].aggregate(np.sum)
Out[5]: 
2000-01-01    0.469112
2000-01-02   -0.666520
2000-01-03   -0.547311
2000-01-04   -2.651881
                ...   
2002-09-23    8.342906
2002-09-24    8.922223
2002-09-25    6.181801
2002-09-26    4.105633
Freq: D, Name: A, dtype: float64

In [6]: r[['A','B']].aggregate(np.sum)
Out[6]: 
                   A         B
2000-01-01  0.469112 -0.282863
2000-01-02 -0.666520  0.929249
2000-01-03 -0.547311 -0.114987
2000-01-04 -2.651881 -0.609917
...              ...       ...
2002-09-23  8.342906 -8.420254
2002-09-24  8.922223 -8.817915
2002-09-25  6.181801 -9.888219
2002-09-26  4.105633 -7.922664

[1000 rows x 2 columns]

As you can see, the result of the aggregation will have the selected columns, or all columns if none are selected.

3.3.1 Applying multiple functions at once

With windowed Series you can also pass a list or dict of functions to do aggregation with, outputting a DataFrame:

In [7]: r['A'].agg([np.sum, np.mean, np.std])
Out[7]: 
                 sum      mean       std
2000-01-01  0.469112  0.469112       NaN
2000-01-02 -0.666520 -0.333260  1.134726
2000-01-03 -0.547311 -0.182437  0.843827
2000-01-04 -2.651881 -0.662970  1.182516
...              ...       ...       ...
2002-09-23  8.342906  0.139048  1.006288
2002-09-24  8.922223  0.148704  0.996540
2002-09-25  6.181801  0.103030  1.031933
2002-09-26  4.105633  0.068427  1.064250

[1000 rows x 3 columns]

If a dict is passed, the keys will be used to name the columns. Otherwise the function’s name (stored in the function object) will be used.

In [8]: r['A'].agg({'result1' : np.sum,
   ...:             'result2' : np.mean})
   ...: 
Out[8]: 
             result2   result1
2000-01-01  0.469112  0.469112
2000-01-02 -0.333260 -0.666520
2000-01-03 -0.182437 -0.547311
2000-01-04 -0.662970 -2.651881
...              ...       ...
2002-09-23  0.139048  8.342906
2002-09-24  0.148704  8.922223
2002-09-25  0.103030  6.181801
2002-09-26  0.068427  4.105633

[1000 rows x 2 columns]

On a widowed DataFrame, you can pass a list of functions to apply to each column, which produces an aggregated result with a hierarchical index:

In [9]: r.agg([np.sum, np.mean])
Out[9]: 
                   A                   B                   C          
                 sum      mean       sum      mean       sum      mean
2000-01-01  0.469112  0.469112 -0.282863 -0.282863 -1.509059 -1.509059
2000-01-02 -0.666520 -0.333260  0.929249  0.464624 -1.682273 -0.841137
2000-01-03 -0.547311 -0.182437 -0.114987 -0.038329 -2.544122 -0.848041
2000-01-04 -2.651881 -0.662970 -0.609917 -0.152479 -1.472318 -0.368080
...              ...       ...       ...       ...       ...       ...
2002-09-23  8.342906  0.139048 -8.420254 -0.140338 -4.488404 -0.074807
2002-09-24  8.922223  0.148704 -8.817915 -0.146965 -1.041163 -0.017353
2002-09-25  6.181801  0.103030 -9.888219 -0.164804 -4.544517 -0.075742
2002-09-26  4.105633  0.068427 -7.922664 -0.132044 -4.677549 -0.077959

[1000 rows x 6 columns]

Passing a dict of functions has different behavior by default, see the next section.

3.3.2 Applying different functions to DataFrame columns

By passing a dict to aggregate you can apply a different aggregation to the columns of a DataFrame:

In [10]: r.agg({'A' : np.sum,
   ....:        'B' : lambda x: np.std(x, ddof=1)})
   ....: 
Out[10]: 
                   A         B
2000-01-01  0.469112       NaN
2000-01-02 -0.666520  1.057107
2000-01-03 -0.547311  1.147878
2000-01-04 -2.651881  0.964644
...              ...       ...
2002-09-23  8.342906  1.035191
2002-09-24  8.922223  1.032920
2002-09-25  6.181801  1.039457
2002-09-26  4.105633  1.058789

[1000 rows x 2 columns]

The function names can also be strings. In order for a string to be valid it must be implemented on the windowed object

In [11]: r.agg({'A' : 'sum', 'B' : 'std'})
Out[11]: 
                   A         B
2000-01-01  0.469112       NaN
2000-01-02 -0.666520  1.057107
2000-01-03 -0.547311  1.147878
2000-01-04 -2.651881  0.964644
...              ...       ...
2002-09-23  8.342906  1.035191
2002-09-24  8.922223  1.032920
2002-09-25  6.181801  1.039457
2002-09-26  4.105633  1.058789

[1000 rows x 2 columns]

Furthermore you can pass a nested dict to indicate different aggregations on different columns.

In [12]: r.agg({'A' : ['sum','std'], 'B' : ['mean','std'] })
Out[12]: 
                   A                   B          
                 sum       std      mean       std
2000-01-01  0.469112       NaN -0.282863       NaN
2000-01-02 -0.666520  1.134726  0.464624  1.057107
2000-01-03 -0.547311  0.843827 -0.038329  1.147878
2000-01-04 -2.651881  1.182516 -0.152479  0.964644
...              ...       ...       ...       ...
2002-09-23  8.342906  1.006288 -0.140338  1.035191
2002-09-24  8.922223  0.996540 -0.146965  1.032920
2002-09-25  6.181801  1.031933 -0.164804  1.039457
2002-09-26  4.105633  1.064250 -0.132044  1.058789

[1000 rows x 4 columns]