4.3 plyr

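plyr is an R library for the split-apply-combine strategy for data analysis. Its functions revolve around three data structures in R: a for arrays, l for lists, and d for data.frame. These letters form the function names, with the first letter giving the input type and the second the output type (for example, ddply takes a data.frame and returns a data.frame). The table below shows how these data structures could be mapped in Python.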

R            Python
-----------  -----------------------------
array        list
list         dictionary or list of objects
data.frame   pandas DataFrame
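As a rough illustration of the strategy in pandas terms, this sketch performs the three steps by hand on a tiny hypothetical frame and then repeats them with a single groupby call. The dict of sub-frames stands in for an R list from the table, so the manual version loosely resembles a data.frame → list → data.frame pipeline (plyr's dlply followed by ldply). The column names g and v are invented for the example, and the correspondence is an analogy, not a description of how either library is implemented.

```python
import pandas as pd

# Tiny hypothetical frame: one grouping column "g" and one value column "v".
df = pd.DataFrame({
    "g": ["a", "b", "a", "b"],
    "v": [1.0, 2.0, 3.0, 5.0],
})

# Split: one sub-DataFrame per group key, held in a dict (the "list" step).
pieces = {key: sub for key, sub in df.groupby("g")}

# Apply: reduce each piece to a single summary value.
results = {key: sub["v"].mean() for key, sub in pieces.items()}

# Combine: collect the per-group results back into one pandas object.
manual = pd.Series(results, name="v")

# groupby performs split, apply and combine in one expression.
builtin = df.groupby("g")["v"].mean()

print(manual.to_dict())   # {'a': 2.0, 'b': 3.5}
print(builtin.to_dict())  # {'a': 2.0, 'b': 3.5}
```

In practice the one-line groupby form is preferred; the manual version only makes the intermediate "list of pieces" visible.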

4.3.1 ddply

In R, an expression that summarizes x by month and week in a data.frame called df could be:

require(plyr)
df <- data.frame(
  x = runif(120, 1, 168),
  y = runif(120, 7, 334),
  z = runif(120, 1.7, 20.7),
  month = rep(c(5,6,7,8),30),
  week = sample(1:3, 120, TRUE)
)

ddply(df, .(month, week), summarize,
      mean = round(mean(x), 2),
      sd = round(sd(x), 2))

In pandas, the equivalent expression uses the groupby() method:

In [1]: import numpy as np
   ...: import pandas as pd
   ...: 
   ...: df = pd.DataFrame({
   ...:     'x': np.random.uniform(1., 168., 120),
   ...:     'y': np.random.uniform(7., 334., 120),
   ...:     'z': np.random.uniform(1.7, 20.7, 120),
   ...:     'month': [5, 6, 7, 8] * 30,
   ...:     'week': np.random.randint(1, 4, 120)  # upper bound is exclusive: weeks 1-3
   ...: })
   ...: 

In [2]: grouped = df.groupby(['month','week'])

In [3]: grouped['x'].agg(['mean', 'std'])
Out[3]: 
                 mean        std
month week                      
5     1     63.653367  40.601965
      2     78.126605  53.342400
      3     92.091886  57.630110
6     1     81.747070  54.339218
...               ...        ...
7     3     71.688795  37.595638
8     1     62.741922  34.618153
      2     91.774627  49.790202
      3     73.936856  60.773900

[12 rows x 2 columns]
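The R call also rounds to two decimals, names the result columns mean and sd, and returns month and week as ordinary columns rather than an index. A closer pandas sketch, assuming pandas 0.25 or later for named aggregation, could look like the following. The data are regenerated with a seeded generator (an assumption made only for reproducibility), so the numbers will not match the output shown above. pandas' std uses ddof=1 by default, which matches R's sd.

```python
import numpy as np
import pandas as pd

# Seeded generator so the sketch is reproducible; values differ from the session above.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.uniform(1.0, 168.0, 120),
    "month": [5, 6, 7, 8] * 30,
    "week": rng.integers(1, 4, 120),  # upper bound is exclusive: weeks 1-3
})

summary = (
    df.groupby(["month", "week"])["x"]
      .agg(mean="mean", sd="std")  # named aggregation, like summarize(mean = ..., sd = ...)
      .round(2)                     # like round(..., 2) in the R call
      .reset_index()                # month and week become columns, as in ddply's result
)
print(summary.head())
```

The reset_index() step is optional; keeping the MultiIndex is often more convenient for further pandas work.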

For more details and examples, see the groupby documentation.