.. currentmodule:: pandas

.. ipython:: python
   :suppress:

   import numpy as np
   import pandas as pd
   import os
   np.random.seed(123456)
   np.set_printoptions(precision=4, suppress=True)
   import matplotlib
   matplotlib.style.use('ggplot')
   import matplotlib.pyplot as plt
   pd.options.display.max_rows = 8

Grouping
--------

By "group by" we are referring to a process involving one or more of the
following steps:

- **Splitting** the data into groups based on some criteria
- **Applying** a function to each group independently
- **Combining** the results into a data structure

See the :ref:`Grouping section `.

.. ipython:: python

   df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
                             'foo', 'bar', 'foo', 'foo'],
                      'B' : ['one', 'one', 'two', 'three',
                             'two', 'two', 'one', 'three'],
                      'C' : np.random.randn(8),
                      'D' : np.random.randn(8)})
   df

Grouping by a single column and then applying the ``sum`` function to the
resulting groups:

.. ipython:: python

   df.groupby('A').sum()

Grouping by multiple columns forms a hierarchical index on the result; here we
again apply the ``sum`` function to each group:

.. ipython:: python

   df.groupby(['A','B']).sum()
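
As a rough sketch of the three steps listed at the start of this section, the
snippet below carries out the split, apply, and combine stages by hand and
compares the result with a single ``groupby`` call. The frame ``demo`` and its
column names ``key`` and ``val`` are made up for illustration and are not part
of the example data above.

.. ipython:: python

   import pandas as pd

   # Hypothetical toy data, independent of ``df`` above
   demo = pd.DataFrame({'key': ['a', 'b', 'a', 'b'],
                        'val': [1, 2, 3, 4]})

   # Split: iterate over the groups to get one sub-frame per distinct key
   pieces = {key: sub for key, sub in demo.groupby('key')}

   # Apply: sum the 'val' column of each piece independently
   sums = {key: sub['val'].sum() for key, sub in pieces.items()}

   # Combine: collect the per-group results into a single Series
   manual = pd.Series(sums, name='val').rename_axis('key')

   # The one-step call performs all three stages at once
   direct = demo.groupby('key')['val'].sum()

   # Compare the two results label by label
   manual.to_dict() == direct.to_dict()

In practice the one-step form is preferable; the manual version is only meant
to make the individual stages visible.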