.. currentmodule:: pandas

.. ipython:: python
   :suppress:

   import numpy as np
   import pandas as pd
   import os
   np.random.seed(123456)
   np.set_printoptions(precision=4, suppress=True)
   import matplotlib
   matplotlib.style.use('ggplot')
   import matplotlib.pyplot as plt
   pd.options.display.max_rows = 8

Grouping
--------

By "group by" we are referring to a process involving one or more of the
following steps

 - **Splitting** the data into groups based on some criteria
 - **Applying** a function to each group independently
 - **Combining** the results into a data structure

See the :ref:`Grouping section <groupby>`

.. ipython:: python

   df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
                             'foo', 'bar', 'foo', 'foo'],
                      'B' : ['one', 'one', 'two', 'three',
                             'two', 'two', 'one', 'three'],
                      'C' : np.random.randn(8),
                      'D' : np.random.randn(8)})
   df

Grouping and then applying a function ``sum`` to the resulting groups.

.. ipython:: python

   df.groupby('A').sum()

Grouping by multiple columns forms a hierarchical index, which we then apply
the function.

.. ipython:: python

   df.groupby(['A','B']).sum()