5.3 Iterating through groups
With the GroupBy object in hand, iterating through the grouped data is very
natural and functions similarly to itertools.groupby:
In [1]: df
Out[1]:
A B C D
0 foo one 0.4691 -0.8618
1 bar one -0.2829 -2.1046
2 foo two -1.5091 -0.4949
3 bar three -1.1356 1.0718
4 foo two 1.2121 0.7216
5 bar two -0.1732 -0.7068
6 foo one 0.1192 -1.0396
7 foo three -1.0442 0.2719
In [2]: grouped = df.groupby('A')
In [3]: for name, group in grouped:
   ...:     print(name)
   ...:     print(group)
   ...:
bar
A B C D
1 bar one -0.2829 -2.1046
3 bar three -1.1356 1.0718
5 bar two -0.1732 -0.7068
foo
A B C D
0 foo one 0.4691 -0.8618
2 foo two -1.5091 -0.4949
4 foo two 1.2121 0.7216
6 foo one 0.1192 -1.0396
7 foo three -1.0442 0.2719
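To make the comparison with itertools.groupby concrete, here is a minimal sketch that uses only the standard library and the (A, C) pairs from the frame above (the names rows, unsorted_keys and groups are illustrative). One difference is worth keeping in mind: itertools.groupby only merges consecutive items with equal keys, so the data must be sorted by key first, whereas GroupBy gathers every row for a key regardless of where it appears (and, with the default sort=True, yields the keys in sorted order).

```python
from itertools import groupby
from operator import itemgetter

# (A, C) pairs copied from the example DataFrame above, in row order
rows = [
    ("foo", 0.4691), ("bar", -0.2829), ("foo", -1.5091), ("bar", -1.1356),
    ("foo", 1.2121), ("bar", -0.1732), ("foo", 0.1192), ("foo", -1.0442),
]

# Without sorting, every run of consecutive equal keys becomes its own "group"
unsorted_keys = [key for key, _ in groupby(rows, key=itemgetter(0))]

# Sorting by key first (sorted() is stable, so row order within a key is kept)
# gives one group per distinct key, which is what DataFrame.groupby does
groups = {
    key: [value for _, value in run]
    for key, run in groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0))
}

for name, values in groups.items():
    print(name)
    print(values)
```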
In the case of grouping by multiple keys, the group name will be a tuple:
In [4]: for name, group in df.groupby(['A', 'B']):
   ...:     print(name)
   ...:     print(group)
   ...:
('bar', 'one')
A B C D
1 bar one -0.2829 -2.1046
('bar', 'three')
A B C D
3 bar three -1.1356 1.0718
('bar', 'two')
A B C D
5 bar two -0.1732 -0.7068
('foo', 'one')
A B C D
0 foo one 0.4691 -0.8618
6 foo one 0.1192 -1.0396
('foo', 'three')
A B C D
7 foo three -1.0442 0.2719
('foo', 'two')
A B C D
2 foo two -1.5091 -0.4949
4 foo two 1.2121 0.7216
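As a quick check that multi-key group names really are tuples, the following sketch rebuilds the example frame from the values displayed above (rounded to four decimals, so it is an approximation of the original data) and records each group's size under its name; sizes is an illustrative name, not part of pandas.

```python
import pandas as pd

# Example frame rebuilt from the rounded values shown above
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
    "C": [0.4691, -0.2829, -1.5091, -1.1356, 1.2121, -0.1732, 0.1192, -1.0442],
    "D": [-0.8618, -2.1046, -0.4949, 1.0718, 0.7216, -0.7068, -1.0396, 0.2719],
})

# With several keys, each group name is a tuple such as ('foo', 'one')
sizes = {}
for name, group in df.groupby(["A", "B"]):
    sizes[name] = len(group)

print(sizes)
```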
It's standard Python-fu, but remember that you can unpack the tuple in the for
loop statement if you wish: for (k1, k2), group in df.groupby(['A', 'B']):
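A minimal sketch of that unpacking, assuming the same example frame (rebuilt from the displayed values): each (A, B) name is split into k1 and k2 and used to fill a nested dictionary of row labels. The name nested is hypothetical and only for illustration.

```python
import pandas as pd

# Example frame rebuilt from the rounded values shown above
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
    "C": [0.4691, -0.2829, -1.5091, -1.1356, 1.2121, -0.1732, 0.1192, -1.0442],
    "D": [-0.8618, -2.1046, -0.4949, 1.0718, 0.7216, -0.7068, -1.0396, 0.2719],
})

nested = {}
# Unpack the (A, B) tuple name directly in the for statement
for (k1, k2), group in df.groupby(["A", "B"]):
    # nested[A value][B value] -> row labels belonging to that group
    nested.setdefault(k1, {})[k2] = list(group.index)

print(nested)
```

Unpacking only works when the group names are tuples of the right length; with the single-key grouped object from earlier, a name such as 'bar' cannot be unpacked into two variables.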
5.4 Selecting a group
A single group can be selected using GroupBy.get_group():
In [5]: grouped.get_group('bar')
Out[5]:
A B C D
1 bar one -0.2829 -2.1046
3 bar three -1.1356 1.0718
5 bar two -0.1732 -0.7068
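The following sketch, again on the example frame rebuilt from the displayed values, shows that get_group('bar') returns the same rows (with their original index labels) as a boolean mask on column A, and that asking for a key with no group raises KeyError. The key 'baz' is a made-up value chosen because it does not occur in column A.

```python
import pandas as pd

# Example frame rebuilt from the rounded values shown above
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
    "C": [0.4691, -0.2829, -1.5091, -1.1356, 1.2121, -0.1732, 0.1192, -1.0442],
    "D": [-0.8618, -2.1046, -0.4949, 1.0718, 0.7216, -0.7068, -1.0396, 0.2719],
})

grouped = df.groupby("A")
bar = grouped.get_group("bar")

# Same row labels as selecting with a boolean mask on column A
same_rows = bar.index.equals(df.index[df["A"] == "bar"])
print(bar.index.tolist(), same_rows)

# A key that has no group raises KeyError
try:
    grouped.get_group("baz")
    missing_raises = False
except KeyError:
    missing_raises = True
print(missing_raises)
```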
Or for an object grouped on multiple columns:
In [6]: df.groupby(['A', 'B']).get_group(('bar', 'one'))
Out[6]:
A B C D
1 bar one -0.2829 -2.1046
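For a multi-column grouping the key passed to get_group is the full tuple. This sketch (same rebuilt example frame) compares get_group(('foo', 'one')) with the equivalent combined mask, and also reads the row labels from the GroupBy.groups attribute, which maps each group name to the index labels of its rows.

```python
import pandas as pd

# Example frame rebuilt from the rounded values shown above
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
    "C": [0.4691, -0.2829, -1.5091, -1.1356, 1.2121, -0.1732, 0.1192, -1.0442],
    "D": [-0.8618, -2.1046, -0.4949, 1.0718, 0.7216, -0.7068, -1.0396, 0.2719],
})

grouped2 = df.groupby(["A", "B"])

# Select one group by its full (A, B) tuple
foo_one = grouped2.get_group(("foo", "one"))

# Equivalent selection with a combined boolean mask
mask = (df["A"] == "foo") & (df["B"] == "one")
mask_rows = df.index[mask].tolist()

# .groups maps each tuple name to the index labels of its rows
label_rows = grouped2.groups[("foo", "one")].tolist()

print(foo_one.index.tolist(), mask_rows, label_rows)
```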