2.7 Selection By Callable
New in version 0.18.1.
.loc
, .iloc
, .ix
and also []
indexing can accept a callable
as indexer.
The callable
must be a function with one argument (the calling Series, DataFrame or Panel) and that returns valid output for indexing.
In [1]: df1 = pd.DataFrame(np.random.randn(6, 4),
...: index=list('abcdef'),
...: columns=list('ABCD'))
...:
In [2]: df1
Out[2]:
A B C D
a 0.4691 -0.2829 -1.5091 -1.1356
b 1.2121 -0.1732 0.1192 -1.0442
c -0.8618 -2.1046 -0.4949 1.0718
d 0.7216 -0.7068 -1.0396 0.2719
e -0.4250 0.5670 0.2762 -1.0874
f -0.6737 0.1136 -1.4784 0.5250
In [3]: df1.loc[lambda df: df.A > 0, :]
Out[3]:
A B C D
a 0.4691 -0.2829 -1.5091 -1.1356
b 1.2121 -0.1732 0.1192 -1.0442
d 0.7216 -0.7068 -1.0396 0.2719
In [4]: df1.loc[:, lambda df: ['A', 'B']]
Out[4]:
A B
a 0.4691 -0.2829
b 1.2121 -0.1732
c -0.8618 -2.1046
d 0.7216 -0.7068
e -0.4250 0.5670
f -0.6737 0.1136
In [5]: df1.iloc[:, lambda df: [0, 1]]
Out[5]:
A B
a 0.4691 -0.2829
b 1.2121 -0.1732
c -0.8618 -2.1046
d 0.7216 -0.7068
e -0.4250 0.5670
f -0.6737 0.1136
In [6]: df1[lambda df: df.columns[0]]
Out[6]:
a 0.4691
b 1.2121
c -0.8618
d 0.7216
e -0.4250
f -0.6737
Name: A, dtype: float64
You can use callable indexing in Series
.
In [7]: df1.A.loc[lambda s: s > 0]
Out[7]:
a 0.4691
b 1.2121
d 0.7216
Name: A, dtype: float64
Using these methods / indexers, you can chain data selection operations without using temporary variable.
In [8]: bb = pd.read_csv('https://raw.githubusercontent.com/pydata/pandas/master/doc/data/baseball.csv',
...: index_col='id')
...:
In [9]: bb.head()
Out[9]:
player year stint team ... hbp sh sf gidp
id ...
88641 womacto01 2006 2 CHN ... 0.0 3.0 0.0 0.0
88643 schilcu01 2006 1 BOS ... 0.0 0.0 0.0 0.0
88645 myersmi01 2006 1 NYA ... 0.0 0.0 0.0 0.0
88649 helliri01 2006 1 MIL ... 0.0 0.0 0.0 0.0
88650 johnsra05 2006 1 NYA ... 0.0 0.0 0.0 0.0
[5 rows x 22 columns]
In [10]: (bb.groupby(['year', 'team']).sum()
....: .loc[lambda df: df.r > 100])
....:
Out[10]:
stint g ab r ... hbp sh sf gidp
year team ...
2007 CIN 6 379 745 101 ... 1.0 1.0 15.0 18.0
DET 5 301 1062 162 ... 10.0 4.0 8.0 28.0
HOU 4 311 926 109 ... 9.0 16.0 6.0 17.0
LAN 11 413 1021 153 ... 9.0 3.0 8.0 29.0
NYN 13 622 1854 240 ... 23.0 18.0 15.0 48.0
SFN 5 482 1305 198 ... 8.0 16.0 6.0 41.0
TEX 2 198 729 115 ... 5.0 2.0 8.0 16.0
TOR 4 459 1408 187 ... 12.0 4.0 16.0 38.0
[8 rows x 18 columns]