2.7 Selection By Callable

New in version 0.18.1.

.loc, .iloc, .ix and also [] indexing can accept a callable as indexer. The callable must be a function with one argument (the calling Series, DataFrame or Panel) and that returns valid output for indexing.

In [1]: df1 = pd.DataFrame(np.random.randn(6, 4),
   ...:                    index=list('abcdef'),
   ...:                    columns=list('ABCD'))
   ...: 

In [2]: df1
Out[2]: 
        A       B       C       D
a  0.4691 -0.2829 -1.5091 -1.1356
b  1.2121 -0.1732  0.1192 -1.0442
c -0.8618 -2.1046 -0.4949  1.0718
d  0.7216 -0.7068 -1.0396  0.2719
e -0.4250  0.5670  0.2762 -1.0874
f -0.6737  0.1136 -1.4784  0.5250

In [3]: df1.loc[lambda df: df.A > 0, :]
Out[3]: 
        A       B       C       D
a  0.4691 -0.2829 -1.5091 -1.1356
b  1.2121 -0.1732  0.1192 -1.0442
d  0.7216 -0.7068 -1.0396  0.2719

In [4]: df1.loc[:, lambda df: ['A', 'B']]
Out[4]: 
        A       B
a  0.4691 -0.2829
b  1.2121 -0.1732
c -0.8618 -2.1046
d  0.7216 -0.7068
e -0.4250  0.5670
f -0.6737  0.1136

In [5]: df1.iloc[:, lambda df: [0, 1]]
Out[5]: 
        A       B
a  0.4691 -0.2829
b  1.2121 -0.1732
c -0.8618 -2.1046
d  0.7216 -0.7068
e -0.4250  0.5670
f -0.6737  0.1136

In [6]: df1[lambda df: df.columns[0]]
Out[6]: 
a    0.4691
b    1.2121
c   -0.8618
d    0.7216
e   -0.4250
f   -0.6737
Name: A, dtype: float64

You can use callable indexing in Series.

In [7]: df1.A.loc[lambda s: s > 0]
Out[7]: 
a    0.4691
b    1.2121
d    0.7216
Name: A, dtype: float64

Using these methods / indexers, you can chain data selection operations without using temporary variable.

In [8]: bb = pd.read_csv('https://raw.githubusercontent.com/pydata/pandas/master/doc/data/baseball.csv',
   ...:     index_col='id')
   ...: 

In [9]: bb.head()
Out[9]: 
          player  year  stint team  ...   hbp   sh   sf  gidp
id                                  ...                      
88641  womacto01  2006      2  CHN  ...   0.0  3.0  0.0   0.0
88643  schilcu01  2006      1  BOS  ...   0.0  0.0  0.0   0.0
88645  myersmi01  2006      1  NYA  ...   0.0  0.0  0.0   0.0
88649  helliri01  2006      1  MIL  ...   0.0  0.0  0.0   0.0
88650  johnsra05  2006      1  NYA  ...   0.0  0.0  0.0   0.0

[5 rows x 22 columns]

In [10]: (bb.groupby(['year', 'team']).sum()
   ....:    .loc[lambda df: df.r > 100])
   ....: 
Out[10]: 
           stint    g    ab    r  ...    hbp    sh    sf  gidp
year team                         ...                         
2007 CIN       6  379   745  101  ...    1.0   1.0  15.0  18.0
     DET       5  301  1062  162  ...   10.0   4.0   8.0  28.0
     HOU       4  311   926  109  ...    9.0  16.0   6.0  17.0
     LAN      11  413  1021  153  ...    9.0   3.0   8.0  29.0
     NYN      13  622  1854  240  ...   23.0  18.0  15.0  48.0
     SFN       5  482  1305  198  ...    8.0  16.0   6.0  41.0
     TEX       2  198   729  115  ...    5.0   2.0   8.0  16.0
     TOR       4  459  1408  187  ...   12.0   4.0  16.0  38.0

[8 rows x 18 columns]