In [1]: dates = pd.date_range('1/1/2000', periods=8)

In [2]: df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])

2.11 Boolean indexing

Another common operation is the use of boolean vectors to filter the data. The operators are: | for or, & for and, and ~ for not. These must be grouped by using parentheses.

Using a boolean vector to index a Series works exactly as in a numpy ndarray:

In [3]: s = pd.Series(range(-3, 4))

In [4]: s
Out[4]: 
0   -3
1   -2
2   -1
3    0
4    1
5    2
6    3
dtype: int64

In [5]: s[s > 0]
Out[5]: 
4    1
5    2
6    3
dtype: int64

In [6]: s[(s < -1) | (s > 0.5)]
Out[6]: 
0   -3
1   -2
4    1
5    2
6    3
dtype: int64

In [7]: s[~(s < 0)]
Out[7]: 
3    0
4    1
5    2
6    3
dtype: int64

You may select rows from a DataFrame using a boolean vector the same length as the DataFrame’s index (for example, something derived from one of the columns of the DataFrame):

In [8]: df
Out[8]: 
                 A       B       C       D
2000-01-01  0.4691 -0.2829 -1.5091 -1.1356
2000-01-02  1.2121 -0.1732  0.1192 -1.0442
2000-01-03 -0.8618 -2.1046 -0.4949  1.0718
2000-01-04  0.7216 -0.7068 -1.0396  0.2719
2000-01-05 -0.4250  0.5670  0.2762 -1.0874
2000-01-06 -0.6737  0.1136 -1.4784  0.5250
2000-01-07  0.4047  0.5770 -1.7150 -1.0393
2000-01-08 -0.3706 -1.1579 -1.3443  0.8449

In [9]: df[df['A'] > 0]
Out[9]: 
                 A       B       C       D
2000-01-01  0.4691 -0.2829 -1.5091 -1.1356
2000-01-02  1.2121 -0.1732  0.1192 -1.0442
2000-01-04  0.7216 -0.7068 -1.0396  0.2719
2000-01-07  0.4047  0.5770 -1.7150 -1.0393

List comprehensions and map method of Series can also be used to produce more complex criteria:

In [10]: df2 = pd.DataFrame({'a' : ['one', 'one', 'two', 'three', 'two', 'one', 'six'],
   ....:                     'b' : ['x', 'y', 'y', 'x', 'y', 'x', 'x'],
   ....:                     'c' : np.random.randn(7)})
   ....: 

In [11]: df2
Out[11]: 
       a  b       c
0    one  x  1.0758
1    one  y -0.1090
2    two  y  1.6436
3  three  x -1.4694
4    two  y  0.3570
5    one  x -0.6746
6    six  x -1.7769

# only want 'two' or 'three'
In [12]: criterion = df2['a'].map(lambda x: x.startswith('t'))

In [13]: df2[criterion]
Out[13]: 
       a  b       c
2    two  y  1.6436
3  three  x -1.4694
4    two  y  0.3570

# equivalent but slower
In [14]: df2[[x.startswith('t') for x in df2['a']]]
Out[14]: 
       a  b       c
2    two  y  1.6436
3  three  x -1.4694
4    two  y  0.3570

# Multiple criteria
In [15]: df2[criterion & (df2['b'] == 'x')]
Out[15]: 
       a  b       c
3  three  x -1.4694

Note, with the choice methods Selection by Label, Selection by Position, and Advanced Indexing you may select along more than one axis using boolean vectors combined with other indexing expressions.

In [16]: df2.loc[criterion & (df2['b'] == 'x'),'b':'c']
Out[16]: 
   b       c
3  x -1.4694