2.12 Indexing with isin

Consider the isin method of Series, which returns a boolean vector that is true wherever the Series elements exist in the passed list. This allows you to select rows where one or more columns have values you want:

In [1]: s = pd.Series(np.arange(5), index=np.arange(5)[::-1], dtype='int64')

In [2]: s
Out[2]: 
4    0
3    1
2    2
1    3
0    4
dtype: int64

In [3]: s.isin([2, 4, 6])
Out[3]: 
4    False
3    False
2     True
1    False
0     True
dtype: bool

In [4]: s[s.isin([2, 4, 6])]
Out[4]: 
2    2
0    4
dtype: int64

The same method is available for Index objects and is useful for the cases when you don’t know which of the sought labels are in fact present:

In [5]: s[s.index.isin([2, 4, 6])]
Out[5]: 
4    0
2    2
dtype: int64

# compare it to the following
In [6]: s[[2, 4, 6]]
Out[6]: 
2    2.0
4    0.0
6    NaN
dtype: float64

In addition to that, MultiIndex allows selecting a separate level to use in the membership check:

In [7]: s_mi = pd.Series(np.arange(6),
   ...:                  index=pd.MultiIndex.from_product([[0, 1], ['a', 'b', 'c']]))
   ...: 

In [8]: s_mi
Out[8]: 
0  a    0
   b    1
   c    2
1  a    3
   b    4
   c    5
dtype: int64

In [9]: s_mi.iloc[s_mi.index.isin([(1, 'a'), (2, 'b'), (0, 'c')])]
Out[9]: 
0  c    2
1  a    3
dtype: int64

In [10]: s_mi.iloc[s_mi.index.isin(['a', 'c', 'e'], level=1)]
Out[10]: 
0  a    0
   c    2
1  a    3
   c    5
dtype: int64

DataFrame also has an isin method. When calling isin, pass a set of values as either an array or dict. If values is an array, isin returns a DataFrame of booleans that is the same shape as the original DataFrame, with True wherever the element is in the sequence of values.

In [11]: df = pd.DataFrame({'vals': [1, 2, 3, 4], 'ids': ['a', 'b', 'f', 'n'],
   ....:                    'ids2': ['a', 'n', 'c', 'n']})
   ....: 

In [12]: df
Out[12]: 
  ids ids2  vals
0   a    a     1
1   b    n     2
2   f    c     3
3   n    n     4

In [13]: values = ['a', 'b', 1, 3]

In [14]: df.isin(values)
Out[14]: 
     ids   ids2   vals
0   True   True   True
1   True  False  False
2  False  False   True
3  False  False  False

Oftentimes you’ll want to match certain values with certain columns. Just make values a dict where the key is the column, and the value is a list of items you want to check for.

In [15]: values = {'ids': ['a', 'b'], 'vals': [1, 3]}

In [16]: df.isin(values)
Out[16]: 
     ids   ids2   vals
0   True  False   True
1   True  False  False
2  False  False   True
3  False  False  False

Combine DataFrame’s isin with the any() and all() methods to quickly select subsets of your data that meet a given criteria. To select a row where each column meets its own criterion:

In [17]: values = {'ids': ['a', 'b'], 'ids2': ['a', 'c'], 'vals': [1, 3]}

In [18]: row_mask = df.isin(values).all(1)

In [19]: df
Out[19]: 
  ids ids2  vals
0   a    a     1
1   b    n     2
2   f    c     3
3   n    n     4

In [20]: row_mask
Out[20]: 
0     True
1    False
2    False
3    False
dtype: bool

In [21]: df[row_mask]
Out[21]: 
  ids ids2  vals
0   a    a     1