2.12 Indexing with isin
Consider the isin
method of Series, which returns a boolean vector that is
true wherever the Series elements exist in the passed list. This allows you to
select rows where one or more columns have values you want:
In [1]: s = pd.Series(np.arange(5), index=np.arange(5)[::-1], dtype='int64')
In [2]: s
Out[2]:
4 0
3 1
2 2
1 3
0 4
dtype: int64
In [3]: s.isin([2, 4, 6])
Out[3]:
4 False
3 False
2 True
1 False
0 True
dtype: bool
In [4]: s[s.isin([2, 4, 6])]
Out[4]:
2 2
0 4
dtype: int64
The same method is available for Index
objects and is useful for the cases
when you don’t know which of the sought labels are in fact present:
In [5]: s[s.index.isin([2, 4, 6])]
Out[5]:
4 0
2 2
dtype: int64
# compare it to the following
In [6]: s[[2, 4, 6]]
Out[6]:
2 2.0
4 0.0
6 NaN
dtype: float64
In addition to that, MultiIndex
allows selecting a separate level to use
in the membership check:
In [7]: s_mi = pd.Series(np.arange(6),
...: index=pd.MultiIndex.from_product([[0, 1], ['a', 'b', 'c']]))
...:
In [8]: s_mi
Out[8]:
0 a 0
b 1
c 2
1 a 3
b 4
c 5
dtype: int64
In [9]: s_mi.iloc[s_mi.index.isin([(1, 'a'), (2, 'b'), (0, 'c')])]
Out[9]:
0 c 2
1 a 3
dtype: int64
In [10]: s_mi.iloc[s_mi.index.isin(['a', 'c', 'e'], level=1)]
Out[10]:
0 a 0
c 2
1 a 3
c 5
dtype: int64
DataFrame also has an isin
method. When calling isin
, pass a set of
values as either an array or dict. If values is an array, isin
returns
a DataFrame of booleans that is the same shape as the original DataFrame, with True
wherever the element is in the sequence of values.
In [11]: df = pd.DataFrame({'vals': [1, 2, 3, 4], 'ids': ['a', 'b', 'f', 'n'],
....: 'ids2': ['a', 'n', 'c', 'n']})
....:
In [12]: df
Out[12]:
ids ids2 vals
0 a a 1
1 b n 2
2 f c 3
3 n n 4
In [13]: values = ['a', 'b', 1, 3]
In [14]: df.isin(values)
Out[14]:
ids ids2 vals
0 True True True
1 True False False
2 False False True
3 False False False
Oftentimes you’ll want to match certain values with certain columns.
Just make values a dict
where the key is the column, and the value is
a list of items you want to check for.
In [15]: values = {'ids': ['a', 'b'], 'vals': [1, 3]}
In [16]: df.isin(values)
Out[16]:
ids ids2 vals
0 True False True
1 True False False
2 False False True
3 False False False
Combine DataFrame’s isin
with the any()
and all()
methods to
quickly select subsets of your data that meet a given criteria.
To select a row where each column meets its own criterion:
In [17]: values = {'ids': ['a', 'b'], 'ids2': ['a', 'c'], 'vals': [1, 3]}
In [18]: row_mask = df.isin(values).all(1)
In [19]: df
Out[19]:
ids ids2 vals
0 a a 1
1 b n 2
2 f c 3
3 n n 4
In [20]: row_mask
Out[20]:
0 True
1 False
2 False
3 False
dtype: bool
In [21]: df[row_mask]
Out[21]:
ids ids2 vals
0 a a 1