9.5 Testing for Strings that Match or Contain a Pattern
You can check whether elements contain a pattern:
In [1]: pattern = r'[a-z][0-9]'
In [2]: pd.Series(['1', '2', '3a', '3b', '03c']).str.contains(pattern)
Out[2]:
0 False
1 False
2 False
3 False
4 False
dtype: bool
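None of these strings contains a lowercase letter immediately followed by a digit, so every entry is False. You can also test whether elements match a pattern: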
In [3]: pd.Series(['1', '2', '3a', '3b', '03c']).str.match(pattern, as_indexer=True)
Out[3]:
0 False
1 False
2 False
3 False
4 False
dtype: bool
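The distinction between match and contains is strictness: match
relies on re.match, which succeeds only if the pattern matches at the
beginning of each string, while contains relies on re.search, which
succeeds if the pattern occurs anywhere in the string.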
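The difference is easiest to see with the standard library's re module on its own. The following is a minimal sketch; the pattern and the sample strings are chosen here purely for illustration and are not the ones used in the session above.

```python
import re

# Illustrative pattern (an assumption for this sketch):
# a digit immediately followed by a lowercase letter.
pattern = r'[0-9][a-z]'
values = ['1', '3a', '03c']

# re.match succeeds only if the pattern matches at the start of the string.
matched = [bool(re.match(pattern, v)) for v in values]

# re.search succeeds if the pattern matches anywhere in the string.
found = [bool(re.search(pattern, v)) for v in values]

print(matched)  # [False, True, False]
print(found)    # [False, True, True]
```

The string '03c' shows the distinction: the pattern occurs at position 1 ('3c'), so re.search finds it, but re.match fails because the string does not begin with a digit followed by a letter.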
Warning
In previous versions, match was for extracting groups,
returning a not-so-convenient Series of tuples. The new method extract
(described in the previous section) is now preferred.
This old, deprecated behavior of match is still the default. As
demonstrated above, use the new behavior by setting as_indexer=True.
In this mode, match is analogous to contains, returning a boolean
Series. The new behavior will become the default behavior in a future
release.
Methods like match, contains, startswith, and endswith take an extra
na argument so missing values can be considered True or False:
In [4]: s4 = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
In [5]: s4.str.contains('A', na=False)
Out[5]:
0 True
1 False
2 False
3 True
...
5 False
6 True
7 False
8 False
dtype: bool
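The same na argument applies to the other methods listed above. As a small sketch, assuming a recent pandas and NumPy installation and using made-up sample data, startswith can treat a missing value either as a non-match or as a match:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value.
s = pd.Series(['Apple', 'Banana', np.nan, 'Avocado'])

# Treat missing values as non-matches.
starts_na_false = s.str.startswith('A', na=False)

# Treat missing values as matches.
starts_na_true = s.str.startswith('A', na=True)

print(starts_na_false.tolist())  # [True, False, False, True]
print(starts_na_true.tolist())   # [True, False, True, True]
```

Because na=False yields a purely boolean result, it can be used directly as a mask, for example s[s.str.startswith('A', na=False)].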