9.5 Testing for Strings that Match or Contain a Pattern
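You can check whether elements contain a pattern (here a lowercase letter immediately followed by a digit, which none of the elements below has, so every result is False):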
In [1]: pattern = r'[a-z][0-9]'
In [2]: pd.Series(['1', '2', '3a', '3b', '03c']).str.contains(pattern)
Out[2]:
0    False
1    False
2    False
3    False
4    False
dtype: bool
or match a pattern:
In [3]: pd.Series(['1', '2', '3a', '3b', '03c']).str.match(pattern, as_indexer=True)
Out[3]:
0    False
1    False
2    False
3    False
4    False
dtype: bool
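The distinction between match and contains is strictness: match relies on re.match, which succeeds only if the pattern matches at the beginning of the string, while contains relies on re.search, which succeeds if the pattern matches anywhere in the string.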
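The example pattern above matches none of the elements, so it does not show where the two methods diverge. The following is a minimal sketch with the character classes swapped (an assumption made for illustration, not the pattern used above). It omits as_indexer because the pattern has no groups, and it assumes a pandas version in which match returns booleans for such a pattern.

```python
import pandas as pd

s = pd.Series(['1', '2', '3a', '3b', '03c'])

# Illustrative pattern: a digit immediately followed by a lowercase letter.
pattern = r'[0-9][a-z]'

# contains follows re.search: the pattern may occur anywhere in the string.
found_anywhere = s.str.contains(pattern)

# match follows re.match: the pattern must match starting at the first character.
found_at_start = s.str.match(pattern)

print(found_anywhere.tolist())  # [False, False, True, True, True]
print(found_at_start.tolist())  # [False, False, True, True, False]
```

Only the last element differs: '03c' contains '3c', but it starts with '03', which does not satisfy the pattern at position 0.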
Warning
In previous versions, match was for extracting groups, returning a not-so-convenient Series of tuples. The new method extract (described in the previous section) is now preferred.
This old, deprecated behavior of match is still the default. As demonstrated above, use the new behavior by setting as_indexer=True. In this mode, match is analogous to contains, returning a boolean Series. The new behavior will become the default behavior in a future release.
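As a rough illustration of why extract is the preferred way to pull out groups, here is a minimal sketch. The sample data and the two-group pattern are assumptions chosen for the example. Two groups are used deliberately, since the return type for a single group may differ between pandas versions.

```python
import pandas as pd

s = pd.Series(['a1', 'b2', 'c3'])

# Two capture groups: extract returns a DataFrame with one column per group.
groups = s.str.extract(r'([ab])(\d)')

# Elements that do not match the pattern ('c3' here) produce a row of NaN.
print(groups)
```

Each group lands in its own column, which is easier to work with than a Series of tuples.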
Methods like match, contains, startswith, and endswith take an extra na argument so missing values can be considered True or False:
In [4]: s4 = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
In [5]: s4.str.contains('A', na=False)
Out[5]:
0     True
1    False
2    False
3     True
...
5    False
6     True
7    False
8    False
dtype: bool
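The same na argument can be sketched for startswith and endswith. The short Series below is an assumption made for illustration. Passing na explicitly is what makes the result safe to use as a boolean mask when the data contains missing values.

```python
import numpy as np
import pandas as pd

s = pd.Series(['A', 'Aaba', np.nan, 'cat'])

# na=False treats the missing value as a non-match, giving a pure boolean mask.
mask = s.str.startswith('A', na=False)
print(s[mask].tolist())  # ['A', 'Aaba']

# na=True treats the missing value as a match instead.
ends_with_t = s.str.endswith('t', na=True)
print(ends_with_t.tolist())  # [False, False, True, True]
```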