9.5 Testing for Strings that Match or Contain a Pattern

You can check whether elements contain a pattern:

In [1]: pattern = r'[0-9][a-z]'

In [2]: pd.Series(['1', '2', '3a', '3b', '03c']).str.contains(pattern)
Out[2]: 
0    False
1    False
2     True
3     True
4     True
dtype: bool

or match a pattern:

In [3]: pd.Series(['1', '2', '3a', '3b', '03c']).str.match(pattern, as_indexer=True)
Out[3]: 
0    False
1    False
2     True
3     True
4    False
dtype: bool

The distinction between match and contains is strictness: match relies on re.match, so the pattern must occur at the start of the string, while contains relies on re.search, so the pattern may occur anywhere in it.
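To make the distinction concrete, the sketch below compares the pandas results with the standard library's re.match and re.search applied element by element. It assumes a recent pandas release, in which str.match returns a boolean Series without the as_indexer argument; the data and the pattern [0-9][a-z] are chosen for illustration.

```python
import re

import pandas as pd

# Illustrative data: a digit followed by a letter, at different positions
s = pd.Series(['1', '2', '3a', '3b', '03c'])
pattern = r'[0-9][a-z]'

# contains: the pattern may occur anywhere (re.search semantics)
contains = s.str.contains(pattern)

# match: the pattern must occur at the start (re.match semantics)
match = s.str.match(pattern)

# Element-wise equivalents using the standard library
by_search = [re.search(pattern, x) is not None for x in s]
by_match = [re.match(pattern, x) is not None for x in s]

print(contains.tolist())  # [False, False, True, True, True]
print(match.tolist())     # [False, False, True, True, False]
```

Only '03c' differs: '3c' occurs inside the string, so contains finds it, but the string starts with '03', so match does not.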

Warning

In previous versions, match was used to extract groups, returning an inconvenient Series of tuples. The new method extract (described in the previous section) is now preferred for that purpose.

This old, deprecated behavior of match is still the default. To use the new behavior, set as_indexer=True, as demonstrated above. In this mode, match is analogous to contains and returns a boolean Series. The new behavior will become the default in a future release.
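The two roles can be sketched side by side: extract pulls out capture groups, while the new-style match only answers whether each string starts with the pattern. This is a minimal sketch assuming a recent pandas release, where the boolean behavior of match is already the default and as_indexer is not passed; the data and the two-group pattern are hypothetical.

```python
import pandas as pd

# Hypothetical data: three "letter + digit" strings and one that is not
s = pd.Series(['a1', 'b2', 'c3', 'dd'])

# Two capture groups: a lowercase letter, then a digit
pattern = r'([a-z])([0-9])'

# extract returns a DataFrame with one column per capture group;
# rows where the pattern is not found are filled with NaN
groups = s.str.extract(pattern)

# match returns a boolean Series, like contains but anchored at the start
flags = s.str.match(pattern)

print(groups)
print(flags.tolist())  # [True, True, True, False]
```

Use extract when you need the matched pieces, and match (or contains) when you only need a mask for filtering.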

Methods like match, contains, startswith, and endswith take an extra na argument, so missing values can be considered True or False:

In [4]: s4 = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

In [5]: s4.str.contains('A', na=False)
Out[5]: 
0     True
1    False
2    False
3     True
     ...  
5    False
6     True
7    False
8    False
dtype: bool
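The same na argument works for startswith and endswith. The sketch below reuses the s4 data from above and assumes a recent pandas; the prefixes 'C' and 'a' are chosen only for illustration. Filling missing values with a boolean yields a mask that can be used directly for indexing.

```python
import numpy as np
import pandas as pd

s4 = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

# na=False: the missing value counts as "does not start with 'C'"
starts = s4.str.startswith('C', na=False)

# na=True: the missing value counts as "ends with 'a'"
ends = s4.str.endswith('a', na=True)

# With no missing values left, the result works as a boolean mask
print(s4[starts].tolist())  # ['C', 'CABA']
```

The matching is case-sensitive, so 'A' and 'CABA' do not end with lowercase 'a', while 'Aaba', 'Baca', and the missing value (because of na=True) do.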