5.6 Comparisons
Comparing categorical data with other objects is possible in three cases:
- comparing equality (== and !=) to a list-like object (list, Series, array, ...) of the same length as the categorical data.
- all comparisons (==, !=, >, >=, <, and <=) of categorical data to another categorical Series, when ordered==True and the categories are the same.
- all comparisons of categorical data to a scalar.
All other comparisons, especially “non-equality” comparisons of two categoricals with different categories or a categorical with any list-like object, will raise a TypeError.
Note
Any “non-equality” comparison of categorical data with a Series, np.array, list or categorical data with different categories or ordering will raise a TypeError, because a custom category ordering could be interpreted in two ways: one that takes the ordering into account and one that does not.
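The three supported cases can be sketched in a few lines. This is a minimal illustration, not the reference session: it assumes a recent pandas release in which categories and ordering are passed through a CategoricalDtype (instead of as astype keywords), and the variable names are chosen only for this example.

```python
import numpy as np
import pandas as pd
from pandas.api.types import CategoricalDtype

# Assumption: recent pandas, where the category order is declared on the dtype.
dtype = CategoricalDtype(categories=[3, 2, 1], ordered=True)
cat = pd.Series([1, 2, 3]).astype(dtype)
other = pd.Series([2, 2, 2]).astype(dtype)

# Case 1: equality against a list-like object of the same length.
eq_list = cat == [1, 2, 3]

# Case 2: any comparison between two ordered categoricals that share
# the same categories and ordering.
gt_cat = cat > other

# Case 3: any comparison against a scalar that is one of the categories.
gt_scalar = cat > 2

# Everything else raises, e.g. an ordering comparison against a plain array.
try:
    cat > np.array([1, 2, 3])
    raised = False
except TypeError:
    raised = True
```

Note that the ordering comparisons follow the declared order 3 < 2 < 1, not the numeric order of the values.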
In [1]: cat = pd.Series([1,2,3]).astype("category", categories=[3,2,1], ordered=True)
In [2]: cat_base = pd.Series([2,2,2]).astype("category", categories=[3,2,1], ordered=True)
In [3]: cat_base2 = pd.Series([2,2,2]).astype("category", ordered=True)
In [4]: cat
Out[4]:
0 1
1 2
2 3
dtype: category
Categories (3, int64): [3 < 2 < 1]
In [5]: cat_base
Out[5]:
0 2
1 2
2 2
dtype: category
Categories (3, int64): [3 < 2 < 1]
In [6]: cat_base2
Out[6]:
0 2
1 2
2 2
dtype: category
Categories (1, int64): [2]
Comparing to a categorical with the same categories and ordering or to a scalar works:
In [7]: cat > cat_base
Out[7]:
0 True
1 False
2 False
dtype: bool
In [8]: cat > 2
Out[8]:
0 True
1 False
2 False
dtype: bool
Equality comparisons work with any list-like object of the same length and with scalars:
In [9]: cat == cat_base
Out[9]:
0 False
1 True
2 False
dtype: bool
In [10]: cat == np.array([1,2,3])
Out[10]:
0 True
1 True
2 True
dtype: bool
In [11]: cat == 2
Out[11]:
0 False
1 True
2 False
dtype: bool
This doesn’t work because the categories are not the same:
In [12]: try:
   ....:     cat > cat_base2
   ....: except TypeError as e:
   ....:     print("TypeError: " + str(e))
   ....:
TypeError: Categoricals can only be compared if 'categories' are the same
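One way to make such a comparison valid is to give both objects the same categories and ordering first. The sketch below assumes a recent pandas release and uses Series.cat.set_categories for the alignment; the name aligned is introduced only for this example.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Assumption: recent pandas, where the category order is declared on the dtype.
cat = pd.Series([1, 2, 3]).astype(
    CategoricalDtype(categories=[3, 2, 1], ordered=True)
)
# Categories are inferred from the data here, so this Series only knows [2].
cat_base2 = pd.Series([2, 2, 2]).astype(CategoricalDtype(ordered=True))

# Re-declare the categories of cat_base2 so they match those of cat,
# including the order 3 < 2 < 1.
aligned = cat_base2.cat.set_categories(cat.cat.categories, ordered=True)

# Both sides now share categories and ordering, so the comparison is allowed.
result = cat > aligned
```

Aligning the categories is an explicit statement that both objects should be compared under the same ordering, which is exactly the information the failing comparison lacked.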
If you want to do a “non-equality” comparison of a categorical series with a list-like object which is not categorical data, you need to be explicit and convert the categorical data back to the original values:
In [13]: base = np.array([1,2,3])
In [14]: try:
   ....:     cat > base
   ....: except TypeError as e:
   ....:     print("TypeError: " + str(e))
   ....:
TypeError: Cannot compare a Categorical for op __gt__ with type <type 'numpy.ndarray'>.
If you want to compare values, use 'np.asarray(cat) <op> other'.
In [15]: np.asarray(cat) > base
Out[15]: array([False, False, False], dtype=bool)
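The reason for requiring this explicit conversion can be illustrated by computing both possible readings of the same comparison. This is a sketch under the assumption of a recent pandas release; by_value and by_order are names made up for this example.

```python
import numpy as np
import pandas as pd
from pandas.api.types import CategoricalDtype

# Assumption: recent pandas, where the category order is declared on the dtype.
dtype = CategoricalDtype(categories=[3, 2, 1], ordered=True)
cat = pd.Series([1, 2, 3]).astype(dtype)
base = np.array([2, 2, 2])

# Reading 1: ignore the category order and compare the underlying values,
# i.e. 1 > 2, 2 > 2, 3 > 2.
by_value = np.asarray(cat) > base

# Reading 2: respect the category order 3 < 2 < 1 by converting the
# right-hand side to the same categorical dtype before comparing.
by_order = cat > pd.Series(base).astype(dtype)
```

The two readings disagree on the first and last elements, so pandas raises rather than silently picking one of them.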