5.6 Comparisons

Comparing categorical data with other objects is possible in three cases:

  • comparing equality (== and !=) to a list-like object (list, Series, array, ...) of the same length as the categorical data.
  • all comparisons (==, !=, >, >=, <, and <=) of categorical data to another categorical Series, when ordered==True and the categories are the same.
  • all comparisons of categorical data to a scalar.

All other comparisons, in particular “non-equality” comparisons of two categoricals with different categories or of a categorical with any list-like object, will raise a TypeError.
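The second case above depends on ordered==True. Here is a minimal sketch, assuming a recent pandas release in which categories and ordering are passed through `pd.CategoricalDtype`. With an unordered categorical (the hypothetical `unordered` Series below), equality still works, but `>` raises a TypeError even against a scalar, because there is no order to consult.

```python
import pandas as pd

# Hypothetical unordered categorical: CategoricalDtype defaults to ordered=False.
unordered = pd.Series(["a", "b", "a"]).astype(pd.CategoricalDtype(categories=["a", "b"]))

# Equality needs no ordering, so this works.
equal_to_a = (unordered == "a").tolist()
print(equal_to_a)  # [True, False, True]

# An ordering comparison on an unordered categorical raises TypeError.
ordering_raised = False
try:
    unordered > "a"
except TypeError as e:
    ordering_raised = True
    print("TypeError:", e)
```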

Note

Any “non-equality” comparison of categorical data with a Series, np.array, list, or categorical data with different categories or ordering raises a TypeError, because a custom category ordering could be interpreted in two ways: one that takes the ordering into account and one that does not.
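To make the two interpretations concrete, the sketch below (assuming a recent pandas with `pd.CategoricalDtype`; `other` is a hypothetical plain list) computes both explicitly for `cat > [2, 2, 2]`. Following the category order 3 < 2 < 1, the value 1 ranks above 2; comparing raw numbers, only 3 is greater than 2. The two answers disagree, which is why pandas refuses to pick one implicitly.

```python
import numpy as np
import pandas as pd

# Values 1, 2, 3 stored with the reversed category order 3 < 2 < 1.
dtype = pd.CategoricalDtype(categories=[3, 2, 1], ordered=True)
cat = pd.Series([1, 2, 3]).astype(dtype)
other = [2, 2, 2]  # hypothetical non-categorical list-like

# Interpretation 1: respect the ordering by giving `other` the same categorical dtype.
by_order = (cat > pd.Series(other).astype(dtype)).tolist()

# Interpretation 2: ignore the categories and compare the underlying values.
by_value = (np.asarray(cat) > np.asarray(other)).tolist()

print(by_order)  # [True, False, False]
print(by_value)  # [False, False, True]
```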

In [1]: cat = pd.Series([1, 2, 3]).astype(pd.CategoricalDtype(categories=[3, 2, 1], ordered=True))

In [2]: cat_base = pd.Series([2, 2, 2]).astype(pd.CategoricalDtype(categories=[3, 2, 1], ordered=True))

In [3]: cat_base2 = pd.Series([2, 2, 2]).astype(pd.CategoricalDtype(ordered=True))

In [4]: cat
Out[4]: 
0    1
1    2
2    3
dtype: category
Categories (3, int64): [3 < 2 < 1]

In [5]: cat_base
Out[5]: 
0    2
1    2
2    2
dtype: category
Categories (3, int64): [3 < 2 < 1]

In [6]: cat_base2
Out[6]: 
0    2
1    2
2    2
dtype: category
Categories (1, int64): [2]

Comparing to a categorical with the same categories and ordering or to a scalar works:

In [7]: cat > cat_base
Out[7]: 
0     True
1    False
2    False
dtype: bool

In [8]: cat > 2
Out[8]: 
0     True
1    False
2    False
dtype: bool
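One way to see why `Out[7]` and `Out[8]` come out this way: with ordered categories, “greater than” means “later in the category list”, not “numerically larger”. The sketch below, assuming a recent pandas and rebuilding `cat` and `cat_base` with `pd.CategoricalDtype`, reproduces both results by comparing the integer positions exposed by `Series.cat.codes`. It illustrates the semantics and is not meant as a description of pandas internals.

```python
import pandas as pd

dtype = pd.CategoricalDtype(categories=[3, 2, 1], ordered=True)
cat = pd.Series([1, 2, 3]).astype(dtype)
cat_base = pd.Series([2, 2, 2]).astype(dtype)

# Each value's code is its position in the categories [3, 2, 1].
print(cat.cat.codes.tolist())       # [2, 1, 0]
print(cat_base.cat.codes.tolist())  # [1, 1, 1]

# Comparing positions gives the same answer as comparing the categoricals.
by_codes = (cat.cat.codes > cat_base.cat.codes).tolist()
by_categorical = (cat > cat_base).tolist()
print(by_codes, by_categorical)  # [True, False, False] [True, False, False]

# The scalar case works the same way: 2 sits at position 1 in the categories.
pos_of_2 = cat.cat.categories.get_loc(2)
scalar_matches = (cat.cat.codes > pos_of_2).tolist() == (cat > 2).tolist()
print(scalar_matches)  # True
```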

Equality comparisons work with scalars and with any list-like object of the same length:

In [9]: cat == cat_base
Out[9]: 
0    False
1     True
2    False
dtype: bool

In [10]: cat == np.array([1,2,3])
Out[10]: 
0    True
1    True
2    True
dtype: bool

In [11]: cat == 2
Out[11]: 
0    False
1     True
2    False
dtype: bool
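Equality against a list-like is positional: element i of the categorical is checked against element i of the other object, and no category ordering is involved. The “same length” requirement matters. In recent pandas releases (an assumption about the version in use), a list of a different length raises a ValueError rather than a TypeError. The list values below are arbitrary.

```python
import pandas as pd

cat = pd.Series([1, 2, 3]).astype(pd.CategoricalDtype(categories=[3, 2, 1], ordered=True))

# Positional equality: 1 == 1, 2 == 3, 3 == 3.
matches = (cat == [1, 3, 3]).tolist()
print(matches)  # [True, False, True]

# A list-like of a different length cannot be lined up element by element.
length_mismatch_raised = False
try:
    cat == [1, 2]
except ValueError as e:
    length_mismatch_raised = True
    print("ValueError:", e)
```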

This doesn’t work because the categories are not the same:

In [12]: try:
   ....:     cat > cat_base2
   ....: except TypeError as e:
   ....:     print("TypeError: " + str(e))
   ....: 
TypeError: Categoricals can only be compared if 'categories' are the same
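If an ordering-aware comparison is what you want, one option is to give both operands the same categories first. A sketch, assuming a recent pandas where `Series.cat.set_categories` returns a new Series: `cat_base2` is re-categorized with `cat`'s categories (its values stay 2), and `ordered=True` is passed explicitly so the dtypes match exactly. After that, `>` is allowed and follows the order 3 < 2 < 1.

```python
import pandas as pd

cat = pd.Series([1, 2, 3]).astype(pd.CategoricalDtype(categories=[3, 2, 1], ordered=True))
cat_base2 = pd.Series([2, 2, 2]).astype(pd.CategoricalDtype(ordered=True))

# Align categories (and ordering) with `cat`; the stored values do not change.
aligned = cat_base2.cat.set_categories(cat.cat.categories, ordered=True)

result = (cat > aligned).tolist()
print(result)  # [True, False, False]
```

Any value of `cat_base2` that is missing from the new categories would become NaN, so it is worth checking the categories before aligning them this way.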

If you want to do a “non-equality” comparison of a categorical Series with a list-like object that is not categorical data, you need to be explicit and convert the categorical data back to the original values:

In [13]: base = np.array([1,2,3])

In [14]: try:
   ....:     cat > base
   ....: except TypeError as e:
   ....:     print("TypeError: " + str(e))
   ....: 
TypeError: Cannot compare a Categorical for op __gt__ with type <type 'numpy.ndarray'>.
If you want to compare values, use 'np.asarray(cat) <op> other'.

In [15]: np.asarray(cat) > base
Out[15]: array([False, False, False], dtype=bool)