5.11 Differences to R’s factor

The following differences to R’s factor functions can be observed:

  • R’s levels are called categories in pandas.
  • R’s levels are always of type string, while categories in pandas can be of any dtype.
  • It’s not possible to specify labels at creation time. Use s.cat.rename_categories(new_labels) afterwards.
  • In contrast to R’s factor function, passing categorical data as the sole input when creating a new categorical series does not remove unused categories. The result is a new categorical series equal to the one passed in.
  • R allows missing values to be included in its levels (pandas’ categories). pandas does not allow NaN categories, but missing values can still appear in the values.
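
The pandas side of the points above can be illustrated with a minimal sketch. The variable names and sample data are made up for illustration, and the behaviour shown assumes a reasonably recent pandas version:

```python
import numpy as np
import pandas as pd

# Categories can have any dtype, here integers rather than strings.
ints = pd.Series([1, 2, 1, 3], dtype="category")
print(ints.cat.categories)

# Labels are not given at creation time; rename the categories afterwards.
renamed = ints.cat.rename_categories(["low", "mid", "high"])
print(list(renamed))  # ['low', 'mid', 'low', 'high']

# Re-creating a categorical from categorical data keeps unused categories.
original = pd.Categorical(["a", "b"], categories=["a", "b", "c"])
copy = pd.Categorical(original)
print(list(copy.categories))  # ['a', 'b', 'c']

# Unused categories have to be dropped explicitly.
trimmed = copy.remove_unused_categories()
print(list(trimmed.categories))  # ['a', 'b']

# NaN may appear in the values, but it never becomes a category.
with_nan = pd.Series(["a", np.nan, "b"], dtype="category")
print(list(with_nan.cat.categories))  # ['a', 'b']
print(with_nan.isna().tolist())  # [False, True, False]
```

Calling remove_unused_categories() is the explicit way to get the trimming that R’s factor function performs implicitly when applied to an existing factor.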