.. currentmodule:: pandas .. ipython:: python :suppress: import numpy as np import pandas as pd import os np.random.seed(123456) np.set_printoptions(precision=4, suppress=True) import matplotlib matplotlib.style.use('ggplot') import matplotlib.pyplot as plt pd.options.display.max_rows = 8 Categoricals ------------ Since version 0.15, pandas can include categorical data in a ``DataFrame``. For full docs, see the :ref:`categorical introduction ` and the :ref:`API documentation `. .. ipython:: python df = pd.DataFrame({"id":[1,2,3,4,5,6], "raw_grade":['a', 'b', 'b', 'a', 'a', 'e']}) Convert the raw grades to a categorical data type. .. ipython:: python df["grade"] = df["raw_grade"].astype("category") df["grade"] Rename the categories to more meaningful names (assigning to ``Series.cat.categories`` is inplace!) .. ipython:: python df["grade"].cat.categories = ["very good", "good", "very bad"] Reorder the categories and simultaneously add the missing categories (methods under ``Series .cat`` return a new ``Series`` per default). .. ipython:: python df["grade"] = df["grade"].cat.set_categories(["very bad", "bad", "medium", "good", "very good"]) df["grade"] Sorting is per order in the categories, not lexical order. .. ipython:: python df.sort_values(by="grade") Grouping by a categorical column shows also empty categories. .. ipython:: python df.groupby("grade").size()