7.7 Tiling
The cut function computes groupings for the values of the input array and is
often used to transform continuous variables to discrete or categorical
variables:
In [1]: ages = np.array([10, 15, 13, 12, 23, 25, 28, 59, 60])
In [2]: pd.cut(ages, bins=3)
Out[2]:
[(9.95, 26.667], (9.95, 26.667], (9.95, 26.667], (9.95, 26.667], (9.95, 26.667], (9.95, 26.667], (26.667, 43.333], (43.333, 60], (43.333, 60]]
Categories (3, object): [(9.95, 26.667] < (26.667, 43.333] < (43.333, 60]]
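If the bins keyword is an integer, the range of the values is split into that
many equal-width bins. The lowest edge is extended by 0.1% of the range so that
the minimum value is included, which is why the first interval above starts at
9.95 rather than 10.
Alternatively, we can specify custom bin edges: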
In [3]: pd.cut(ages, bins=[0, 18, 35, 70])
Out[3]:
[(0, 18], (0, 18], (0, 18], (0, 18], (18, 35], (18, 35], (18, 35], (35, 70], (35, 70]]
Categories (3, object): [(0, 18] < (18, 35] < (35, 70]]
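The following is a minimal sketch, not part of the original example, of two related options of cut: labels, which names the custom bins, and retbins=True, which also returns the edges computed for an integer bins value. The label names ("child", "adult", "senior") are illustrative assumptions.

```python
import numpy as np
import pandas as pd

ages = np.array([10, 15, 13, 12, 23, 25, 28, 59, 60])

# Custom edges with human-readable labels; the label names are illustrative.
groups = pd.cut(ages, bins=[0, 18, 35, 70], labels=["child", "adult", "senior"])

# Count how many ages fall into each labelled bin.
counts = pd.Series(groups).value_counts()

# retbins=True additionally returns the bin edges that cut computed.
_, edges = pd.cut(ages, bins=3, retbins=True)
```

Here counts is indexed by the label names and edges holds the four edges of the three equal-width bins, so both can be inspected or reused to bin other data consistently.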