GroupBy
GroupBy objects are returned by groupby calls: pandas.DataFrame.groupby(), pandas.Series.groupby(), etc.
Indexing, iteration
GroupBy.__iter__ () |
GroupBy iterator |
GroupBy.groups |
dict {group name -> group labels} |
GroupBy.indices |
dict {group name -> group indices} |
GroupBy.get_group (name[, obj]) |
Constructs NDFrame from group with provided name |
Grouper ([key, level, freq, axis, sort]) |
A Grouper allows the user to specify a groupby instruction for a target object |
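The indexing and iteration members above can be sketched on a small frame. This is a minimal illustration, not taken from the reference itself: the frame `df`, its column names, and the variable names are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample data: two groups, "a" and "b".
df = pd.DataFrame({
    "key": ["a", "b", "a", "b", "a"],
    "value": [1, 2, 3, 4, 5],
})
grouped = df.groupby("key")

# Iterating a GroupBy yields (group name, sub-frame) pairs.
names = [name for name, _ in grouped]

# groups maps each group name to the index labels of its rows.
group_labels = {name: list(labels) for name, labels in grouped.groups.items()}

# indices maps each group name to the integer positions of its rows.
group_positions = {name: [int(p) for p in pos] for name, pos in grouped.indices.items()}

# get_group returns the rows that belong to a single group.
group_a = grouped.get_group("a")

# A Grouper can carry the same grouping instruction as the column name.
totals = df.groupby(pd.Grouper(key="key"))["value"].sum()
```

With the default integer index used here, labels and positions coincide; they would differ for a frame with a non-default index.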
Function application
GroupBy.apply (func, *args, **kwargs) |
Apply function and combine results together in an intelligent way. |
GroupBy.aggregate (func, *args, **kwargs) |
|
GroupBy.transform (func, *args, **kwargs) |
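As a rough illustration of how the three function-application methods differ in the shape of their results, the sketch below runs each on the same grouped column. The sample data and variable names are assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample data: two groups, "a" and "b".
df = pd.DataFrame({
    "key": ["a", "b", "a", "b", "a"],
    "value": [1, 2, 3, 4, 5],
})
grouped = df.groupby("key")["value"]

# aggregate reduces each group to a single value (one row per group).
totals = grouped.aggregate("sum")

# transform returns a result aligned with the original rows.
centered = grouped.transform(lambda s: s - s.mean())

# apply calls the function on each group and combines the results;
# here each group yields a scalar, so the result has one row per group.
ranges = grouped.apply(lambda s: s.max() - s.min())
```

A reasonable rule of thumb is to reach for aggregate when one value per group is wanted, transform when the result must line up with the input rows, and apply for cases that fit neither.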
Computations / Descriptive Stats
GroupBy.count () |
Compute count of group, excluding missing values |
GroupBy.cumcount ([ascending]) |
Number each item in each group from 0 to the length of that group - 1. |
GroupBy.first () |
Compute first of group values |
GroupBy.head ([n]) |
Returns first n rows of each group. |
GroupBy.last () |
Compute last of group values |
GroupBy.max () |
Compute max of group values |
GroupBy.mean (*args, **kwargs) |
Compute mean of groups, excluding missing values |
GroupBy.median () |
Compute median of groups, excluding missing values |
GroupBy.min () |
Compute min of group values |
GroupBy.nth (n[, dropna]) |
Take the nth row from each group if n is an int, or a subset of rows if n is a list of ints. |
GroupBy.ohlc () |
Compute open, high, low and close values of each group, excluding missing values |
GroupBy.prod () |
Compute prod of group values |
GroupBy.size () |
Compute group sizes |
GroupBy.sem ([ddof]) |
Compute standard error of the mean of groups, excluding missing values |
GroupBy.std ([ddof]) |
Compute standard deviation of groups, excluding missing values |
GroupBy.sum () |
Compute sum of group values |
GroupBy.var ([ddof]) |
Compute variance of groups, excluding missing values |
GroupBy.tail ([n]) |
Returns last n rows of each group |
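Several of the descriptions above hinge on whether missing values are excluded. The sketch below contrasts size and count on data containing NaN, and shows cumcount and head, which keep the original row index. The sample data and variable names are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value in each group.
df = pd.DataFrame({
    "key": ["a", "a", "b", "b", "b"],
    "value": [1.0, np.nan, 3.0, 4.0, np.nan],
})
by_key = df.groupby("key")
values = by_key["value"]

sizes = values.size()    # counts every row in the group, NaN included
counts = values.count()  # counts only non-missing values
means = values.mean()    # missing values are skipped

# cumcount numbers the rows of each group from 0 upward.
positions = by_key.cumcount()

# head(n) keeps the first n rows of each group, with their original index.
first_rows = by_key.head(1)
```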
The following methods are available in both SeriesGroupBy and DataFrameGroupBy objects, but may differ slightly, usually in that the DataFrameGroupBy version permits the specification of an axis argument, and often an argument indicating whether to restrict application to columns of a specific data type.
DataFrameGroupBy.agg (arg, *args, **kwargs) |
Aggregate using input function or dict of {column -> function} |
DataFrameGroupBy.all ([axis, bool_only, ...]) |
Return whether all elements are True over requested axis |
DataFrameGroupBy.any ([axis, bool_only, ...]) |
Return whether any element is True over requested axis |
DataFrameGroupBy.bfill ([limit]) |
Backward fill the values |
DataFrameGroupBy.corr ([method, min_periods]) |
Compute pairwise correlation of columns, excluding NA/null values |
DataFrameGroupBy.count () |
Compute count of group, excluding missing values |
DataFrameGroupBy.cov ([min_periods]) |
Compute pairwise covariance of columns, excluding NA/null values |
DataFrameGroupBy.cummax ([axis, skipna]) |
Return cumulative max over requested axis. |
DataFrameGroupBy.cummin ([axis, skipna]) |
Return cumulative minimum over requested axis. |
DataFrameGroupBy.cumprod ([axis]) |
Cumulative product for each group |
DataFrameGroupBy.cumsum ([axis]) |
Cumulative sum for each group |
DataFrameGroupBy.describe ([percentiles, ...]) |
Generate various summary statistics, excluding NaN values. |
DataFrameGroupBy.diff ([periods, axis]) |
First discrete difference of object |
DataFrameGroupBy.ffill ([limit]) |
Forward fill the values |
DataFrameGroupBy.fillna ([value, method, ...]) |
Fill NA/NaN values using the specified method |
DataFrameGroupBy.hist (data[, column, by, ...]) |
Draw histogram of the DataFrame’s series using matplotlib / pylab. |
DataFrameGroupBy.idxmax ([axis, skipna]) |
Return index of first occurrence of maximum over requested axis. |
DataFrameGroupBy.idxmin ([axis, skipna]) |
Return index of first occurrence of minimum over requested axis. |
DataFrameGroupBy.mad ([axis, skipna, level]) |
Return the mean absolute deviation of the values for the requested axis |
DataFrameGroupBy.pct_change ([periods, ...]) |
Percent change over given number of periods. |
DataFrameGroupBy.plot |
Class implementing the .plot attribute for groupby objects |
DataFrameGroupBy.quantile ([q, axis, ...]) |
Return values at the given quantile over requested axis, a la numpy.percentile. |
DataFrameGroupBy.rank ([axis, method, ...]) |
Compute numerical data ranks (1 through n) along axis. |
DataFrameGroupBy.resample (rule, *args, **kwargs) |
Provide resampling when using a TimeGrouper |
DataFrameGroupBy.shift ([periods, freq, axis]) |
Shift each group by periods observations |
DataFrameGroupBy.size () |
Compute group sizes |
DataFrameGroupBy.skew ([axis, skipna, level, ...]) |
Return unbiased skew over requested axis |
DataFrameGroupBy.take (indices[, axis, ...]) |
Analogous to ndarray.take |
DataFrameGroupBy.tshift ([periods, freq, axis]) |
Shift the time index, using the index’s frequency if available. |
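Many of the methods in this table operate within each group while returning a result aligned with the original rows. A minimal sketch with cumsum, shift and diff follows; the sample data and variable names are assumptions, and the axis argument is left at its default.

```python
import pandas as pd

# Hypothetical sample data: two interleaved groups.
df = pd.DataFrame({
    "key": ["a", "b", "a", "b", "a"],
    "value": [1, 2, 3, 4, 5],
})
grouped = df.groupby("key")

# Running total within each group, in original row order.
running = grouped.cumsum()["value"]

# Previous value within the same group (NaN for each group's first row).
shifted = grouped.shift(1)["value"]

# Difference from the previous row of the same group.
step = grouped.diff()["value"]
```

Because the rows of the two groups are interleaved, the results show that each computation restarts per group rather than running down the whole column.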
The following methods are available only for SeriesGroupBy objects.
SeriesGroupBy.nlargest (*args, **kwargs) |
Return the largest n elements. |
SeriesGroupBy.nsmallest (*args, **kwargs) |
Return the smallest n elements. |
SeriesGroupBy.nunique ([dropna]) |
Returns number of unique elements in the group |
SeriesGroupBy.unique () |
Return array of unique values in the object. |
SeriesGroupBy.value_counts ([normalize, ...]) |
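A short sketch of the SeriesGroupBy-only methods, assuming a small integer Series grouped by a parallel list of keys (both are made up for the example):

```python
import pandas as pd

# Hypothetical sample data: six values split into groups "a" and "b".
s = pd.Series([1, 1, 2, 3, 3, 3])
keys = ["a", "a", "a", "b", "b", "b"]
grouped = s.groupby(keys)

# Number of distinct values in each group.
distinct = grouped.nunique()

# Largest element per group; the result is indexed by
# (group name, original index label).
largest = grouped.nlargest(1)

# Occurrences of each value within each group, indexed by
# (group name, value).
occurrences = grouped.value_counts()
```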
The following methods are available only for DataFrameGroupBy objects.
DataFrameGroupBy.corrwith (other[, axis, drop]) |
Compute pairwise correlation between rows or columns of two DataFrame objects. |
DataFrameGroupBy.boxplot (grouped[, ...]) |
Make box plots from DataFrameGroupBy data. |