.. currentmodule:: pandas

.. ipython:: python
   :suppress:

   import numpy as np
   import pandas as pd
   np.set_printoptions(precision=4, suppress=True)
   pd.options.display.max_rows = 8
   df = pd.DataFrame({'one' : pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
                      'two' : pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd']),
                      'three' : pd.Series(np.random.randn(3), index=['b', 'c', 'd'])})

.. _basics.sorting:

Sorting
-------

.. warning::

   The sorting API changed substantially in 0.17.0, see :ref:`here ` for
   these changes. In particular, all sorting methods now return a new object
   by default, and **DO NOT** operate in-place (except by passing
   ``inplace=True``).

There are two obvious kinds of sorting that you may be interested in: sorting
by label and sorting by actual values.

By Index
~~~~~~~~

The primary methods for sorting by axis labels (indexes) are
``Series.sort_index()`` and ``DataFrame.sort_index()``.

.. ipython:: python

   df
   unsorted_df = df.reindex(index=['a', 'd', 'c', 'b'],
                            columns=['three', 'two', 'one'])

   # DataFrame
   unsorted_df.sort_index()
   unsorted_df.sort_index(ascending=False)
   unsorted_df.sort_index(axis=1)

   # Series
   unsorted_df['three'].sort_index()

By Values
~~~~~~~~~

The :meth:`Series.sort_values` and :meth:`DataFrame.sort_values` methods are
the entry points for **value** sorting (that is, sorting by the values in a
column or row). :meth:`DataFrame.sort_values` accepts an optional ``by``
argument for ``axis=0``, which uses an arbitrary vector or a column name of
the DataFrame to determine the sort order:

.. ipython:: python

   df1 = pd.DataFrame({'one': [2, 1, 1, 1], 'two': [1, 3, 2, 4], 'three': [5, 4, 3, 2]})
   df1.sort_values(by='two')

The ``by`` argument can also take a list of column names, e.g.:

.. ipython:: python

   df1[['one', 'two', 'three']].sort_values(by=['one', 'two'])

These methods treat NA values specially; the ``na_position`` argument
controls whether they are placed first or last:

.. ipython:: python

   s[2] = np.nan
   s.sort_values()
   s.sort_values(na_position='first')

.. _basics.searchsorted:

searchsorted
~~~~~~~~~~~~

Series has the :meth:`~Series.searchsorted` method, which works similarly to
:meth:`numpy.ndarray.searchsorted`.

.. ipython:: python

   ser = pd.Series([1, 2, 3])
   ser.searchsorted([0, 3])
   ser.searchsorted([0, 4])
   ser.searchsorted([1, 3], side='right')
   ser.searchsorted([1, 3], side='left')
   ser = pd.Series([3, 1, 2])
   ser.searchsorted([0, 3], sorter=np.argsort(ser))

.. _basics.nsorted:

smallest / largest values
~~~~~~~~~~~~~~~~~~~~~~~~~

.. versionadded:: 0.14.0

``Series`` has the :meth:`~Series.nsmallest` and :meth:`~Series.nlargest`
methods, which return the smallest or largest :math:`n` values. For a large
``Series`` this can be much faster than sorting the entire Series and calling
``head(n)`` on the result.

.. ipython:: python

   s = pd.Series(np.random.permutation(10))
   s
   s.sort_values()
   s.nsmallest(3)
   s.nlargest(3)

.. versionadded:: 0.17.0

``DataFrame`` also has the ``nlargest`` and ``nsmallest`` methods.

.. ipython:: python

   df = pd.DataFrame({'a': [-2, -1, 1, 10, 8, 11, -1],
                      'b': list('abdceff'),
                      'c': [1.0, 2.0, 4.0, 3.2, np.nan, 3.0, 4.0]})
   df.nlargest(3, 'a')
   df.nlargest(5, ['a', 'c'])
   df.nsmallest(3, 'a')
   df.nsmallest(5, ['a', 'c'])

.. _basics.multi-index_sorting:

Sorting by a multi-index column
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You must be explicit about sorting when the columns are a multi-index, and
fully specify all levels in ``by``.

.. ipython:: python

   df1.columns = pd.MultiIndex.from_tuples([('a', 'one'), ('a', 'two'), ('b', 'three')])
   df1.sort_values(by=('a', 'two'))
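
The multi-index case can also be tried outside the ``ipython`` directive. The snippet below is a minimal, self-contained sketch rather than part of the original example set: the names ``frame``, ``sorted_frame`` and ``sorted_multi`` are illustrative, and the frame is built directly with multi-index columns so that the column order is unambiguous. It shows sorting by one fully specified column tuple, sorting by a list of such tuples, and that the original object is left untouched because sorting is not in-place by default.

```python
import pandas as pd

# Illustrative frame with two-level column labels (names are hypothetical).
columns = pd.MultiIndex.from_tuples([('a', 'one'), ('a', 'two'), ('b', 'three')])
frame = pd.DataFrame([[2, 1, 5],
                      [1, 3, 4],
                      [1, 2, 3],
                      [1, 4, 2]], columns=columns)

# Sort by a single column: every level of the label is given as a tuple.
sorted_frame = frame.sort_values(by=('a', 'two'))

# Sort by several columns: pass a list of fully specified tuples.
sorted_multi = frame.sort_values(by=[('a', 'one'), ('b', 'three')])

# The original frame keeps its row order, since a new object is returned.
print(sorted_frame.index.tolist())
print(sorted_multi.index.tolist())
print(frame.index.tolist())
```

Passing the full tuple avoids any ambiguity about which level a bare string such as ``'two'`` would refer to.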