3.3 Sorting a MultiIndex

For objects with a MultiIndex to be indexed and sliced efficiently, the index needs to be sorted. As with any index, you can use sort_index:

In [1]: import random; random.shuffle(tuples)

In [2]: s = pd.Series(np.random.randn(8), index=pd.MultiIndex.from_tuples(tuples))

In [3]: s
Out[3]: 
baz  one    0.206053
bar  one   -0.251905
foo  one   -2.213588
qux  two    1.063327
baz  two    1.266143
qux  one    0.299368
bar  two   -0.863838
foo  two    0.408204
dtype: float64

In [4]: s.sort_index()
Out[4]: 
bar  one   -0.251905
     two   -0.863838
baz  one    0.206053
     two    1.266143
foo  one   -2.213588
     two    0.408204
qux  one    0.299368
     two    1.063327
dtype: float64

In [5]: s.sort_index(level=0)
Out[5]: 
bar  one   -0.251905
     two   -0.863838
baz  one    0.206053
     two    1.266143
foo  one   -2.213588
     two    0.408204
qux  one    0.299368
     two    1.063327
dtype: float64

In [6]: s.sort_index(level=1)
Out[6]: 
bar  one   -0.251905
baz  one    0.206053
foo  one   -2.213588
qux  one    0.299368
bar  two   -0.863838
baz  two    1.266143
foo  two    0.408204
qux  two    1.063327
dtype: float64
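
The calls above depend on a `tuples` list defined earlier. As a self-contained sketch, the following uses illustrative labels and values (assumptions, not the original data) to show how the `level` argument changes the primary sort key:

```python
import numpy as np
import pandas as pd

# Illustrative labels in a deliberately unsorted order (hypothetical data).
tuples = [("baz", "one"), ("bar", "one"), ("foo", "one"), ("qux", "two"),
          ("baz", "two"), ("qux", "one"), ("bar", "two"), ("foo", "two")]
s = pd.Series(np.arange(8.0), index=pd.MultiIndex.from_tuples(tuples))

# Default: sort lexicographically by level 0, then by level 1.
by_first = s.sort_index()

# level=1: sort by the second level first; ties are broken by the
# remaining level because sort_remaining defaults to True.
by_second = s.sort_index(level=1)

print(list(by_first.index[:3]))
print(list(by_second.index[:3]))
```

Here `by_first` groups the rows by the outer label, while `by_second` lists every "one" row before any "two" row.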

You may also pass a level name to sort_index if the MultiIndex levels are named.

In [7]: s.index.set_names(['L1', 'L2'], inplace=True)

In [8]: s.sort_index(level='L1')
Out[8]: 
L1   L2 
bar  one   -0.251905
     two   -0.863838
baz  one    0.206053
     two    1.266143
foo  one   -2.213588
     two    0.408204
qux  one    0.299368
     two    1.063327
dtype: float64

In [9]: s.sort_index(level='L2')
Out[9]: 
L1   L2 
bar  one   -0.251905
baz  one    0.206053
foo  one   -2.213588
qux  one    0.299368
bar  two   -0.863838
baz  two    1.266143
foo  two    0.408204
qux  two    1.063327
dtype: float64
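
As a quick check, sorting by a level name should give the same result as sorting by that level's position. The sketch below uses made-up data and reuses the names 'L1' and 'L2' only for illustration:

```python
import pandas as pd

# Hypothetical data; the level names mirror the example above.
index = pd.MultiIndex.from_tuples(
    [("qux", "two"), ("bar", "one"), ("foo", "two"), ("bar", "two")],
    names=["L1", "L2"],
)
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=index)

by_name = s.sort_index(level="L2")   # refer to the level by name
by_position = s.sort_index(level=1)  # refer to the same level by position

print(by_name.equals(by_position))
```

Using names tends to be more robust than positions when the levels may later be reordered.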

On higher-dimensional objects, you can sort any of the other axes by level, provided that axis has a MultiIndex:

In [10]: df.T.sort_index(level=1, axis=1)
Out[10]: 
       zero       one      zero       one
          x         x         y         y
0  2.410179  0.600178  0.132885  1.519970
1  1.450520  0.274230 -0.023688 -0.493662
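
The frame `df` used above is defined elsewhere. A minimal self-contained sketch of the same idea, with hypothetical column labels and integer data, might look like this:

```python
import numpy as np
import pandas as pd

# Hypothetical two-level column labels in an unsorted order.
columns = pd.MultiIndex.from_tuples(
    [("b", "y"), ("a", "x"), ("b", "x"), ("a", "y")]
)
frame = pd.DataFrame(np.arange(8).reshape(2, 4), columns=columns)

# axis=1 sorts the columns; level=1 makes the inner level the primary key.
by_inner = frame.sort_index(level=1, axis=1)

print(by_inner)
```

The columns come out grouped by the inner label ('x' before 'y'), with the outer label breaking ties.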

Indexing will work even if the data are not sorted, but will be rather inefficient (and show a PerformanceWarning). It will also return a copy of the data rather than a view:

In [11]: dfm = pd.DataFrame({'jim': [0, 0, 1, 1],
   ....:                     'joe': ['x', 'x', 'z', 'y'],
   ....:                     'jolie': np.random.rand(4)})
   ....: 

In [12]: dfm = dfm.set_index(['jim', 'joe'])

In [13]: dfm
Out[13]: 
            jolie
jim joe          
0   x    0.490671
    x    0.120248
1   z    0.537020
    y    0.110968

In [4]: dfm.loc[(1, 'z')]
PerformanceWarning: indexing past lexsort depth may impact performance.

Out[4]:
           jolie
jim joe
1   z    0.53702

Furthermore, if you try to slice an index that is not fully lexsorted, an error can be raised:

In [5]: dfm.loc[(0,'y'):(1, 'z')]
KeyError: 'Key length (2) was greater than MultiIndex lexsort depth (1)'
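
One way to guard against this failure is to catch the exception, or to sort before slicing. The sketch below rebuilds a frame like dfm with fixed, made-up values for 'jolie'. It assumes the error is a KeyError or a subclass of it, which may vary with the pandas version in use:

```python
import pandas as pd

# Same shape as dfm above, with fixed illustrative values.
dfm = pd.DataFrame(
    {"jim": [0, 0, 1, 1], "joe": ["x", "x", "z", "y"], "jolie": [0.1, 0.2, 0.3, 0.4]}
).set_index(["jim", "joe"])

try:
    # The index is sorted only on its first level, so a two-part
    # slice key goes past the lexsort depth.
    dfm.loc[(0, "y"):(1, "z")]
    raised = False
except KeyError as exc:
    raised = True
    print(type(exc).__name__, exc)

# Sorting first avoids the problem entirely.
sorted_rows = dfm.sort_index().loc[(0, "y"):(1, "z")]
```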

The is_lexsorted() method on a MultiIndex shows whether the index is sorted, and the lexsort_depth property returns the sort depth:

In [14]: dfm.index.is_lexsorted()
Out[14]: False

In [15]: dfm.index.lexsort_depth
Out[15]: 1

In [16]: dfm = dfm.sort_index()

In [17]: dfm
Out[17]: 
            jolie
jim joe          
0   x    0.490671
    x    0.120248
1   y    0.110968
    z    0.537020

In [18]: dfm.index.is_lexsorted()
Out[18]: True

In [19]: dfm.index.lexsort_depth
Out[19]: 2

After sorting, selection works as expected:

In [20]: dfm.loc[(0,'y'):(1, 'z')]
Out[20]: 
            jolie
jim joe          
1   y    0.110968
    z    0.537020
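
The check-then-sort pattern can be wrapped in a small helper. The function name `ensure_sorted` is hypothetical. As an assumption of this sketch, it tests the general `is_monotonic_increasing` property instead of `is_lexsorted()`, and the 'jolie' values are made up:

```python
import pandas as pd


def ensure_sorted(frame: pd.DataFrame) -> pd.DataFrame:
    """Return frame with a sorted row index (hypothetical helper)."""
    if frame.index.is_monotonic_increasing:
        return frame  # already sorted, nothing to do
    return frame.sort_index()


dfm = pd.DataFrame(
    {"jim": [0, 0, 1, 1], "joe": ["x", "x", "z", "y"], "jolie": [0.1, 0.2, 0.3, 0.4]}
).set_index(["jim", "joe"])

dfm = ensure_sorted(dfm)

# With a sorted index, slicing between two tuples is well defined.
subset = dfm.loc[(0, "y"):(1, "z")]
print(subset)
```

The slice starts at the first position where (0, 'y') would fit, so both (0, 'x') rows are excluded and only the rows under jim == 1 are returned.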