3.3 Sorting a MultiIndex
For MultiIndex-ed objects to be indexed and sliced effectively, they need
to be sorted. As with any index, you can use sort_index.
In [1]: import random; random.shuffle(tuples)
In [2]: s = pd.Series(np.random.randn(8), index=pd.MultiIndex.from_tuples(tuples))
In [3]: s
Out[3]:
baz one 0.206053
bar one -0.251905
foo one -2.213588
qux two 1.063327
baz two 1.266143
qux one 0.299368
bar two -0.863838
foo two 0.408204
dtype: float64
In [4]: s.sort_index()
Out[4]:
bar  one   -0.251905
     two   -0.863838
baz  one    0.206053
     two    1.266143
foo  one   -2.213588
     two    0.408204
qux  one    0.299368
     two    1.063327
dtype: float64
In [5]: s.sort_index(level=0)
Out[5]:
bar  one   -0.251905
     two   -0.863838
baz  one    0.206053
     two    1.266143
foo  one   -2.213588
     two    0.408204
qux  one    0.299368
     two    1.063327
dtype: float64
In [6]: s.sort_index(level=1)
Out[6]:
bar one -0.251905
baz one 0.206053
foo one -2.213588
qux one 0.299368
bar two -0.863838
baz two 1.266143
foo two 0.408204
qux two 1.063327
dtype: float64
You may also pass a level name to sort_index if the MultiIndex levels
are named.
In [7]: s.index.set_names(['L1', 'L2'], inplace=True)
In [8]: s.sort_index(level='L1')
Out[8]:
L1   L2
bar  one   -0.251905
     two   -0.863838
baz  one    0.206053
     two    1.266143
foo  one   -2.213588
     two    0.408204
qux  one    0.299368
     two    1.063327
dtype: float64
In [9]: s.sort_index(level='L2')
Out[9]:
L1 L2
bar one -0.251905
baz one 0.206053
foo one -2.213588
qux one 0.299368
bar two -0.863838
baz two 1.266143
foo two 0.408204
qux two 1.063327
dtype: float64
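The session above depends on a tuples list defined earlier in the guide. As a self-contained sketch of the same idea, the snippet below builds its own unsorted Series; the tuple order and the values are assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical, deliberately unsorted tuples standing in for the shuffled list.
tuples = [("baz", "one"), ("bar", "one"), ("foo", "one"), ("qux", "two"),
          ("baz", "two"), ("qux", "one"), ("bar", "two"), ("foo", "two")]
index = pd.MultiIndex.from_tuples(tuples, names=["L1", "L2"])
s = pd.Series(np.arange(8.0), index=index)

# No level argument: lexicographic sort over every level.
by_all = s.sort_index()

# Named level: L2 becomes the primary sort key; with the default
# sort_remaining=True the other levels are used to break ties.
by_l2 = s.sort_index(level="L2")

print(by_all.index.tolist())
print(by_l2.index.tolist())
```

Both calls return a new Series and leave s itself in its original order.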
On higher dimensional objects, you can sort any of the other axes by level if they have a MultiIndex:
In [10]: df.T.sort_index(level=1, axis=1)
Out[10]:
       zero       one      zero       one
          x         x         y         y
0  2.410179  0.600178  0.132885  1.519970
1  1.450520  0.274230 -0.023688 -0.493662
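The df used above is defined elsewhere in the guide. A minimal stand-alone sketch of sorting the column axis by level might look like the following; the frame wide and its labels are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose columns carry an unsorted two-level MultiIndex.
columns = pd.MultiIndex.from_tuples(
    [("zero", "y"), ("one", "x"), ("zero", "x"), ("one", "y")]
)
wide = pd.DataFrame(np.arange(8.0).reshape(2, 4), columns=columns)

# axis=1 targets the columns; level=1 sorts on the inner labels ("x"/"y").
sorted_cols = wide.sort_index(level=1, axis=1)

print(sorted_cols.columns.tolist())
```

The row index is untouched, and each column keeps its own data while moving to its new position.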
Indexing will work even if the data are not sorted, but will be rather
inefficient (and show a PerformanceWarning). It will also
return a copy of the data rather than a view:
In [11]: dfm = pd.DataFrame({'jim': [0, 0, 1, 1],
....: 'joe': ['x', 'x', 'z', 'y'],
....: 'jolie': np.random.rand(4)})
....:
In [12]: dfm = dfm.set_index(['jim', 'joe'])
In [13]: dfm
Out[13]:
            jolie
jim joe
0   x    0.490671
    x    0.120248
1   z    0.537020
    y    0.110968
In [4]: dfm.loc[(1, 'z')]
PerformanceWarning: indexing past lexsort depth may impact performance.
Out[4]:
           jolie
jim joe
1   z    0.64094
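One way to observe the warning programmatically is to record warnings around the lookup. The sketch below rebuilds dfm on its own and compares the lookup on the unsorted and sorted index; whether a warning is actually emitted may depend on the installed pandas version, so the count is printed rather than assumed.

```python
import warnings

import numpy as np
import pandas as pd

dfm = pd.DataFrame(
    {"jim": [0, 0, 1, 1], "joe": ["x", "x", "z", "y"], "jolie": np.random.rand(4)}
).set_index(["jim", "joe"])

# Record every warning raised while indexing the unsorted frame.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    unsorted_hit = dfm.loc[(1, "z")]

perf_warnings = [
    w for w in caught if issubclass(w.category, pd.errors.PerformanceWarning)
]
print(len(perf_warnings), "PerformanceWarning(s) on the unsorted index")

# The same lookup after sorting selects the same data.
sorted_hit = dfm.sort_index().loc[(1, "z")]
```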
Furthermore, if you try to slice an index that is not fully lexsorted, this can raise:
In [5]: dfm.loc[(0,'y'):(1, 'z')]
KeyError: 'Key length (2) was greater than MultiIndex lexsort depth (1)'
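If the failure needs to be handled rather than just read off the screen, catching KeyError is one option. The sketch below assumes the same dfm layout as above; newer pandas releases raise UnsortedIndexError here, which is a KeyError subclass, so the same except clause should cover both.

```python
import numpy as np
import pandas as pd

dfm = pd.DataFrame(
    {"jim": [0, 0, 1, 1], "joe": ["x", "x", "z", "y"], "jolie": np.random.rand(4)}
).set_index(["jim", "joe"])

error = None
try:
    # Both levels are used in the slice bounds, but only the first is sorted.
    dfm.loc[(0, "y"):(1, "z")]
except KeyError as exc:
    error = exc

print(type(error).__name__, error)
```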
The is_lexsorted() method on a MultiIndex shows whether the index is sorted,
and the lexsort_depth property returns the sort depth:
In [14]: dfm.index.is_lexsorted()
Out[14]: False
In [15]: dfm.index.lexsort_depth
Out[15]: 1
In [16]: dfm = dfm.sort_index()
In [17]: dfm
Out[17]:
            jolie
jim joe
0   x    0.490671
    x    0.120248
1   y    0.110968
    z    0.537020
In [18]: dfm.index.is_lexsorted()
Out[18]: True
In [19]: dfm.index.lexsort_depth
Out[19]: 2
And now selection works as expected.
In [20]: dfm.loc[(0,'y'):(1, 'z')]
Out[20]:
            jolie
jim joe
1   y    0.110968
    z    0.537020
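The check-then-sort pattern can be wrapped in a small helper. The function ensure_sorted below is hypothetical, not part of pandas, and it uses is_monotonic_increasing as the sortedness test, on the assumption that this property is available across pandas versions; treat it as a sketch rather than a drop-in replacement for the checks shown above.

```python
import numpy as np
import pandas as pd


def ensure_sorted(frame):
    """Return frame as-is if its index is already sorted, else a sorted copy."""
    if frame.index.is_monotonic_increasing:
        return frame
    return frame.sort_index()


dfm = pd.DataFrame(
    {"jim": [0, 0, 1, 1], "joe": ["x", "x", "z", "y"], "jolie": np.random.rand(4)}
).set_index(["jim", "joe"])

ready = ensure_sorted(dfm)            # dfm is unsorted, so this sorts it
selected = ready.loc[(0, "y"):(1, "z")]

print(selected.index.tolist())
```

Calling ensure_sorted on an already sorted frame returns the same object, so the helper can be applied before every slice without repeated sorting.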