3.2 Advanced indexing with hierarchical index
Syntactically integrating MultiIndex
in advanced indexing with .loc/.ix
is a
bit challenging, but we’ve made every effort to do so. for example the
following works as you would expect:
In [1]: df
Out[1]:
first bar baz foo qux \
second one two one two one two one
A 0.895717 0.805244 -1.206412 2.565646 1.431256 1.340309 -1.170299
B 0.410835 0.813850 0.132003 -0.827317 -0.076467 -1.187678 1.130127
C -1.413681 1.607920 1.024180 0.569605 0.875906 -2.211372 0.974466
first
second two
A -0.226169
B -1.436737
C -2.006747
In [2]: df = df.T
In [3]: df
Out[3]:
A B C
first second
bar one 0.895717 0.410835 -1.413681
two 0.805244 0.813850 1.607920
baz one -1.206412 0.132003 1.024180
two 2.565646 -0.827317 0.569605
foo one 1.431256 -0.076467 0.875906
two 1.340309 -1.187678 -2.211372
qux one -1.170299 1.130127 0.974466
two -0.226169 -1.436737 -2.006747
In [4]: df.loc['bar']
Out[4]:
A B C
second
one 0.895717 0.410835 -1.413681
two 0.805244 0.813850 1.607920
In [5]: df.loc['bar', 'two']
Out[5]:
A 0.805244
B 0.813850
C 1.607920
Name: (bar, two), dtype: float64
“Partial” slicing also works quite nicely.
In [6]: df.loc['baz':'foo']
Out[6]:
A B C
first second
baz one -1.206412 0.132003 1.024180
two 2.565646 -0.827317 0.569605
foo one 1.431256 -0.076467 0.875906
two 1.340309 -1.187678 -2.211372
You can slice with a ‘range’ of values, by providing a slice of tuples.
In [7]: df.loc[('baz', 'two'):('qux', 'one')]
Out[7]:
A B C
first second
baz two 2.565646 -0.827317 0.569605
foo one 1.431256 -0.076467 0.875906
two 1.340309 -1.187678 -2.211372
qux one -1.170299 1.130127 0.974466
In [8]: df.loc[('baz', 'two'):'foo']
Out[8]:
A B C
first second
baz two 2.565646 -0.827317 0.569605
foo one 1.431256 -0.076467 0.875906
two 1.340309 -1.187678 -2.211372
Passing a list of labels or tuples works similar to reindexing:
In [9]: df.ix[[('bar', 'two'), ('qux', 'one')]]
Out[9]:
A B C
first second
bar two 0.805244 0.813850 1.607920
qux one -1.170299 1.130127 0.974466
3.2.1 Using slicers
New in version 0.14.0.
In 0.14.0 we added a new way to slice multi-indexed objects. You can slice a multi-index by providing multiple indexers.
You can provide any of the selectors as if you are indexing by label, see Selection by Label, including slices, lists of labels, labels, and boolean indexers.
You can use slice(None)
to select all the contents of that level. You do not need to specify all the
deeper levels, they will be implied as slice(None)
.
As usual, both sides of the slicers are included as this is label indexing.
Warning
You should specify all axes in the .loc
specifier, meaning the indexer for the index and
for the columns. There are some ambiguous cases where the passed indexer could be mis-interpreted
as indexing both axes, rather than into say the MuliIndex for the rows.
You should do this:
df.loc[(slice('A1','A3'),.....),:]
rather than this:
df.loc[(slice('A1','A3'),.....)]
In [10]: def mklbl(prefix,n):
....: return ["%s%s" % (prefix,i) for i in range(n)]
....:
In [11]: miindex = pd.MultiIndex.from_product([mklbl('A',4),
....: mklbl('B',2),
....: mklbl('C',4),
....: mklbl('D',2)])
....:
In [12]: micolumns = pd.MultiIndex.from_tuples([('a','foo'),('a','bar'),
....: ('b','foo'),('b','bah')],
....: names=['lvl0', 'lvl1'])
....:
In [13]: dfmi = pd.DataFrame(np.arange(len(miindex)*len(micolumns)).reshape((len(miindex),len(micolumns))),
....: index=miindex,
....: columns=micolumns).sort_index().sort_index(axis=1)
....:
In [14]: dfmi
Out[14]:
lvl0 a b
lvl1 bar foo bah foo
A0 B0 C0 D0 1 0 3 2
D1 5 4 7 6
C1 D0 9 8 11 10
D1 13 12 15 14
... ... ... ... ...
A3 B1 C2 D0 241 240 243 242
D1 245 244 247 246
C3 D0 249 248 251 250
D1 253 252 255 254
[64 rows x 4 columns]
Basic multi-index slicing using slices, lists, and labels.
In [15]: dfmi.loc[(slice('A1','A3'),slice(None), ['C1','C3']),:]
Out[15]:
lvl0 a b
lvl1 bar foo bah foo
A1 B0 C1 D0 73 72 75 74
D1 77 76 79 78
C3 D0 89 88 91 90
D1 93 92 95 94
... ... ... ... ...
A3 B1 C1 D0 233 232 235 234
D1 237 236 239 238
C3 D0 249 248 251 250
D1 253 252 255 254
[24 rows x 4 columns]
You can use a pd.IndexSlice
to have a more natural syntax using :
rather than using slice(None)
In [16]: idx = pd.IndexSlice
In [17]: dfmi.loc[idx[:,:,['C1','C3']],idx[:,'foo']]
Out[17]:
lvl0 a b
lvl1 foo foo
A0 B0 C1 D0 8 10
D1 12 14
C3 D0 24 26
D1 28 30
... ... ...
A3 B1 C1 D0 232 234
D1 236 238
C3 D0 248 250
D1 252 254
[32 rows x 2 columns]
It is possible to perform quite complicated selections using this method on multiple axes at the same time.
In [18]: dfmi.loc['A1',(slice(None),'foo')]
Out[18]:
lvl0 a b
lvl1 foo foo
B0 C0 D0 64 66
D1 68 70
C1 D0 72 74
D1 76 78
... ... ...
B1 C2 D0 112 114
D1 116 118
C3 D0 120 122
D1 124 126
[16 rows x 2 columns]
In [19]: dfmi.loc[idx[:,:,['C1','C3']],idx[:,'foo']]
Out[19]:
lvl0 a b
lvl1 foo foo
A0 B0 C1 D0 8 10
D1 12 14
C3 D0 24 26
D1 28 30
... ... ...
A3 B1 C1 D0 232 234
D1 236 238
C3 D0 248 250
D1 252 254
[32 rows x 2 columns]
Using a boolean indexer you can provide selection related to the values.
In [20]: mask = dfmi[('a','foo')]>200
In [21]: dfmi.loc[idx[mask,:,['C1','C3']],idx[:,'foo']]
Out[21]:
lvl0 a b
lvl1 foo foo
A3 B0 C1 D1 204 206
C3 D0 216 218
D1 220 222
B1 C1 D0 232 234
D1 236 238
C3 D0 248 250
D1 252 254
You can also specify the axis
argument to .loc
to interpret the passed
slicers on a single axis.
In [22]: dfmi.loc(axis=0)[:,:,['C1','C3']]
Out[22]:
lvl0 a b
lvl1 bar foo bah foo
A0 B0 C1 D0 9 8 11 10
D1 13 12 15 14
C3 D0 25 24 27 26
D1 29 28 31 30
... ... ... ... ...
A3 B1 C1 D0 233 232 235 234
D1 237 236 239 238
C3 D0 249 248 251 250
D1 253 252 255 254
[32 rows x 4 columns]
Furthermore you can set the values using these methods
In [23]: df2 = dfmi.copy()
In [24]: df2.loc(axis=0)[:,:,['C1','C3']] = -10
In [25]: df2
Out[25]:
lvl0 a b
lvl1 bar foo bah foo
A0 B0 C0 D0 1 0 3 2
D1 5 4 7 6
C1 D0 -10 -10 -10 -10
D1 -10 -10 -10 -10
... ... ... ... ...
A3 B1 C2 D0 241 240 243 242
D1 245 244 247 246
C3 D0 -10 -10 -10 -10
D1 -10 -10 -10 -10
[64 rows x 4 columns]
You can use a right-hand-side of an alignable object as well.
In [26]: df2 = dfmi.copy()
In [27]: df2.loc[idx[:,:,['C1','C3']],:] = df2*1000
In [28]: df2
Out[28]:
lvl0 a b
lvl1 bar foo bah foo
A0 B0 C0 D0 1 0 3 2
D1 5 4 7 6
C1 D0 9000 8000 11000 10000
D1 13000 12000 15000 14000
... ... ... ... ...
A3 B1 C2 D0 241 240 243 242
D1 245 244 247 246
C3 D0 249000 248000 251000 250000
D1 253000 252000 255000 254000
[64 rows x 4 columns]
3.2.2 Cross-section
The xs
method of DataFrame
additionally takes a level argument to make
selecting data at a particular level of a MultiIndex easier.
In [29]: df
Out[29]:
A B C
first second
bar one 0.895717 0.410835 -1.413681
two 0.805244 0.813850 1.607920
baz one -1.206412 0.132003 1.024180
two 2.565646 -0.827317 0.569605
foo one 1.431256 -0.076467 0.875906
two 1.340309 -1.187678 -2.211372
qux one -1.170299 1.130127 0.974466
two -0.226169 -1.436737 -2.006747
In [30]: df.xs('one', level='second')
Out[30]:
A B C
first
bar 0.895717 0.410835 -1.413681
baz -1.206412 0.132003 1.024180
foo 1.431256 -0.076467 0.875906
qux -1.170299 1.130127 0.974466
# using the slicers (new in 0.14.0)
In [31]: df.loc[(slice(None),'one'),:]
Out[31]:
A B C
first second
bar one 0.895717 0.410835 -1.413681
baz one -1.206412 0.132003 1.024180
foo one 1.431256 -0.076467 0.875906
qux one -1.170299 1.130127 0.974466
You can also select on the columns with xs()
, by
providing the axis argument
In [32]: df = df.T
In [33]: df.xs('one', level='second', axis=1)
Out[33]:
first bar baz foo qux
A 0.895717 -1.206412 1.431256 -1.170299
B 0.410835 0.132003 -0.076467 1.130127
C -1.413681 1.024180 0.875906 0.974466
# using the slicers (new in 0.14.0)
In [34]: df.loc[:,(slice(None),'one')]
Out[34]:
first bar baz foo qux
second one one one one
A 0.895717 -1.206412 1.431256 -1.170299
B 0.410835 0.132003 -0.076467 1.130127
C -1.413681 1.024180 0.875906 0.974466
xs()
also allows selection with multiple keys
In [35]: df.xs(('one', 'bar'), level=('second', 'first'), axis=1)
Out[35]:
first bar
second one
A 0.895717
B 0.410835
C -1.413681
# using the slicers (new in 0.14.0)
In [36]: df.loc[:,('bar','one')]
Out[36]:
A 0.895717
B 0.410835
C -1.413681
Name: (bar, one), dtype: float64
New in version 0.13.0.
You can pass drop_level=False
to xs()
to retain
the level that was selected
In [37]: df.xs('one', level='second', axis=1, drop_level=False)
Out[37]:
first bar baz foo qux
second one one one one
A 0.895717 -1.206412 1.431256 -1.170299
B 0.410835 0.132003 -0.076467 1.130127
C -1.413681 1.024180 0.875906 0.974466
versus the result with drop_level=True
(the default value)
In [38]: df.xs('one', level='second', axis=1, drop_level=True)
Out[38]:
first bar baz foo qux
A 0.895717 -1.206412 1.431256 -1.170299
B 0.410835 0.132003 -0.076467 1.130127
C -1.413681 1.024180 0.875906 0.974466
3.2.3 Advanced reindexing and alignment
The parameter level
has been added to the reindex
and align
methods
of pandas objects. This is useful to broadcast values across a level. For
instance:
In [39]: midx = pd.MultiIndex(levels=[['zero', 'one'], ['x','y']],
....: labels=[[1,1,0,0],[1,0,1,0]])
....:
In [40]: df = pd.DataFrame(np.random.randn(4,2), index=midx)
In [41]: df
Out[41]:
0 1
one y 1.519970 -0.493662
x 0.600178 0.274230
zero y 0.132885 -0.023688
x 2.410179 1.450520
In [42]: df2 = df.mean(level=0)
In [43]: df2
Out[43]:
0 1
zero 1.271532 0.713416
one 1.060074 -0.109716
In [44]: df2.reindex(df.index, level=0)
Out[44]:
0 1
one y 1.060074 -0.109716
x 1.060074 -0.109716
zero y 1.271532 0.713416
x 1.271532 0.713416
# aligning
In [45]: df_aligned, df2_aligned = df.align(df2, level=0)
In [46]: df_aligned
Out[46]:
0 1
one y 1.519970 -0.493662
x 0.600178 0.274230
zero y 0.132885 -0.023688
x 2.410179 1.450520
In [47]: df2_aligned
Out[47]:
0 1
one y 1.060074 -0.109716
x 1.060074 -0.109716
zero y 1.271532 0.713416
x 1.271532 0.713416
3.2.4 Swapping levels with swaplevel()
The swaplevel
function can switch the order of two levels:
In [48]: df[:5]
Out[48]:
0 1
one y 1.519970 -0.493662
x 0.600178 0.274230
zero y 0.132885 -0.023688
x 2.410179 1.450520
In [49]: df[:5].swaplevel(0, 1, axis=0)
Out[49]:
0 1
y one 1.519970 -0.493662
x one 0.600178 0.274230
y zero 0.132885 -0.023688
x zero 2.410179 1.450520
3.2.5 Reordering levels with reorder_levels()
The reorder_levels
function generalizes the swaplevel
function,
allowing you to permute the hierarchical index levels in one step:
In [50]: df[:5].reorder_levels([1,0], axis=0)
Out[50]:
0 1
y one 1.519970 -0.493662
x one 0.600178 0.274230
y zero 0.132885 -0.023688
x zero 2.410179 1.450520