2.3 MultiIndexing
The multindexing docs.
Creating a multi-index from a labeled frame
In [1]: df = pd.DataFrame({'row' : [0,1,2],
...: 'One_X' : [1.1,1.1,1.1],
...: 'One_Y' : [1.2,1.2,1.2],
...: 'Two_X' : [1.11,1.11,1.11],
...: 'Two_Y' : [1.22,1.22,1.22]}); df
...:
Out[1]:
One_X One_Y Two_X Two_Y row
0 1.1 1.2 1.11 1.22 0
1 1.1 1.2 1.11 1.22 1
2 1.1 1.2 1.11 1.22 2
# As Labelled Index
In [2]: df = df.set_index('row');df
Out[2]:
One_X One_Y Two_X Two_Y
row
0 1.1 1.2 1.11 1.22
1 1.1 1.2 1.11 1.22
2 1.1 1.2 1.11 1.22
# With Hierarchical Columns
In [3]: df.columns = pd.MultiIndex.from_tuples([tuple(c.split('_')) for c in df.columns]);df
Out[3]:
One Two
X Y X Y
row
0 1.1 1.2 1.11 1.22
1 1.1 1.2 1.11 1.22
2 1.1 1.2 1.11 1.22
# Now stack & Reset
In [4]: df = df.stack(0).reset_index(1);df
Out[4]:
level_1 X Y
row
0 One 1.10 1.20
0 Two 1.11 1.22
1 One 1.10 1.20
1 Two 1.11 1.22
2 One 1.10 1.20
2 Two 1.11 1.22
# And fix the labels (Notice the label 'level_1' got added automatically)
In [5]: df.columns = ['Sample','All_X','All_Y'];df
Out[5]:
Sample All_X All_Y
row
0 One 1.10 1.20
0 Two 1.11 1.22
1 One 1.10 1.20
1 Two 1.11 1.22
2 One 1.10 1.20
2 Two 1.11 1.22
2.3.1 Arithmetic
Performing arithmetic with a multi-index that needs broadcasting
In [6]: cols = pd.MultiIndex.from_tuples([ (x,y) for x in ['A','B','C'] for y in ['O','I']])
In [7]: df = pd.DataFrame(np.random.randn(2,6),index=['n','m'],columns=cols); df
Out[7]:
A B C
O I O I O I
n 0.469112 -0.282863 -1.509059 -1.135632 1.212112 -0.173215
m 0.119209 -1.044236 -0.861849 -2.104569 -0.494929 1.071804
In [8]: df = df.div(df['C'],level=1); df
Out[8]:
A B C
O I O I O I
n 0.387021 1.633022 -1.244983 6.556214 1.0 1.0
m -0.240860 -0.974279 1.741358 -1.963577 1.0 1.0
2.3.2 Slicing
In [9]: coords = [('AA','one'),('AA','six'),('BB','one'),('BB','two'),('BB','six')]
In [10]: index = pd.MultiIndex.from_tuples(coords)
In [11]: df = pd.DataFrame([11,22,33,44,55],index,['MyData']); df
Out[11]:
MyData
AA one 11
six 22
BB one 33
two 44
six 55
To take the cross section of the 1st level and 1st axis the index:
In [12]: df.xs('BB',level=0,axis=0) #Note : level and axis are optional, and default to zero
Out[12]:
MyData
one 33
two 44
six 55
...and now the 2nd level of the 1st axis.
In [13]: df.xs('six',level=1,axis=0)
Out[13]:
MyData
AA 22
BB 55
Slicing a multi-index with xs, method #2
In [14]: index = list(itertools.product(['Ada','Quinn','Violet'],['Comp','Math','Sci']))
In [15]: headr = list(itertools.product(['Exams','Labs'],['I','II']))
In [16]: indx = pd.MultiIndex.from_tuples(index,names=['Student','Course'])
In [17]: cols = pd.MultiIndex.from_tuples(headr) #Notice these are un-named
In [18]: data = [[70+x+y+(x*y)%3 for x in range(4)] for y in range(9)]
In [19]: df = pd.DataFrame(data,indx,cols); df
Out[19]:
Exams Labs
I II I II
Student Course
Ada Comp 70 71 72 73
Math 71 73 75 74
Sci 72 75 75 75
Quinn Comp 73 74 75 76
... ... .. ... ..
Sci 75 78 78 78
Violet Comp 76 77 78 79
Math 77 79 81 80
Sci 78 81 81 81
[9 rows x 4 columns]
In [20]: All = slice(None)
In [21]: df.loc['Violet']
Out[21]:
Exams Labs
I II I II
Course
Comp 76 77 78 79
Math 77 79 81 80
Sci 78 81 81 81
In [22]: df.loc[(All,'Math'),All]
Out[22]:
Exams Labs
I II I II
Student Course
Ada Math 71 73 75 74
Quinn Math 74 76 78 77
Violet Math 77 79 81 80
In [23]: df.loc[(slice('Ada','Quinn'),'Math'),All]
Out[23]:
Exams Labs
I II I II
Student Course
Ada Math 71 73 75 74
Quinn Math 74 76 78 77
In [24]: df.loc[(All,'Math'),('Exams')]
Out[24]:
I II
Student Course
Ada Math 71 73
Quinn Math 74 76
Violet Math 77 79
In [25]: df.loc[(All,'Math'),(All,'II')]
Out[25]:
Exams Labs
II II
Student Course
Ada Math 73 74
Quinn Math 76 77
Violet Math 79 80
2.3.3 Sorting
Sort by specific column or an ordered list of columns, with a multi-index
In [26]: df.sort_values(by=('Labs', 'II'), ascending=False)
Out[26]:
Exams Labs
I II I II
Student Course
Violet Sci 78 81 81 81
Math 77 79 81 80
Comp 76 77 78 79
Quinn Sci 75 78 78 78
... ... .. ... ..
Comp 73 74 75 76
Ada Sci 72 75 75 75
Math 71 73 75 74
Comp 70 71 72 73
[9 rows x 4 columns]