2.20 Set / Reset Index
Occasionally you will load or create a data set in a DataFrame and only afterwards want to add an index. There are a couple of different ways to do this.
2.20.1 Set an index
DataFrame has a set_index method which takes a column name (for a regular Index) or a list of column names (for a MultiIndex) to create a new, indexed DataFrame:
In [1]: data = pd.DataFrame({'a' : ['bar', 'bar', 'foo', 'foo'],
   ...:                      'b' : ['one', 'two', 'one', 'two'],
   ...:                      'c' : ['z', 'y', 'x', 'w'],
   ...:                      'd' : [1., 2., 3, 4]})
   ...:

In [2]: data
Out[2]:
     a    b  c    d
0  bar  one  z  1.0
1  bar  two  y  2.0
2  foo  one  x  3.0
3  foo  two  w  4.0

In [3]: indexed1 = data.set_index('c')

In [4]: indexed1
Out[4]:
     a    b    d
c
z  bar  one  1.0
y  bar  two  2.0
x  foo  one  3.0
w  foo  two  4.0

In [5]: indexed2 = data.set_index(['a', 'b'])

In [6]: indexed2
Out[6]:
         c    d
a   b
bar one  z  1.0
    two  y  2.0
foo one  x  3.0
    two  w  4.0
The append keyword option allows you to keep the existing index and append the given columns to a MultiIndex:
In [7]: frame = data.set_index('c', drop=False)

In [8]: frame
Out[8]:
     a    b  c    d
c
z  bar  one  z  1.0
y  bar  two  y  2.0
x  foo  one  x  3.0
w  foo  two  w  4.0

In [9]: frame = frame.set_index(['a', 'b'], append=True)

In [10]: frame
Out[10]:
           c    d
c a   b
z bar one  z  1.0
y bar two  y  2.0
x foo one  x  3.0
w foo two  w  4.0
Other options in set_index allow you to keep the index columns in the DataFrame (drop=False) or to set the index in-place, without creating a new object (inplace=True):
In [11]: data
Out[11]:
     a    b  c    d
0  bar  one  z  1.0
1  bar  two  y  2.0
2  foo  one  x  3.0
3  foo  two  w  4.0

In [12]: data.set_index('c', drop=False)
Out[12]:
     a    b  c    d
c
z  bar  one  z  1.0
y  bar  two  y  2.0
x  foo  one  x  3.0
w  foo  two  w  4.0

In [13]: data.set_index(['a', 'b'], inplace=True)

In [14]: data
Out[14]:
         c    d
a   b
bar one  z  1.0
    two  y  2.0
foo one  x  3.0
    two  w  4.0
2.20.2 Reset the index
As a convenience, DataFrame has a reset_index method which transfers the index values into the DataFrame’s columns and sets a simple integer index. This is the inverse operation of set_index.
In [15]: data
Out[15]:
         c    d
a   b
bar one  z  1.0
    two  y  2.0
foo one  x  3.0
    two  w  4.0

In [16]: data.reset_index()
Out[16]:
     a    b  c    d
0  bar  one  z  1.0
1  bar  two  y  2.0
2  foo  one  x  3.0
3  foo  two  w  4.0
The output is more similar to a SQL table or a record array. The names for the columns derived from the index are the ones stored in the index’s names attribute.
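As a minimal sketch of how the level names carry over, the snippet below rebuilds the example frame, sets a two-level index, and then resets it. The variable names (indexed, level_names, restored) are illustrative and not part of the original session.

```python
import pandas as pd

data = pd.DataFrame({'a': ['bar', 'bar', 'foo', 'foo'],
                     'b': ['one', 'two', 'one', 'two'],
                     'c': ['z', 'y', 'x', 'w'],
                     'd': [1., 2., 3., 4.]})
indexed = data.set_index(['a', 'b'])

# The level names are stored on the index itself
level_names = list(indexed.index.names)

# After reset_index, those names label the restored columns
restored = indexed.reset_index()
restored_columns = list(restored.columns)
```

Here restored has the same columns, in the same order, as the frame before set_index was called.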
You can use the level keyword to remove only a portion of the index:
In [17]: frame
Out[17]:
           c    d
c a   b
z bar one  z  1.0
y bar two  y  2.0
x foo one  x  3.0
w foo two  w  4.0

In [18]: frame.reset_index(level=1)
Out[18]:
         a  c    d
c b
z one  bar  z  1.0
y two  bar  y  2.0
x one  foo  x  3.0
w two  foo  w  4.0
reset_index takes an optional parameter drop which, if true, simply discards the index instead of putting the index values in the DataFrame’s columns.
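The difference between the default behaviour and drop=True can be sketched as follows. The small two-column frame and the names kept and dropped are assumptions made for illustration only.

```python
import pandas as pd

data = pd.DataFrame({'c': ['z', 'y', 'x', 'w'],
                     'd': [1., 2., 3., 4.]})
indexed = data.set_index('c')

# Default: the index values come back as a regular column
kept = indexed.reset_index()

# drop=True: the index values are discarded, leaving only 'd'
dropped = indexed.reset_index(drop=True)
```

In both cases the result carries a simple integer index; only kept still contains the values of c.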
Note
The reset_index method used to be called delevel, which is now deprecated.
2.20.3 Adding an ad hoc index
If you create an index yourself, you can just assign it to the index attribute:
data.index = index
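A slightly fuller sketch of the assignment, assuming a hypothetical set of labels built with pd.Index. The new index must have exactly one entry per row, otherwise the assignment raises a ValueError.

```python
import pandas as pd

data = pd.DataFrame({'d': [1., 2., 3., 4.]})

# Hypothetical labels; one entry per row is required
index = pd.Index(['w', 'x', 'y', 'z'], name='label')
data.index = index

# A length mismatch is rejected rather than silently truncated
try:
    data.index = pd.Index(['only', 'three', 'labels'])
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```

After the failed assignment, data still carries the four-label index assigned first.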