2.19 Index objects

The pandas Index class and its subclasses can be viewed as implementing an ordered multiset. Duplicates are allowed. However, if you try to convert an Index object with duplicate entries into a set, an exception will be raised.
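A minimal sketch of the multiset behaviour, assuming a reasonably recent pandas. It shows that duplicate labels are kept in their original order and how to detect and remove them. The values are compared through .tolist() so the check does not depend on the repr of any particular pandas version.

```python
import pandas as pd

idx = pd.Index(['a', 'b', 'a', 'c'])

# Duplicates are allowed and insertion order is preserved
print(idx.tolist())                    # ['a', 'b', 'a', 'c']
print(idx.is_unique)                   # False

# duplicated() flags every repeat after the first occurrence
print(idx.duplicated().tolist())       # [False, False, True, False]

# drop_duplicates() keeps the first occurrence of each label
print(idx.drop_duplicates().tolist())  # ['a', 'b', 'c']
```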

Index also provides the infrastructure necessary for lookups, data alignment, and reindexing. The easiest way to create an Index directly is to pass a list or other sequence to Index:

In [1]: index = pd.Index(['e', 'd', 'a', 'b'])

In [2]: index
Out[2]: Index([u'e', u'd', u'a', u'b'], dtype='object')

In [3]: 'd' in index
Out[3]: True
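To illustrate the lookup, alignment, and reindexing roles mentioned above, here is a small sketch assuming a reasonably recent pandas. The Series s1 and s2 are illustrative data, not part of the original text. Adding two Series aligns them on their index labels, and labels present in only one operand produce NaN.

```python
import pandas as pd

index = pd.Index(['e', 'd', 'a', 'b'])

# Lookup: integer position of a label
print(index.get_loc('d'))  # 1

# Alignment: arithmetic matches values by label, not by position
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])
total = s1 + s2
print(total.index.tolist())  # ['a', 'b', 'c', 'd']

# Reindexing: conform s1 to a new set of labels; 'z' is missing -> NaN
print(s1.reindex(['c', 'a', 'z']).tolist())  # [3.0, 1.0, nan]
```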

You can also pass a name to be stored in the index:

In [4]: index = pd.Index(['e', 'd', 'a', 'b'], name='something')

In [5]: index
Out[5]: Index([u'e', u'd', u'a', u'b'], dtype='object', name=u'something')

In [6]: index.name
Out[6]: 'something'

The name, if set, will be shown in the console display:

In [7]: index = pd.Index(list(range(5)), name='rows')

In [8]: columns = pd.Index(['A', 'B', 'C'], name='cols')

In [9]: df = pd.DataFrame(np.random.randn(5, 3), index=index, columns=columns)

In [10]: df
Out[10]: 
cols       A       B       C
rows                        
0    -0.5146 -0.4496  1.7346
1     0.6434  0.0261  0.0804
2    -0.7974 -0.6281 -0.3462
3     0.9681  0.7056 -2.1567
4     0.9506  0.5382 -0.4508

In [11]: df['A']
Out[11]: 
rows
0   -0.5146
1    0.6434
2   -0.7974
3    0.9681
4    0.9506
Name: A, dtype: float64
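The names are stored on the axis Index objects themselves, so they travel with the data. A minimal sketch, assuming a reasonably recent pandas: selecting a column keeps the named row index, and reset_index() turns the named index into an ordinary column called 'rows'.

```python
import numpy as np
import pandas as pd

index = pd.Index(list(range(5)), name='rows')
columns = pd.Index(['A', 'B', 'C'], name='cols')
df = pd.DataFrame(np.random.randn(5, 3), index=index, columns=columns)

# The names are attributes of the row and column Index objects
print(df.index.name)    # rows
print(df.columns.name)  # cols

# A selected column keeps the row index, name included
col = df['A']
print(col.index.name)   # rows

# reset_index() moves the named index into a regular column
flat = df.reset_index()
print('rows' in flat.columns)  # True
```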

2.19.1 Setting metadata

New in version 0.13.0.

Indexes are “mostly immutable”, but it is possible to set and change their metadata, such as the index name (or, for MultiIndex, levels and labels).

You can use the rename, set_names, set_levels, and set_labels methods to set these attributes directly. They return a copy by default; however, you can specify inplace=True to modify the index in place.

See Advanced Indexing for usage of MultiIndexes.

In [12]: ind = pd.Index([1, 2, 3])

In [13]: ind
Out[13]: Int64Index([1, 2, 3], dtype='int64')

In [14]: ind.rename("apple")
Out[14]: Int64Index([1, 2, 3], dtype='int64', name=u'apple')

In [15]: ind
Out[15]: Int64Index([1, 2, 3], dtype='int64')

In [16]: ind.set_names(["apple"], inplace=True)

In [17]: ind.name = "bob"

In [18]: ind
Out[18]: Int64Index([1, 2, 3], dtype='int64', name=u'bob')
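A short sketch of what “mostly immutable” means in practice, assuming a reasonably recent pandas: rename returns a new object and leaves the original alone, set_names(..., inplace=True) changes the metadata of the existing object, and assigning to the values themselves raises a TypeError.

```python
import pandas as pd

ind = pd.Index([1, 2, 3])

# rename returns a new Index; the original keeps its old name (None)
renamed = ind.rename("apple")
print(renamed.name)  # apple
print(ind.name)      # None

# set_names(..., inplace=True) changes the metadata of ind itself
ind.set_names(["apple"], inplace=True)
print(ind.name)      # apple

# The values are not mutable: item assignment raises TypeError
mutation_rejected = False
try:
    ind[0] = 10
except TypeError:
    mutation_rejected = True
print(mutation_rejected)  # True
```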

New in version 0.15.0.

set_names, set_levels, and set_labels also take an optional level argument:

In [19]: index = pd.MultiIndex.from_product([range(3), ['one', 'two']], names=['first', 'second'])

In [20]: index
Out[20]: 
MultiIndex(levels=[[0, 1, 2], [u'one', u'two']],
           labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]],
           names=[u'first', u'second'])

In [21]: index.levels[1]
Out[21]: Index([u'one', u'two'], dtype='object', name=u'second')

In [22]: index.set_levels(["a", "b"], level=1)
Out[22]: 
MultiIndex(levels=[[0, 1, 2], [u'a', u'b']],
           labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]],
           names=[u'first', u'second'])
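A sketch of the level argument that runs on newer pandas as well. It uses only set_levels and set_names because set_labels was later renamed set_codes (pandas 0.24). Both calls return a new MultiIndex and touch only the selected level; the original index is unchanged.

```python
import pandas as pd

index = pd.MultiIndex.from_product([range(3), ['one', 'two']],
                                   names=['first', 'second'])

# Replace the values of level 1 only; level 0 is untouched
relabeled = index.set_levels(['a', 'b'], level=1)
print(relabeled.get_level_values(1).tolist())
# ['a', 'b', 'a', 'b', 'a', 'b']

# Rename only level 1
renamed = index.set_names('letter', level=1)
print(list(renamed.names))  # ['first', 'letter']

# The original MultiIndex keeps its levels and names
print(index.levels[1].tolist())  # ['one', 'two']
```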

2.19.2 Set operations on Index objects

Warning

In 0.15.0, the set operations + and - were deprecated so that these operators could provide numeric operations on certain index types. + can be replaced by .union() or |, and - by .difference().

The two main operations are union (|) and intersection (&). These can be called directly as instance methods or used via overloaded operators. Difference is provided via the .difference() method.

In [23]: a = pd.Index(['c', 'b', 'a'])

In [24]: b = pd.Index(['c', 'e', 'd'])

In [25]: a
Out[25]: Index([u'c', u'b', u'a'], dtype='object')

In [26]: b
Out[26]: Index([u'c', u'e', u'd'], dtype='object')

In [27]: a | b
Out[27]: Index([u'a', u'b', u'c', u'd', u'e'], dtype='object')

In [28]: a & b
Out[28]: Index([u'c'], dtype='object')

In [29]: a.difference(b)
Out[29]: Index([u'a', u'b'], dtype='object')
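The same operations written as method calls. This sketch assumes a reasonably recent pandas and uses the method forms, which behave the same across versions. Note that difference is not symmetric: a.difference(b) and b.difference(a) give different results.

```python
import pandas as pd

a = pd.Index(['c', 'b', 'a'])
b = pd.Index(['c', 'e', 'd'])

print(a.union(b).tolist())         # ['a', 'b', 'c', 'd', 'e']
print(a.intersection(b).tolist())  # ['c']

# difference keeps elements of the caller that are absent from the argument
print(a.difference(b).tolist())    # ['a', 'b']
print(b.difference(a).tolist())    # ['d', 'e']
```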

Also available is the symmetric_difference (^) operation, which returns elements that appear in either idx1 or idx2 but not both. This is equivalent to the Index created by idx1.difference(idx2).union(idx2.difference(idx1)), with duplicates dropped.

In [30]: idx1 = pd.Index([1, 2, 3, 4])

In [31]: idx2 = pd.Index([2, 3, 4, 5])

In [32]: idx1.symmetric_difference(idx2)
Out[32]: Int64Index([1, 5], dtype='int64')

In [33]: idx1 ^ idx2
Out[33]: Int64Index([1, 5], dtype='int64')
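The equivalence stated above can be checked directly. This sketch, assuming a reasonably recent pandas, builds the result from two one-sided differences and compares it with symmetric_difference using Index.equals.

```python
import pandas as pd

idx1 = pd.Index([1, 2, 3, 4])
idx2 = pd.Index([2, 3, 4, 5])

sym = idx1.symmetric_difference(idx2)

# Same result from the two one-sided differences: [1] union [5]
manual = idx1.difference(idx2).union(idx2.difference(idx1))

print(sym.tolist())        # [1, 5]
print(sym.equals(manual))  # True
```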

2.19.3 Missing values

New in version 0.17.1.

Important

Even though an Index can hold missing values (NaN), you should avoid them if you do not want unexpected results. For example, some operations exclude missing values implicitly.
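A small sketch of how missing values behave, assuming a reasonably recent pandas. Reductions such as max() skip NaN by default, and an element-wise == reports False where NaN appears. Index.equals, in contrast, treats NaN in the same position as equal.

```python
import numpy as np
import pandas as pd

idx = pd.Index([1, np.nan, 3, 4])

print(idx.hasnans)          # True
print(idx.isna().tolist())  # [False, True, False, False]

# max() silently skips the NaN (skipna=True by default)
print(idx.max())            # 4.0

# Element-wise comparison: NaN never equals NaN
same_values = pd.Index([1, np.nan, 3, 4])
print((idx == same_values).tolist())  # [True, False, True, True]

# equals() treats NaN in matching positions as equal
print(idx.equals(same_values))        # True

# dropna() returns a new Index without the missing entries
print(idx.dropna().tolist())          # [1.0, 3.0, 4.0]
```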

Index.fillna fills missing values with the specified scalar value.

In [34]: idx1 = pd.Index([1, np.nan, 3, 4])

In [35]: idx1
Out[35]: Float64Index([1.0, nan, 3.0, 4.0], dtype='float64')

In [36]: idx1.fillna(2)
Out[36]: Float64Index([1.0, 2.0, 3.0, 4.0], dtype='float64')

In [37]: idx2 = pd.DatetimeIndex([pd.Timestamp('2011-01-01'), pd.NaT, pd.Timestamp('2011-01-03')])

In [38]: idx2
Out[38]: DatetimeIndex(['2011-01-01', 'NaT', '2011-01-03'], dtype='datetime64[ns]', freq=None)

In [39]: idx2.fillna(pd.Timestamp('2011-01-02'))
Out[39]: DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03'], dtype='datetime64[ns]', freq=None)
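Like the other Index methods above, fillna returns a new object. This sketch, assuming a reasonably recent pandas, checks that the original DatetimeIndex still contains NaT after filling.

```python
import pandas as pd

idx2 = pd.DatetimeIndex([pd.Timestamp('2011-01-01'), pd.NaT,
                         pd.Timestamp('2011-01-03')])

filled = idx2.fillna(pd.Timestamp('2011-01-02'))

# The original index is unchanged; only the result has the gap filled
print(idx2.isna().tolist())    # [False, True, False]
print(filled.isna().tolist())  # [False, False, False]
print(filled[1])               # 2011-01-02 00:00:00
```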