2.19 Index objects
The pandas Index class and its subclasses can be viewed as implementing an ordered multiset. Duplicates are allowed. However, if you try to convert an Index object with duplicate entries into a set, an exception will be raised.
Index also provides the infrastructure necessary for lookups, data alignment, and reindexing. The easiest way to create an Index directly is to pass a list or other sequence to Index:
In [1]: index = pd.Index(['e', 'd', 'a', 'b'])
In [2]: index
Out[2]: Index([u'e', u'd', u'a', u'b'], dtype='object')
In [3]: 'd' in index
Out[3]: True
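As a minimal sketch of the multiset and alignment behaviour described above (the variable names and sample values are illustrative assumptions, not part of the original example), the following shows an Index holding a duplicate label and two Series being aligned on their labels rather than on position:

```python
import pandas as pd

# An Index may hold repeated labels (hypothetical sample data).
dup = pd.Index(['a', 'b', 'a'])
has_dups = dup.has_duplicates      # True: 'a' appears twice
is_unique = dup.is_unique          # False

# Arithmetic between Series aligns on index labels, not on position.
s1 = pd.Series([1, 2, 3], index=pd.Index(['a', 'b', 'c']))
s2 = pd.Series([10, 20, 30], index=pd.Index(['c', 'b', 'a']))
total = s1 + s2                    # 'a' -> 1 + 30, 'b' -> 2 + 20, 'c' -> 3 + 10

# Reindexing conforms the data to a new set of labels; labels that are
# absent from the original index ('z' here) get a missing value.
reindexed = s1.reindex(['c', 'a', 'z'])
```

Because the addition is driven by labels, the order in which the two Series were built does not affect which values are paired.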
You can also pass a name to be stored in the index:
In [4]: index = pd.Index(['e', 'd', 'a', 'b'], name='something')
In [5]: index
Out[5]: Index([u'e', u'd', u'a', u'b'], dtype='object', name=u'something')
In [6]: index.name
Out[6]: 'something'
The name, if set, will be shown in the console display:
In [7]: index = pd.Index(list(range(5)), name='rows')
In [8]: columns = pd.Index(['A', 'B', 'C'], name='cols')
In [9]: df = pd.DataFrame(np.random.randn(5, 3), index=index, columns=columns)
In [10]: df
Out[10]:
cols       A       B       C
rows
0    -0.5146 -0.4496  1.7346
1     0.6434  0.0261  0.0804
2    -0.7974 -0.6281 -0.3462
3     0.9681  0.7056 -2.1567
4     0.9506  0.5382 -0.4508
In [11]: df['A']
Out[11]:
rows
0   -0.5146
1    0.6434
2   -0.7974
3    0.9681
4    0.9506
Name: A, dtype: float64
2.19.1 Setting metadata
New in version 0.13.0.
Indexes are “mostly immutable”, but it is possible to set and change their metadata, like the index name (or, for MultiIndex, levels and labels).
You can use the rename, set_names, set_levels, and set_labels methods to set these attributes directly. They default to returning a copy; however, you can specify inplace=True to have the data change in place.
See Advanced Indexing for usage of MultiIndexes.
In [12]: ind = pd.Index([1, 2, 3])
In [13]: ind
Out[13]: Int64Index([1, 2, 3], dtype='int64')
In [14]: ind.rename("apple")
Out[14]: Int64Index([1, 2, 3], dtype='int64', name=u'apple')
In [15]: ind
Out[15]: Int64Index([1, 2, 3], dtype='int64')
In [16]: ind.set_names(["apple"], inplace=True)
In [17]: ind.name = "bob"
In [18]: ind
Out[18]: Int64Index([1, 2, 3], dtype='int64', name=u'bob')
New in version 0.15.0.
set_names, set_levels, and set_labels also take an optional level argument:
In [19]: index = pd.MultiIndex.from_product([range(3), ['one', 'two']], names=['first', 'second'])
In [20]: index
Out[20]:
MultiIndex(levels=[[0, 1, 2], [u'one', u'two']],
labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]],
names=[u'first', u'second'])
In [21]: index.levels[1]
Out[21]: Index([u'one', u'two'], dtype='object', name=u'second')
In [22]: index.set_levels(["a", "b"], level=1)
Out[22]:
MultiIndex(levels=[[0, 1, 2], [u'a', u'b']],
labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]],
names=[u'first', u'second'])
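The level argument works the same way for set_names. The sketch below is an illustration under assumptions: the replacement name 'number' is hypothetical, and set_labels is left out because it is not available under that name in every pandas release.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [range(3), ['one', 'two']], names=['first', 'second']
)

# Rename only level 0; a new MultiIndex is returned.
renamed = index.set_names('number', level=0)

# Replace the values of level 1 only; again a copy is returned.
relabeled = index.set_levels(['a', 'b'], level=1)
```

In both calls the original index is left untouched, which matches the copy-by-default behaviour described earlier in this section.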
2.19.2 Set operations on Index objects
Warning
In 0.15.0, the set operations + and - were deprecated in order to provide these for numeric type operations on certain index types. + can be replaced by .union() or |, and - by .difference().
The two main operations are union (|) and intersection (&). These can be directly called as instance methods or used via overloaded operators. Difference is provided via the .difference() method.
In [23]: a = pd.Index(['c', 'b', 'a'])
In [24]: b = pd.Index(['c', 'e', 'd'])
In [25]: a
Out[25]: Index([u'c', u'b', u'a'], dtype='object')
In [26]: b
Out[26]: Index([u'c', u'e', u'd'], dtype='object')
In [27]: a | b
Out[27]: Index([u'a', u'b', u'c', u'd', u'e'], dtype='object')
In [28]: a & b
Out[28]: Index([u'c'], dtype='object')
In [29]: a.difference(b)
Out[29]: Index([u'a', u'b'], dtype='object')
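The same operations can be written with the instance methods instead of the operators. This is a minimal sketch that rebuilds a and b from the session above; spelling the calls out as methods may be the safer choice, since what the operators mean on an Index has varied between pandas releases (as the warning above notes for + and -).

```python
import pandas as pd

a = pd.Index(['c', 'b', 'a'])
b = pd.Index(['c', 'e', 'd'])

union = a.union(b)                 # labels found in either index
intersection = a.intersection(b)   # labels found in both indexes
difference = a.difference(b)       # labels in a that are not in b
```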
Also available is the symmetric_difference (^) operation, which returns elements that appear in either idx1 or idx2 but not both. This is equivalent to the Index created by idx1.difference(idx2).union(idx2.difference(idx1)), with duplicates dropped.
In [30]: idx1 = pd.Index([1, 2, 3, 4])
In [31]: idx2 = pd.Index([2, 3, 4, 5])
In [32]: idx1.symmetric_difference(idx2)
Out[32]: Int64Index([1, 5], dtype='int64')
In [33]: idx1 ^ idx2
Out[33]: Int64Index([1, 5], dtype='int64')
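The stated equivalence can be checked directly. A minimal sketch, reusing idx1 and idx2 from the session above (the names sym, manual, and same are introduced here purely for illustration):

```python
import pandas as pd

idx1 = pd.Index([1, 2, 3, 4])
idx2 = pd.Index([2, 3, 4, 5])

sym = idx1.symmetric_difference(idx2)

# Elements only in idx1, combined with elements only in idx2.
manual = idx1.difference(idx2).union(idx2.difference(idx1))

# Index.equals compares the two results element by element.
same = sym.equals(manual)
```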
2.19.3 Missing values
New in version 0.17.1.
Important
Even though Index can hold missing values (NaN), they should be avoided if you do not want any unexpected results. For example, some operations exclude missing values implicitly.
Index.fillna fills missing values with a specified scalar value.
In [34]: idx1 = pd.Index([1, np.nan, 3, 4])
In [35]: idx1
Out[35]: Float64Index([1.0, nan, 3.0, 4.0], dtype='float64')
In [36]: idx1.fillna(2)
Out[36]: Float64Index([1.0, 2.0, 3.0, 4.0], dtype='float64')
In [37]: idx2 = pd.DatetimeIndex([pd.Timestamp('2011-01-01'), pd.NaT, pd.Timestamp('2011-01-03')])
In [38]: idx2
Out[38]: DatetimeIndex(['2011-01-01', 'NaT', '2011-01-03'], dtype='datetime64[ns]', freq=None)
In [39]: idx2.fillna(pd.Timestamp('2011-01-02'))
Out[39]: DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03'], dtype='datetime64[ns]', freq=None)
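One way to guard against such surprises is to check for missing labels before relying on an index. A minimal sketch, assuming the hasnans attribute is available in the installed pandas version (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

idx = pd.Index([1, np.nan, 3, 4])

has_missing = idx.hasnans          # True while the NaN is present
filled = idx.fillna(2)             # returns a new Index; idx is unchanged
still_missing = filled.hasnans     # False once the gap has been filled
```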