.. currentmodule:: pandas .. ipython:: python :suppress: import numpy as np np.random.seed(1234567) np.set_printoptions(precision=4, suppress=True) import pandas as pd pd.options.display.max_rows=8 .. _indexing.class: Index objects ------------- The pandas :class:`~pandas.Index` class and its subclasses can be viewed as implementing an *ordered multiset*. Duplicates are allowed. However, if you try to convert an :class:`~pandas.Index` object with duplicate entries into a ``set``, an exception will be raised. :class:`~pandas.Index` also provides the infrastructure necessary for lookups, data alignment, and reindexing. The easiest way to create an :class:`~pandas.Index` directly is to pass a ``list`` or other sequence to :class:`~pandas.Index`: .. ipython:: python index = pd.Index(['e', 'd', 'a', 'b']) index 'd' in index You can also pass a ``name`` to be stored in the index: .. ipython:: python index = pd.Index(['e', 'd', 'a', 'b'], name='something') index index.name The name, if set, will be shown in the console display: .. ipython:: python index = pd.Index(list(range(5)), name='rows') columns = pd.Index(['A', 'B', 'C'], name='cols') df = pd.DataFrame(np.random.randn(5, 3), index=index, columns=columns) df df['A'] .. _indexing.set_metadata: Setting metadata ~~~~~~~~~~~~~~~~ .. versionadded:: 0.13.0 Indexes are "mostly immutable", but it is possible to set and change their metadata, like the index ``name`` (or, for ``MultiIndex``, ``levels`` and ``labels``). You can use the ``rename``, ``set_names``, ``set_levels``, and ``set_labels`` to set these attributes directly. They default to returning a copy; however, you can specify ``inplace=True`` to have the data change in place. See :ref:`Advanced Indexing ` for usage of MultiIndexes. .. ipython:: python ind = pd.Index([1, 2, 3]) ind ind.rename("apple") ind ind.set_names(["apple"], inplace=True) ind.name = "bob" ind .. versionadded:: 0.15.0 ``set_names``, ``set_levels``, and ``set_labels`` also take an optional `level`` argument .. ipython:: python index = pd.MultiIndex.from_product([range(3), ['one', 'two']], names=['first', 'second']) index index.levels[1] index.set_levels(["a", "b"], level=1) Set operations on Index objects ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. _indexing.set_ops: .. warning:: In 0.15.0. the set operations ``+`` and ``-`` were deprecated in order to provide these for numeric type operations on certain index types. ``+`` can be replace by ``.union()`` or ``|``, and ``-`` by ``.difference()``. The two main operations are ``union (|)``, ``intersection (&)`` These can be directly called as instance methods or used via overloaded operators. Difference is provided via the ``.difference()`` method. .. ipython:: python a = pd.Index(['c', 'b', 'a']) b = pd.Index(['c', 'e', 'd']) a b a | b a & b a.difference(b) Also available is the ``symmetric_difference (^)`` operation, which returns elements that appear in either ``idx1`` or ``idx2`` but not both. This is equivalent to the Index created by ``idx1.difference(idx2).union(idx2.difference(idx1))``, with duplicates dropped. .. ipython:: python idx1 = pd.Index([1, 2, 3, 4]) idx2 = pd.Index([2, 3, 4, 5]) idx1.symmetric_difference(idx2) idx1 ^ idx2 Missing values ~~~~~~~~~~~~~~ .. _indexing.missing: .. versionadded:: 0.17.1 .. important:: Even though ``Index`` can hold missing values (``NaN``), it should be avoided if you do not want any unexpected results. For example, some operations exclude missing values implicitly. ``Index.fillna`` fills missing values with specified scalar value. .. ipython:: python idx1 = pd.Index([1, np.nan, 3, 4]) idx1 idx1.fillna(2) idx2 = pd.DatetimeIndex([pd.Timestamp('2011-01-01'), pd.NaT, pd.Timestamp('2011-01-03')]) idx2 idx2.fillna(pd.Timestamp('2011-01-02'))