2 Indexing and Selecting Data
The axis labeling information in pandas objects serves many purposes:
- Identifies data (i.e. provides metadata) using known indicators, important for analysis, visualization, and interactive console display
- Enables automatic and explicit data alignment
- Allows intuitive getting and setting of subsets of the data set
In this section, we will focus on the final point: namely, how to slice, dice,
and generally get and set subsets of pandas objects. The primary focus will be
on Series and DataFrame as they have received more development attention in
this area. Expect more work to be invested in higher-dimensional data
structures (including Panel) in the future, especially in label-based
advanced indexing.
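As a minimal sketch of the alignment and subsetting roles listed above, the following uses two small, made-up Series (the names `s1`, `s2`, and their labels are illustrative only). It assumes a reasonably recent pandas; the `.loc` accessor is covered in the Selection By Label section.

```python
import pandas as pd

# Two hypothetical Series whose labels only partly overlap.
s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# Automatic alignment: values are matched by label, not by position.
# Labels present in only one operand ("a" and "d") produce NaN.
total = s1 + s2

# Label-based setting of a subset.
s1.loc[["a", "b"]] = 0.0

# Label-based getting of a subset; label slices include both endpoints.
subset = s1.loc["b":"c"]
```

Here `total` carries the union of both sets of labels, and `subset` holds the entries labeled `"b"` and `"c"`.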
- 2.1 Different Choices for Indexing
- 2.2 Basics
- 2.3 Attribute Access
- 2.4 Slicing ranges
- 2.5 Selection By Label
- 2.6 Selection By Position
- 2.7 Selection By Callable
- 2.8 Selecting Random Samples
- 2.9 Setting With Enlargement
- 2.10 Fast scalar value getting and setting
- 2.11 Boolean indexing
- 2.12 Indexing with isin
- 2.13 The where() Method and Masking
- 2.14 The query() Method (Experimental)
- 2.15 Duplicate Data
- 2.16 Dictionary-like get() method
- 2.17 The select() Method
- 2.18 The lookup() Method
- 2.19 Index objects
- 2.20 Set / Reset Index
- 2.21 Returning a view versus a copy
Note
The Python and NumPy indexing operators [] and attribute operator . provide
quick and easy access to pandas data structures across a wide range of use
cases. This makes interactive work intuitive, as there’s little new to learn
if you already know how to deal with Python dictionaries and NumPy arrays.
However, since the type of the data to be accessed isn’t known in advance,
directly using standard operators has some optimization limits. For
production code, we recommend that you take advantage of the optimized
pandas data access methods exposed in this chapter.
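The contrast between the standard operators and the dedicated access methods can be sketched as follows. The DataFrame and its column names are hypothetical, and the choice of `.loc`, `.iloc`, and `.at` as "the optimized methods" is an assumption based on the sections listed in the contents above (selection by label, selection by position, and fast scalar access).

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame(
    {"price": [1.5, 2.5, 3.5], "qty": [10, 20, 30]},
    index=["x", "y", "z"],
)

# Standard operators: convenient for interactive work.
col_by_brackets = df["price"]   # dictionary-style column access
col_by_attribute = df.price     # attribute-style column access

# Explicit accessors: the intent (label vs. position) is unambiguous.
by_label = df.loc["y", "price"]   # row label "y", column label "price"
by_position = df.iloc[1, 0]       # second row, first column
scalar = df.at["y", "price"]      # scalar access by label
```

All three explicit lookups return the same cell; which accessor fits best depends on whether the code knows labels or positions.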
Warning
Whether a copy or a reference is returned for a setting operation may
depend on the context. This is sometimes called chained assignment and
should be avoided. See Returning a view versus a copy.
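A minimal sketch of the pattern this warning refers to, using a hypothetical DataFrame. The chained form is shown only as a comment; the single-step `.loc` form is one common way to avoid the ambiguity.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Chained form (avoid):
#     df[df["a"] > 1]["b"] = 0
# The first [] may return a copy, so the assignment may never reach df.

# Single-step form: one indexing call selects rows and column together,
# so the assignment is applied to df itself.
df.loc[df["a"] > 1, "b"] = 0
```

After the `.loc` assignment, column `b` is updated in `df` for the rows where `a` is greater than 1.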
Warning
In 0.15.0 Index has internally been refactored to no longer subclass ndarray
but instead subclass PandasObject, similarly to the rest of the pandas
objects. This should be a transparent change with only very limited API
implications (see the Internal Refactoring).
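One visible consequence of this refactoring can be sketched as below, assuming pandas 0.15.0 or later. The index contents are made up; code that needs a real NumPy array can convert explicitly.

```python
import numpy as np
import pandas as pd

# Hypothetical index for illustration.
idx = pd.Index(["a", "b", "c"])

# An Index is no longer an ndarray instance after the refactoring.
is_ndarray = isinstance(idx, np.ndarray)

# Explicit conversion still yields a NumPy array of the labels.
as_array = np.asarray(idx)
```

Code that previously relied on `isinstance(index, np.ndarray)` would need an explicit conversion like the one shown.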
Warning
Indexing on an integer-based Index with floats has been clarified in 0.18.0. For a summary of the changes, see here.
See the MultiIndex / Advanced Indexing for MultiIndex and more advanced indexing documentation.
See the cookbook for some advanced strategies.