1.4 Label-based slicing conventions
1.4.1 Non-monotonic indexes require exact matches
If the index of a Series or DataFrame is monotonically increasing or decreasing, then the bounds of a label-based slice can be outside the range of the index, much like slice indexing a normal Python list. Monotonicity of an index can be tested with the is_monotonic_increasing and is_monotonic_decreasing attributes.
In [1]: df = pd.DataFrame(index=[2,3,3,4,5], columns=['data'], data=range(5))
In [2]: df.index.is_monotonic_increasing
Out[2]: True
# no rows 0 or 1, but still returns rows 2, 3 (both of them), and 4:
In [3]: df.loc[0:4, :]
Out[3]:
data
2 0
3 1
3 2
4 3
# both slice bounds are outside the index, so an empty DataFrame is returned
In [4]: df.loc[13:15, :]
Out[4]:
Empty DataFrame
Columns: [data]
Index: []
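The transcript above only exercises an increasing index. The sketch below checks the decreasing case with an illustrative df_desc frame (a hypothetical name; the data simply mirrors the example above in reverse order). It reflects the behaviour of recent pandas versions as we understand it, so treat it as a sketch rather than a specification. On a decreasing index the slice has to run from the larger label to the smaller one.

```python
import pandas as pd

# Illustrative data: the same idea as above, but with a decreasing index
df_desc = pd.DataFrame(index=[5, 4, 3, 3, 2], columns=['data'], data=range(5))
assert df_desc.index.is_monotonic_decreasing

# Neither 10 nor 0 is in the index, but on a monotonic index the bounds
# are located by binary search, so every row is returned
all_rows = df_desc.loc[10:0, :]

# Bounds that are in the index are still inclusive; the duplicate label 3
# contributes both of its rows
middle = df_desc.loc[4:3, :]

# Giving the bounds in increasing order on a decreasing index selects nothing
wrong_direction = df_desc.loc[0:10, :]
```

The last case is easy to trip over: no error is raised, the result is simply empty.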
On the other hand, if the index is not monotonic, then both slice bounds must be unique members of the index.
In [5]: df = pd.DataFrame(index=[2,3,1,4,3,5], columns=['data'], data=range(6))
In [6]: df.index.is_monotonic_increasing
Out[6]: False
# OK because 2 and 4 each appear exactly once in the index
In [7]: df.loc[2:4, :]
Out[7]:
data
2 0
3 1
1 2
4 3
# 0 is not in the index
In [9]: df.loc[0:4, :]
KeyError: 0
# 3 is not a unique label
In [11]: df.loc[2:3, :]
KeyError: 'Cannot get right slice bound for non-unique label: 3'
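When the index is not monotonic but a range-style selection is still wanted, two common workarounds are sketched below using the same data. Sorting the index makes it monotonic, so out-of-range and duplicate bounds are accepted again; a boolean mask avoids slicing altogether and keeps the original row order. The names sorted_df, by_slice, mask and by_mask are illustrative, and kind='mergesort' is chosen here only because it is a stable sort, so the two rows labelled 3 keep their original relative order.

```python
import pandas as pd

df = pd.DataFrame(index=[2, 3, 1, 4, 3, 5], columns=['data'], data=range(6))

# Option 1: sort the index first; the result is monotonic, so 0 (absent)
# and 3 (duplicated) are both acceptable slice bounds
sorted_df = df.sort_index(kind='mergesort')
by_slice = sorted_df.loc[0:4, :]

# Option 2: filter on the label values directly; rows stay in their
# original order and no slicing rules apply
mask = (df.index >= 0) & (df.index <= 4)
by_mask = df.loc[mask]
```

Which option fits depends on whether the original row order matters: sorting reorders rows, the mask does not.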
1.4.2 Endpoints are inclusive
Compared with standard Python sequence slicing in which the slice endpoint is not inclusive, label-based slicing in pandas is inclusive. The primary reason for this is that it is often not possible to easily determine the “successor” or next element after a particular label in an index. For example, consider the following Series:
In [8]: s = pd.Series(np.random.randn(6), index=list('abcdef'))
In [9]: s
Out[9]:
a 1.5818
b 1.4930
c 0.4286
d 0.7753
e -0.3759
f -0.9626
dtype: float64
Suppose we wished to slice from c to e. Using integers, this would be:
In [10]: s[2:5]
Out[10]:
c 0.4286
d 0.7753
e -0.3759
dtype: float64
However, if you only had c and e, determining the next element in the index can be somewhat complicated. For example, the following does not work:
s.loc['c':'e'+1]  # raises TypeError: a str and an int cannot be added
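Integer positions, unlike labels, do have an obvious successor. One way to emulate an end-exclusive selection when only the labels are known is to translate them to positions with Index.get_loc and then use a half-open .iloc slice. The sketch below assumes the labels are unique, so that get_loc returns a plain integer; the values in s are illustrative rather than the random ones shown above.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=list('abcdef'))

# Adding 1 to a label fails before pandas is even involved
error = None
try:
    s.loc['c':'e' + 1]
except TypeError as exc:
    error = exc

# Translate labels to positions (valid only because the labels are unique)
start = s.index.get_loc('c')
stop = s.index.get_loc('e')

# .iloc is half-open, like Python lists, so the successor position is stop + 1
through_e = s.iloc[start:stop + 1]
```

This works, but it needs two lookups and an off-by-one adjustment, which is exactly the bookkeeping that inclusive label slicing avoids.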
A very common use case is to limit a time series to start and end at two specific dates. To enable this, we made the design choice to make label-based slicing include both endpoints:
In [11]: s.loc['c':'e']
Out[11]:
c 0.4286
d 0.7753
e -0.3759
dtype: float64
This is most definitely a “practicality beats purity” sort of thing, but it is something to watch out for if you expect label-based slicing to behave exactly in the way that standard Python integer slicing works.
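The difference is most visible when the index holds integer labels, because a label slice and a positional slice then look almost identical yet return different numbers of rows. The sketch below shows that case alongside the time-series use case mentioned above; the series, dates and prices are illustrative values only.

```python
import pandas as pd

# Integer labels: the two slices look alike but differ by one row
s = pd.Series([10, 20, 30, 40], index=[0, 1, 2, 3])
by_label = s.loc[0:2]      # labels 0, 1 and 2: the endpoint is included
by_position = s.iloc[0:2]  # positions 0 and 1: the endpoint is excluded, as with lists

# Time series: both boundary dates are kept in the window
prices = pd.Series([100.0, 101.0, 102.0, 103.0, 104.0],
                   index=pd.date_range('2024-01-01', periods=5, freq='D'))
window = prices.loc['2024-01-02':'2024-01-04']
```

When porting code between .loc and .iloc (or plain list slicing), checking the endpoint is the first thing to verify.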