1.4 Label-based slicing conventions

1.4.1 Non-monotonic indexes require exact matches

If the index of a Series or DataFrame is monotonically increasing or decreasing, then the bounds of a label-based slice can be outside the range of the index, much like slice indexing a normal Python list. Monotonicity of an index can be tested with the is_monotonic_increasing and is_monotonic_decreasing attributes.

In [1]: df = pd.DataFrame(index=[2,3,3,4,5], columns=['data'], data=range(5))

In [2]: df.index.is_monotonic_increasing
Out[2]: True

# no rows 0 or 1, but still returns rows 2, 3 (both of them), and 4:
In [3]: df.loc[0:4, :]
Out[3]: 
   data
2     0
3     1
3     2
4     3

# slice bounds are outside the index, so an empty DataFrame is returned
In [4]: df.loc[13:15, :]
Out[4]: 
Empty DataFrame
Columns: [data]
Index: []
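
For comparison, the sketch below shows the same lenient behavior on a plain Python list and on a monotonically decreasing index. The index values and the names `values` and `dec` are illustrative and do not come from the example above.

```python
import pandas as pd

# A plain list silently clips slice bounds that fall outside its range.
values = [10, 20, 30]
clipped = values[1:100]

# A monotonically decreasing index behaves the same way under .loc,
# provided the bounds are given in the index's own (descending) order.
dec = pd.DataFrame({"data": range(4)}, index=[5, 4, 3, 2])
assert dec.index.is_monotonic_decreasing

upper_part = dec.loc[6:3, :]   # 6 is not in the index
everything = dec.loc[10:0, :]  # neither bound is in the index
```

Here `upper_part` holds the rows labelled 5, 4, and 3, and `everything` holds all four rows.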

On the other hand, if the index is not monotonic, then both slice bounds must be unique members of the index.

In [5]: df = pd.DataFrame(index=[2,3,1,4,3,5], columns=['data'], data=range(6))

In [6]: df.index.is_monotonic_increasing
Out[6]: False

# OK because 2 and 4 are in the index
In [7]: df.loc[2:4, :]
Out[7]: 
   data
2     0
3     1
1     2
4     3

# 0 is not in the index
In [9]: df.loc[0:4, :]
KeyError: 0

# 3 is not a unique label
In [11]: df.loc[2:3, :]
KeyError: 'Cannot get right slice bound for non-unique label: 3'
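
If out-of-range or non-unique bounds are needed on a non-monotonic index, one option is to sort the index first so that it becomes monotonic. The following is a minimal sketch of that approach, not the only way to handle the situation; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"data": range(6)}, index=[2, 3, 1, 4, 3, 5])

# On the unsorted index, a bound that is absent from the index raises KeyError.
try:
    df.loc[0:4, :]
    raised = False
except KeyError:
    raised = True

# Sorting makes the index monotonic (duplicates are allowed), so the
# same slice now succeeds and includes every label from 1 through 4.
sorted_df = df.sort_index()
assert sorted_df.index.is_monotonic_increasing
subset = sorted_df.loc[0:4, :]
dup_bound = sorted_df.loc[2:3, :]  # a non-unique bound is fine once sorted
```

Note that sorting changes the row order, so this only fits cases where the original ordering does not need to be preserved.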

1.4.2 Endpoints are inclusive

Compared with standard Python sequence slicing in which the slice endpoint is not inclusive, label-based slicing in pandas is inclusive. The primary reason for this is that it is often not possible to easily determine the “successor” or next element after a particular label in an index. For example, consider the following Series:

In [8]: s = pd.Series(np.random.randn(6), index=list('abcdef'))

In [9]: s
Out[9]: 
a    1.5818
b    1.4930
c    0.4286
d    0.7753
e   -0.3759
f   -0.9626
dtype: float64

Suppose we wished to slice from c to e. Using integer positions, this would be:

In [10]: s[2:5]
Out[10]: 
c    0.4286
d    0.7753
e   -0.3759
dtype: float64

However, if you only had c and e, determining the next element in the index can be somewhat complicated. For example, the following does not work:

s.loc['c':'e' + 1]
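
The expression above fails because a string label has no arithmetic successor. One way to find the position after a label is to look it up in the index explicitly. The sketch below assumes unique labels and uses fixed values in place of the random data.

```python
import pandas as pd

s = pd.Series([1.5818, 1.4930, 0.4286, 0.7753, -0.3759, -0.9626],
              index=list("abcdef"))

# Adding an integer to a string label raises TypeError before pandas is involved.
try:
    "e" + 1
    failed = False
except TypeError:
    failed = True

# With unique labels, Index.get_loc returns an integer position, so the
# exclusive stop for a positional slice is that position plus one.
start = s.index.get_loc("c")
stop = s.index.get_loc("e") + 1
by_position = s.iloc[start:stop]
```

This works, but it takes two lookups and an off-by-one adjustment to express what an inclusive label slice says directly.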

A very common use case is to limit a time series to start and end at two specific dates. To enable this, we made the design choice to have label-based slicing include both endpoints:

In [11]: s.loc['c':'e']
Out[11]: 
c    0.4286
d    0.7753
e   -0.3759
dtype: float64

This is most definitely a “practicality beats purity” sort of thing, but it is something to watch out for if you expect label-based slicing to behave exactly in the way that standard Python integer slicing works.
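
The date-range use case mentioned above can be sketched as follows. The dates and values are made up for illustration.

```python
import pandas as pd

ts = pd.Series(range(7), index=pd.date_range("2024-01-01", periods=7, freq="D"))

# Label-based slice: both the start date and the end date are included.
by_label = ts.loc["2024-01-03":"2024-01-05"]

# Positional slice from the same starting row: the stop position is excluded.
by_position = ts.iloc[2:4]
```

Here `by_label` contains three rows (January 3, 4, and 5), while `by_position` contains only two, because the positional stop is exclusive.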