2.6 Selection By Position
Warning
Whether a copy or a reference is returned for a setting operation may depend on the context. This is sometimes called chained assignment and should be avoided. See Returning a View versus Copy.
pandas provides a suite of methods for purely integer-based indexing. The semantics closely follow Python and NumPy slicing. Indexing is 0-based. When slicing, the start bound is included, while the upper bound is excluded. Trying to use a non-integer, even a valid label, will raise an IndexError.
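The difference between position and label is easiest to see when the index labels are integers that do not match the positions. The following is a minimal sketch on a made-up Series `s` (the values and labels are illustrative, not taken from the examples below):

```python
import pandas as pd

# Labels are deliberately not 0..n-1, so position and label differ.
s = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], index=[0, 2, 4, 6, 8])

by_position = s.iloc[2]  # third element, whose label happens to be 4
by_label = s.loc[2]      # element whose label is 2 (the second element)

# Slices are half-open: positions 1 and 2 are returned, position 3 is not.
window = s.iloc[1:3]

print(by_position, by_label)
print(window)
```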
The .iloc attribute is the primary access method. The following are valid inputs:
- An integer, e.g. 5
- A list or array of integers, e.g. [4, 3, 0]
- A slice object with ints, e.g. 1:7
- A boolean array
- A callable, see Selection By Callable
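As a rough illustration of the input types listed above, the sketch below applies each one to a small made-up Series `s`. The boolean mask is passed as a NumPy array of the same length as the Series, and the callable is a hypothetical lambda that receives the Series and returns a list of positions.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=list("abcde"))

single = s.iloc[4]          # an integer -> scalar value
picked = s.iloc[[4, 3, 0]]  # a list of integers -> rows in the order given
sliced = s.iloc[1:4]        # a slice -> positions 1, 2 and 3

# A boolean array must have the same length as the axis being indexed.
mask = np.array([True, False, True, False, True])
masked = s.iloc[mask]

# A callable is called with the object and must return a valid indexer.
called = s.iloc[lambda x: [0, len(x) - 1]]
```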
In [1]: s1 = pd.Series(np.random.randn(5), index=list(range(0,10,2)))
In [2]: s1
Out[2]:
0 1.0758
2 -0.1090
4 1.6436
6 -1.4694
8 0.3570
dtype: float64
In [3]: s1.iloc[:3]
Out[3]:
0 1.0758
2 -0.1090
4 1.6436
dtype: float64
In [4]: s1.iloc[3]
Out[4]: -1.4693879595399115
Note that setting works as well:
In [5]: s1.iloc[:3] = 0
In [6]: s1
Out[6]:
0 0.0000
2 0.0000
4 0.0000
6 -1.4694
8 0.3570
dtype: float64
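Setting by position works on a DataFrame in the same way. The sketch below uses a made-up frame `df` and passes both the row and the column indexer in a single .iloc call, which is one way to avoid the chained assignment mentioned in the warning above.

```python
import pandas as pd

df = pd.DataFrame(
    {"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]},
    index=[10, 20, 30],
)

# One indexing call covering both axes: first two rows, second column.
df.iloc[:2, 1] = 0.0

# A single cell can be set the same way (third row, first column).
df.iloc[2, 0] = -1.0

print(df)
```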
With a DataFrame
In [7]: df1 = pd.DataFrame(np.random.randn(6,4),
...: index=list(range(0,12,2)),
...: columns=list(range(0,8,2)))
...:
In [8]: df1
Out[8]:
0 2 4 6
0 -0.6746 -1.7769 -0.9689 -1.2945
2 0.4137 0.2767 -0.4720 -0.0140
4 -0.3625 -0.0062 -0.9231 0.8957
6 0.8052 -1.2064 2.5656 1.4313
8 1.3403 -1.1703 -0.2262 0.4108
10 0.8139 0.1320 -0.8273 -0.0765
Select via integer slicing
In [9]: df1.iloc[:3]
Out[9]:
0 2 4 6
0 -0.6746 -1.7769 -0.9689 -1.2945
2 0.4137 0.2767 -0.4720 -0.0140
4 -0.3625 -0.0062 -0.9231 0.8957
In [10]: df1.iloc[1:5, 2:4]
Out[10]:
4 6
2 -0.4720 -0.0140
4 -0.9231 0.8957
6 2.5656 1.4313
8 -0.2262 0.4108
Select via integer list
In [11]: df1.iloc[[1, 3, 5], [1, 3]]
Out[11]:
2 6
2 0.2767 -0.0140
6 -1.2064 1.4313
10 0.1320 -0.0765
In [12]: df1.iloc[1:3, :]
Out[12]:
0 2 4 6
2 0.4137 0.2767 -0.4720 -0.0140
4 -0.3625 -0.0062 -0.9231 0.8957
In [13]: df1.iloc[:, 1:3]
Out[13]:
2 4
0 -1.7769 -0.9689
2 0.2767 -0.4720
4 -0.0062 -0.9231
6 -1.2064 2.5656
8 -1.1703 -0.2262
10 0.1320 -0.8273
# this is also equivalent to ``df1.iat[1,1]``
In [14]: df1.iloc[1, 1]
Out[14]: 0.27666171294975661
To get a cross section using an integer position (equivalent to df.xs(1) when the index labels match the positions):
In [15]: df1.iloc[1]
Out[15]:
0 0.4137
2 0.2767
4 -0.4720
6 -0.0140
Name: 2, dtype: float64
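Note that the name of the returned Series is the row label, not the position. Because xs looks rows up by label, the matching call for a non-default index uses that label. A minimal sketch, assuming a small made-up frame `df`:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6]],
    index=[0, 2, 4],
    columns=["a", "b"],
)

row = df.iloc[1]  # second row, returned as a Series indexed by column labels
same = df.xs(2)   # label-based lookup of the same row

print(row.name)   # the row label of the second row
```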
Out-of-range slice indexes are handled gracefully, just as in Python/NumPy.
# these are allowed in Python/NumPy.
# Only works in pandas starting from v0.14.0.
In [16]: x = list('abcdef')
In [17]: x
Out[17]: ['a', 'b', 'c', 'd', 'e', 'f']
In [18]: x[4:10]
Out[18]: ['e', 'f']
In [19]: x[8:10]
Out[19]: []
In [20]: s = pd.Series(x)
In [21]: s
Out[21]:
0 a
1 b
2 c
3 d
4 e
5 f
dtype: object
In [22]: s.iloc[4:10]
Out[22]:
4 e
5 f
dtype: object
In [23]: s.iloc[8:10]
Out[23]: Series([], dtype: object)
Note
Prior to v0.14.0, iloc would not accept out-of-bounds indexers for slices, e.g. a value that exceeds the length of the object being indexed.
Note that this could result in an empty axis (e.g. an empty DataFrame being returned):
In [24]: dfl = pd.DataFrame(np.random.randn(5,2), columns=list('AB'))
In [25]: dfl
Out[25]:
A B
0 -1.1877 1.1301
1 -1.4367 -1.4137
2 1.6079 1.0242
3 0.5696 0.8759
4 -2.2114 0.9745
In [26]: dfl.iloc[:, 2:3]
Out[26]:
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3, 4]
In [27]: dfl.iloc[:, 1:3]
Out[27]:
B
0 1.1301
1 -1.4137
2 1.0242
3 0.8759
4 0.9745
In [28]: dfl.iloc[4:6]
Out[28]:
A B
4 -2.2114 0.9745
A single indexer that is out of bounds will raise an IndexError. A list of indexers where any element is out of bounds will raise an IndexError.
dfl.iloc[[4, 5, 6]]
IndexError: positional indexers are out-of-bounds
dfl.iloc[:, 4]
IndexError: single positional indexer is out-of-bounds
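When positions come from user input or an earlier computation, one option is to filter them before indexing. The sketch below is only an illustration: `safe_rows` is a hypothetical helper, not part of pandas, and the frame `df` is made up.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [6, 7, 8, 9, 10]})


def safe_rows(frame, positions):
    """Hypothetical helper: keep only in-range positions before indexing."""
    n = len(frame)
    valid = [p for p in positions if -n <= p < n]
    return frame.iloc[valid]


# A list containing out-of-bounds positions raises IndexError.
try:
    df.iloc[[4, 5, 6]]
    raised = False
except IndexError:
    raised = True

kept = safe_rows(df, [4, 5, 6])  # only position 4 exists
tail = df.iloc[4:7]              # a slice is clipped instead of raising
```

Filtering silently drops the invalid positions, which may hide bugs. Catching the IndexError and reporting it is often the safer choice.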