2.5 Selection By Label
Warning
Whether a copy or a reference is returned for a setting operation may depend on the context. This is sometimes called chained assignment and should be avoided.
See Returning a View versus Copy
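A minimal sketch of the pattern this warning refers to, using a small hypothetical frame `df` that is not part of the examples in this section. The chained form indexes twice, so the assignment may land on an intermediate copy. A single `.loc` call selects and assigns in one step.

```python
import pandas as pd

# Hypothetical data, chosen only for illustration.
df = pd.DataFrame({"A": [1, -2, 3], "B": [10, 20, 30]})

# Chained form to avoid:  df[df["A"] > 0]["B"] = 0
# The first indexing step may return a copy, so the assignment can be lost.

# Single-step form: row mask and column label go into one .loc call.
df.loc[df["A"] > 0, "B"] = 0

print(df["B"].tolist())
```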
Warning
.loc is strict when you present slicers that are not compatible (or convertible) with the index type, for example using integers in a DatetimeIndex. These will raise a TypeError.
In [1]: dfl = pd.DataFrame(np.random.randn(5,4), columns=list('ABCD'), index=pd.date_range('20130101',periods=5))
In [2]: dfl
Out[2]:
A B C D
2013-01-01 1.0758 -0.1090 1.6436 -1.4694
2013-01-02 0.3570 -0.6746 -1.7769 -0.9689
2013-01-03 -1.2945 0.4137 0.2767 -0.4720
2013-01-04 -0.0140 -0.3625 -0.0062 -0.9231
2013-01-05 0.8957 0.8052 -1.2064 2.5656
In [4]: dfl.loc[2:3]
TypeError: cannot do slice indexing on <class 'pandas.tseries.index.DatetimeIndex'> with these indexers [2] of <type 'int'>
String-like values used in slicing can be converted to the type of the index and lead to natural slicing.
In [3]: dfl.loc['20130102':'20130104']
Out[3]:
A B C D
2013-01-02 0.3570 -0.6746 -1.7769 -0.9689
2013-01-03 -1.2945 0.4137 0.2767 -0.4720
2013-01-04 -0.0140 -0.3625 -0.0062 -0.9231
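The two behaviours above can be checked side by side. The sketch below rebuilds a frame with the same shape and index as dfl but with deterministic values (an assumption made so the result is reproducible), and also slices with explicit `pd.Timestamp` bounds, which should select the same rows as the string form.

```python
import numpy as np
import pandas as pd

# Same index and columns as dfl above; the values are illustrative only.
dfl = pd.DataFrame(np.arange(20).reshape(5, 4),
                   columns=list("ABCD"),
                   index=pd.date_range("20130101", periods=5))

# Strings are converted to the index type, so this is a label slice.
by_string = dfl.loc["20130102":"20130104"]

# Explicit timestamps as slice bounds select the same rows.
by_timestamp = dfl.loc[pd.Timestamp("2013-01-02"):pd.Timestamp("2013-01-04")]

# Integers are not convertible to datetime labels, so .loc refuses them.
try:
    dfl.loc[2:3]
    raised_type_error = False
except TypeError:
    raised_type_error = True
```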
pandas provides a suite of methods in order to have purely label based indexing. This is a strict inclusion based protocol. At least 1 of the labels for which you ask must be in the index, or a KeyError will be raised! When slicing, both the start bound and the stop bound are included. Integers are valid labels, but they refer to the label and not the position.
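To make the label-versus-position distinction concrete, here is a small sketch with a hypothetical Series `s` whose integer labels run in the opposite order to the positions. The use of `.iloc` for the positional lookup is only there for contrast.

```python
import pandas as pd

# Hypothetical Series: integer labels 3, 2, 1, 0 at positions 0, 1, 2, 3.
s = pd.Series([10.0, 20.0, 30.0, 40.0], index=[3, 2, 1, 0])

by_label = s.loc[3]       # the element labelled 3 (the first one)
by_position = s.iloc[3]   # the element at position 3 (the last one)

# Label slice: both endpoints, labels 3 and 1, are included.
label_slice = s.loc[3:1]

# A label that is absent from the index raises KeyError.
try:
    s.loc[99]
    missing_raised = False
except KeyError:
    missing_raised = True
```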
The .loc attribute is the primary access method. The following are valid inputs:
- A single label, e.g. 5 or 'a' (note that 5 is interpreted as a label of the index. This use is not an integer position along the index)
- A list or array of labels ['a', 'b', 'c']
- A slice object with labels 'a':'f' (note that contrary to usual python slices, both the start and the stop are included!)
- A boolean array
- A callable, see Selection By Callable
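The boolean-array and callable inputs are not exercised in the session that follows, so here is a brief sketch of them next to a list of labels. The Series `s` and its values are hypothetical. All three calls are expected to select the same two elements.

```python
import pandas as pd

# Hypothetical data, chosen so the positive entries are 'a' and 'c'.
s = pd.Series([1.5, -0.5, 2.0, -3.0], index=list("abcd"))

via_list = s.loc[["a", "c"]]            # list of labels
via_mask = s.loc[s > 0]                 # boolean array aligned with the index
via_callable = s.loc[lambda x: x > 0]   # callable that returns a valid indexer
```

The callable receives the object being indexed and its return value is then used as the indexer, which is convenient in method chains where the intermediate object has no name.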
In [4]: s1 = pd.Series(np.random.randn(6),index=list('abcdef'))
In [5]: s1
Out[5]:
a 1.4313
b 1.3403
c -1.1703
d -0.2262
e 0.4108
f 0.8139
dtype: float64
In [6]: s1.loc['c':]
Out[6]:
c -1.1703
d -0.2262
e 0.4108
f 0.8139
dtype: float64
In [7]: s1.loc['b']
Out[7]: 1.3403088497993827
Note that setting works as well:
In [8]: s1.loc['c':] = 0
In [9]: s1
Out[9]:
a 1.4313
b 1.3403
c 0.0000
d 0.0000
e 0.0000
f 0.0000
dtype: float64
With a DataFrame
In [10]: df1 = pd.DataFrame(np.random.randn(6,4),
....: index=list('abcdef'),
....: columns=list('ABCD'))
....:
In [11]: df1
Out[11]:
A B C D
a 0.1320 -0.8273 -0.0765 -1.1877
b 1.1301 -1.4367 -1.4137 1.6079
c 1.0242 0.5696 0.8759 -2.2114
d 0.9745 -2.0067 -0.4100 -0.0786
e 0.5460 -1.2192 -1.2268 0.7698
f -1.2812 -0.7277 -0.1213 -0.0979
In [12]: df1.loc[['a', 'b', 'd'], :]
Out[12]:
A B C D
a 0.1320 -0.8273 -0.0765 -1.1877
b 1.1301 -1.4367 -1.4137 1.6079
d 0.9745 -2.0067 -0.4100 -0.0786
Accessing via label slices
In [13]: df1.loc['d':, 'A':'C']
Out[13]:
A B C
d 0.9745 -2.0067 -0.4100
e 0.5460 -1.2192 -1.2268
f -1.2812 -0.7277 -0.1213
For getting a cross section using a label (equivalent to df.xs('a')):
In [14]: df1.loc['a']
Out[14]:
A 0.1320
B -0.8273
C -0.0765
D -1.1877
Name: a, dtype: float64
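The stated equivalence with xs can be checked directly. This is a sketch on a small hypothetical frame `df` rather than on df1, so that the values are fixed.

```python
import pandas as pd

# Hypothetical frame with string row labels.
df = pd.DataFrame({"A": [0.5, 1.5], "B": [2.5, 3.5]}, index=["a", "b"])

row_via_loc = df.loc["a"]   # cross section by row label
row_via_xs = df.xs("a")     # same row through xs

same_row = row_via_loc.equals(row_via_xs)
```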
For getting values with a boolean array
In [15]: df1.loc['a'] > 0
Out[15]:
A True
B False
C False
D False
Name: a, dtype: bool
In [16]: df1.loc[:, df1.loc['a'] > 0]
Out[16]:
A
a 0.1320
b 1.1301
c 1.0242
d 0.9745
e 0.5460
f -1.2812
For getting a value explicitly (equivalent to the deprecated df.get_value('a','A')):
# this is also equivalent to ``df1.at['a','A']``
In [17]: df1.loc['a', 'A']
Out[17]: 0.13200317033032932
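As the comment above notes, .at offers the same scalar lookup. A minimal sketch on a hypothetical two-row frame `df`, assuming unique row and column labels:

```python
import pandas as pd

# Hypothetical frame with unique labels on both axes.
df = pd.DataFrame({"A": [0.5, 1.5], "B": [2.5, 3.5]}, index=["a", "b"])

via_loc = df.loc["a", "A"]  # general label-based indexer
via_at = df.at["a", "A"]    # scalar-only accessor

print(via_loc == via_at)
```

.at accepts only a single row label and a single column label, so it suits the case where exactly one value is wanted.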