2.5 Selection By Label

Warning

Whether a setting operation returns a copy or a reference may depend on the context. This is sometimes called chained assignment and should be avoided. See Returning a View versus Copy.
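As a minimal sketch of the alternative to chained assignment, the rows and the column can be addressed in a single .loc call. The frame df and its values below are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Hypothetical frame used only for illustration.
df = pd.DataFrame({"A": [1, -2, 3], "B": [10, 20, 30]}, index=list("xyz"))

# Chained form (avoid): df[df["A"] > 0]["B"] = 0
# The first [] may return a temporary copy, so the assignment can be lost.

# Single .loc call: the row mask and the column label are resolved together,
# so the assignment is applied to df itself.
df.loc[df["A"] > 0, "B"] = 0
```

Rows x and z satisfy the mask, so only their B values are overwritten.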

Warning

.loc is strict when you present slicers that are not compatible with (or convertible to) the index type, for example integers in a DatetimeIndex. These will raise a TypeError.
In [1]: dfl = pd.DataFrame(np.random.randn(5,4), columns=list('ABCD'), index=pd.date_range('20130101',periods=5))

In [2]: dfl
Out[2]: 
                 A       B       C       D
2013-01-01  1.0758 -0.1090  1.6436 -1.4694
2013-01-02  0.3570 -0.6746 -1.7769 -0.9689
2013-01-03 -1.2945  0.4137  0.2767 -0.4720
2013-01-04 -0.0140 -0.3625 -0.0062 -0.9231
2013-01-05  0.8957  0.8052 -1.2064  2.5656
In [4]: dfl.loc[2:3]
TypeError: cannot do slice indexing on <class 'pandas.tseries.index.DatetimeIndex'> with these indexers [2] of <type 'int'>
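The sketch below reproduces the failure in a form that can be run as a script, and shows .iloc as one possible position-based alternative. The names raised and by_position are illustrative and not part of the original example.

```python
import numpy as np
import pandas as pd

dfl = pd.DataFrame(np.random.randn(5, 4), columns=list("ABCD"),
                   index=pd.date_range("20130101", periods=5))

# Integer slice bounds are not valid labels for a DatetimeIndex.
try:
    dfl.loc[2:3]
    raised = False
except TypeError:
    raised = True

# If positions are what is meant, .iloc slices by position instead
# (the stop position is excluded, as in ordinary Python slices).
by_position = dfl.iloc[2:3]
```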

String-like values in a slice can be converted to the type of the index, which leads to natural slicing.

In [3]: dfl.loc['20130102':'20130104']
Out[3]: 
                 A       B       C       D
2013-01-02  0.3570 -0.6746 -1.7769 -0.9689
2013-01-03 -1.2945  0.4137  0.2767 -0.4720
2013-01-04 -0.0140 -0.3625 -0.0062 -0.9231
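To illustrate the conversion, the following sketch compares the string slice with the same slice written with explicit pd.Timestamp bounds. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

dfl = pd.DataFrame(np.random.randn(5, 4), columns=list("ABCD"),
                   index=pd.date_range("20130101", periods=5))

# String bounds are converted to timestamps before the label lookup.
by_string = dfl.loc["20130102":"20130104"]

# The same selection written with explicit Timestamp bounds.
by_timestamp = dfl.loc[pd.Timestamp("2013-01-02"):pd.Timestamp("2013-01-04")]
```

Both selections contain the three rows from 2013-01-02 through 2013-01-04, since both bounds are included.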

pandas provides a suite of methods for purely label-based indexing. This is a strict inclusion-based protocol: at least one of the labels you ask for must be in the index, or a KeyError will be raised. When slicing, both the start bound and the stop bound are included. Integers are valid labels, but they refer to the label and not the position.
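The difference between integer labels and integer positions can be sketched with a small Series whose index is deliberately not 0, 1, 2. The Series s and the other names below are hypothetical.

```python
import pandas as pd

# Hypothetical Series with integer labels that differ from the positions.
s = pd.Series(["x", "y", "z"], index=[10, 20, 30])

by_label = s.loc[10]          # the element labelled 10, not the 11th element
label_slice = s.loc[10:20]    # label slice: both 10 and 20 are included
position_slice = s.iloc[0:1]  # position slice: the stop is excluded

# 0 is a valid position but not a label in this index.
try:
    s.loc[0]
    missing_raised = False
except KeyError:
    missing_raised = True
```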

The .loc attribute is the primary access method. The following are valid inputs:

  • A single label, e.g. 5 or 'a' (note that 5 is interpreted as a label of the index, not as an integer position along the index)
  • A list or array of labels, e.g. ['a', 'b', 'c']
  • A slice object with labels, e.g. 'a':'f' (note that, contrary to usual Python slices, both the start and the stop are included)
  • A boolean array
  • A callable, see Selection By Callable
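As a compact sketch, each of the input kinds listed above can be tried on one small Series. The Series s and the result names are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, -2.0, 3.0, -4.0], index=list("abcd"))

single = s.loc["a"]                                   # single label -> scalar
listed = s.loc[["a", "c"]]                            # list of labels
sliced = s.loc["b":"c"]                               # label slice, both ends included
masked = s.loc[np.array([True, False, True, False])]  # boolean array
called = s.loc[lambda x: x < 0]                       # callable returning a boolean mask
```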
In [4]: s1 = pd.Series(np.random.randn(6),index=list('abcdef'))

In [5]: s1
Out[5]: 
a    1.4313
b    1.3403
c   -1.1703
d   -0.2262
e    0.4108
f    0.8139
dtype: float64

In [6]: s1.loc['c':]
Out[6]: 
c   -1.1703
d   -0.2262
e    0.4108
f    0.8139
dtype: float64

In [7]: s1.loc['b']
Out[7]: 1.3403088497993827

Note that setting works as well:

In [8]: s1.loc['c':] = 0

In [9]: s1
Out[9]: 
a    1.4313
b    1.3403
c    0.0000
d    0.0000
e    0.0000
f    0.0000
dtype: float64
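Setting is not limited to slices. As a sketch, the same assignment form also accepts a list of labels or a boolean mask; the Series s below is hypothetical.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0], index=list("abcd"))

# Assign to an explicit list of labels.
s.loc[["a", "d"]] = -1.0

# Assign to every element selected by a boolean mask
# (only 'c' is still greater than 2 at this point).
s.loc[s > 2] = 0.0
```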

With a DataFrame:

In [10]: df1 = pd.DataFrame(np.random.randn(6,4),
   ....:                    index=list('abcdef'),
   ....:                    columns=list('ABCD'))
   ....: 

In [11]: df1
Out[11]: 
        A       B       C       D
a  0.1320 -0.8273 -0.0765 -1.1877
b  1.1301 -1.4367 -1.4137  1.6079
c  1.0242  0.5696  0.8759 -2.2114
d  0.9745 -2.0067 -0.4100 -0.0786
e  0.5460 -1.2192 -1.2268  0.7698
f -1.2812 -0.7277 -0.1213 -0.0979

In [12]: df1.loc[['a', 'b', 'd'], :]
Out[12]: 
        A       B       C       D
a  0.1320 -0.8273 -0.0765 -1.1877
b  1.1301 -1.4367 -1.4137  1.6079
d  0.9745 -2.0067 -0.4100 -0.0786
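A list of labels also determines the order of the result. The sketch below uses a frame filled with np.arange instead of random values (an assumption made so the selected values are predictable) and selects rows and columns in a non-sorted order.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for df1: row 'a' holds 0..3, row 'b' holds 4..7, etc.
df1 = pd.DataFrame(np.arange(24).reshape(6, 4),
                   index=list("abcdef"), columns=list("ABCD"))

# Rows and columns come back in the order the labels were given.
reordered = df1.loc[["d", "a"], ["C", "A"]]
```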

Accessing via label slices:

In [13]: df1.loc['d':, 'A':'C']
Out[13]: 
        A       B       C
d  0.9745 -2.0067 -0.4100
e  0.5460 -1.2192 -1.2268
f -1.2812 -0.7277 -0.1213

To get a cross section using a label (equivalent to df1.xs('a')):

In [14]: df1.loc['a']
Out[14]: 
A    0.1320
B   -0.8273
C   -0.0765
D   -1.1877
Name: a, dtype: float64
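The stated equivalence with xs can be checked directly. This is a small sketch on a deterministic stand-in for df1; the result names are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(24.0).reshape(6, 4),
                   index=list("abcdef"), columns=list("ABCD"))

row_loc = df1.loc["a"]  # row 'a' as a Series indexed by the column labels
row_xs = df1.xs("a")    # the same cross section via xs
```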

To get values with a boolean array:

In [15]: df1.loc['a'] > 0
Out[15]: 
A     True
B    False
C    False
D    False
Name: a, dtype: bool

In [16]: df1.loc[:, df1.loc['a'] > 0]
Out[16]: 
        A
a  0.1320
b  1.1301
c  1.0242
d  0.9745
e  0.5460
f -1.2812
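Boolean arrays can be supplied for both axes at once. The sketch below uses a small hypothetical frame with fixed values so the masks are easy to follow.

```python
import pandas as pd

df1 = pd.DataFrame([[1, -1, 2], [-3, 4, 5], [6, -7, -8]],
                   index=list("abc"), columns=list("ABC"))

row_mask = df1["A"] > 0      # True for rows 'a' and 'c'
col_mask = df1.loc["a"] > 0  # True for columns 'A' and 'C'

# Keep only the rows and columns where the respective mask is True.
subset = df1.loc[row_mask, col_mask]
```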

To get a single value explicitly (equivalent to the deprecated df1.get_value('a', 'A')):

# this is also equivalent to df1.at['a', 'A']
In [17]: df1.loc['a', 'A']
Out[17]: 0.13200317033032932
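As a brief sketch of the scalar accessors mentioned above, .loc and .at return the same value for a row and column label pair, and .at can also set it. The frame is a deterministic stand-in for df1, and get_value is left out because the text marks it as deprecated.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(24.0).reshape(6, 4),
                   index=list("abcdef"), columns=list("ABCD"))

via_loc = df1.loc["a", "A"]  # general label-based lookup
via_at = df1.at["a", "A"]    # scalar-only accessor for a single cell

# .at also supports assignment to a single cell.
df1.at["a", "A"] = 100.0
```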