3 Selection
Note
While standard Python / Numpy expressions for selecting and setting are
intuitive and come in handy for interactive work, for production code, we
recommend the optimized pandas data access methods, .at, .iat,
.loc, .iloc and .ix.
See the indexing documentation Indexing and Selecting Data and MultiIndex / Advanced Indexing
3.1 Getting
Selecting a single column, which yields a Series,
equivalent to df.A
In [1]: df
Out[1]:
A B C D
2013-01-01 0.469112 -0.282863 -1.509059 -1.135632
2013-01-02 1.212112 -0.173215 0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929 1.071804
2013-01-04 0.721555 -0.706771 -1.039575 0.271860
2013-01-05 -0.424972 0.567020 0.276232 -1.087401
2013-01-06 -0.673690 0.113648 -1.478427 0.524988
In [2]: df['A']
Out[2]:
2013-01-01 0.469112
2013-01-02 1.212112
2013-01-03 -0.861849
2013-01-04 0.721555
2013-01-05 -0.424972
2013-01-06 -0.673690
Freq: D, Name: A, dtype: float64
Selecting via [], which slices the rows.
In [3]: df[0:3]
Out[3]:
A B C D
2013-01-01 0.469112 -0.282863 -1.509059 -1.135632
2013-01-02 1.212112 -0.173215 0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929 1.071804
In [4]: df['20130102':'20130104']
Out[4]:
A B C D
2013-01-02 1.212112 -0.173215 0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929 1.071804
2013-01-04 0.721555 -0.706771 -1.039575 0.271860
3.2 Selection by Label
See more in Selection by Label
For getting a cross section using a label
In [5]: df.loc[dates[0]]
Out[5]:
A 0.469112
B -0.282863
C -1.509059
D -1.135632
Name: 2013-01-01 00:00:00, dtype: float64
Selecting on a multi-axis by label
In [6]: df.loc[:,['A','B']]
Out[6]:
A B
2013-01-01 0.469112 -0.282863
2013-01-02 1.212112 -0.173215
2013-01-03 -0.861849 -2.104569
2013-01-04 0.721555 -0.706771
2013-01-05 -0.424972 0.567020
2013-01-06 -0.673690 0.113648
Showing label slicing, both endpoints are included
In [7]: df.loc['20130102':'20130104',['A','B']]
Out[7]:
A B
2013-01-02 1.212112 -0.173215
2013-01-03 -0.861849 -2.104569
2013-01-04 0.721555 -0.706771
Reduction in the dimensions of the returned object
In [8]: df.loc['20130102',['A','B']]
Out[8]:
A 1.212112
B -0.173215
Name: 2013-01-02 00:00:00, dtype: float64
For getting a scalar value
In [9]: df.loc[dates[0],'A']
Out[9]: 0.46911229990718628
For getting fast access to a scalar (equiv to the prior method)
In [10]: df.at[dates[0],'A']
Out[10]: 0.46911229990718628
3.3 Selection by Position
See more in Selection by Position
Select via the position of the passed integers
In [11]: df.iloc[3]
Out[11]:
A 0.721555
B -0.706771
C -1.039575
D 0.271860
Name: 2013-01-04 00:00:00, dtype: float64
By integer slices, acting similar to numpy/python
In [12]: df.iloc[3:5,0:2]
Out[12]:
A B
2013-01-04 0.721555 -0.706771
2013-01-05 -0.424972 0.567020
By lists of integer position locations, similar to the numpy/python style
In [13]: df.iloc[[1,2,4],[0,2]]
Out[13]:
A C
2013-01-02 1.212112 0.119209
2013-01-03 -0.861849 -0.494929
2013-01-05 -0.424972 0.276232
For slicing rows explicitly
In [14]: df.iloc[1:3,:]
Out[14]:
A B C D
2013-01-02 1.212112 -0.173215 0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929 1.071804
For slicing columns explicitly
In [15]: df.iloc[:,1:3]
Out[15]:
B C
2013-01-01 -0.282863 -1.509059
2013-01-02 -0.173215 0.119209
2013-01-03 -2.104569 -0.494929
2013-01-04 -0.706771 -1.039575
2013-01-05 0.567020 0.276232
2013-01-06 0.113648 -1.478427
For getting a value explicitly
In [16]: df.iloc[1,1]
Out[16]: -0.17321464905330858
For getting fast access to a scalar (equiv to the prior method)
In [17]: df.iat[1,1]
Out[17]: -0.17321464905330858
3.4 Boolean Indexing
Using a single column’s values to select data.
In [18]: df[df.A > 0]
Out[18]:
A B C D
2013-01-01 0.469112 -0.282863 -1.509059 -1.135632
2013-01-02 1.212112 -0.173215 0.119209 -1.044236
2013-01-04 0.721555 -0.706771 -1.039575 0.271860
A where operation for getting.
In [19]: df[df > 0]
Out[19]:
A B C D
2013-01-01 0.469112 NaN NaN NaN
2013-01-02 1.212112 NaN 0.119209 NaN
2013-01-03 NaN NaN NaN 1.071804
2013-01-04 0.721555 NaN NaN 0.271860
2013-01-05 NaN 0.567020 0.276232 NaN
2013-01-06 NaN 0.113648 NaN 0.524988
Using the isin() method for filtering:
In [20]: df2 = df.copy()
In [21]: df2['E'] = ['one', 'one','two','three','four','three']
In [22]: df2
Out[22]:
A B C D E
2013-01-01 0.469112 -0.282863 -1.509059 -1.135632 one
2013-01-02 1.212112 -0.173215 0.119209 -1.044236 one
2013-01-03 -0.861849 -2.104569 -0.494929 1.071804 two
2013-01-04 0.721555 -0.706771 -1.039575 0.271860 three
2013-01-05 -0.424972 0.567020 0.276232 -1.087401 four
2013-01-06 -0.673690 0.113648 -1.478427 0.524988 three
In [23]: df2[df2['E'].isin(['two','four'])]
Out[23]:
A B C D E
2013-01-03 -0.861849 -2.104569 -0.494929 1.071804 two
2013-01-05 -0.424972 0.567020 0.276232 -1.087401 four
3.5 Setting
Setting a new column automatically aligns the data by the indexes
In [24]: s1 = pd.Series([1,2,3,4,5,6], index=pd.date_range('20130102', periods=6))
In [25]: s1
Out[25]:
2013-01-02 1
2013-01-03 2
2013-01-04 3
2013-01-05 4
2013-01-06 5
2013-01-07 6
Freq: D, dtype: int64
In [26]: df['F'] = s1
Setting values by label
In [27]: df.at[dates[0],'A'] = 0
Setting values by position
In [28]: df.iat[0,1] = 0
Setting by assigning with a numpy array
In [29]: df.loc[:,'D'] = np.array([5] * len(df))
The result of the prior setting operations
In [30]: df
Out[30]:
A B C D F
2013-01-01 0.000000 0.000000 -1.509059 5 NaN
2013-01-02 1.212112 -0.173215 0.119209 5 1.0
2013-01-03 -0.861849 -2.104569 -0.494929 5 2.0
2013-01-04 0.721555 -0.706771 -1.039575 5 3.0
2013-01-05 -0.424972 0.567020 0.276232 5 4.0
2013-01-06 -0.673690 0.113648 -1.478427 5 5.0
A where operation with setting.
In [31]: df2 = df.copy()
In [32]: df2[df2 > 0] = -df2
In [33]: df2
Out[33]:
A B C D F
2013-01-01 0.000000 0.000000 -1.509059 -5 NaN
2013-01-02 -1.212112 -0.173215 -0.119209 -5 -1.0
2013-01-03 -0.861849 -2.104569 -0.494929 -5 -2.0
2013-01-04 -0.721555 -0.706771 -1.039575 -5 -3.0
2013-01-05 -0.424972 -0.567020 -0.276232 -5 -4.0
2013-01-06 -0.673690 -0.113648 -1.478427 -5 -5.0