2.2 Basics

As mentioned when introducing the data structures in the last section, the primary function of indexing with [] (a.k.a. __getitem__ for those familiar with implementing class behavior in Python) is selecting out lower-dimensional slices. The following table shows the return type when indexing each pandas object with []:

Object Type    Selection          Return Value Type
Series         series[label]      scalar value
DataFrame      frame[colname]     Series corresponding to colname
Panel          panel[itemname]    DataFrame corresponding to the itemname
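
The Series and DataFrame rows of this table can be checked with a small, self-contained sketch. The frame and the names frame, col and val are illustrative and are not part of the data set used in this section. Panel is left out so that the sketch does not depend on a pandas release that still ships it.

```python
import numpy as np
import pandas as pd

# Illustrative 3x2 frame; values 0.0 .. 5.0 are filled row by row.
dates = pd.date_range("2000-01-01", periods=3)
frame = pd.DataFrame(np.arange(6.0).reshape(3, 2), index=dates, columns=["A", "B"])

col = frame["A"]       # DataFrame[colname] -> Series
val = col[dates[1]]    # Series[label] -> scalar value

print(type(col).__name__)  # Series
print(val)                 # 2.0
```

Each application of [] drops one dimension: the two-dimensional frame yields a one-dimensional Series, and the Series yields a scalar.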

Here we construct a simple time series data set to use for illustrating the indexing functionality (the examples assume that numpy has been imported as np and pandas as pd):

In [1]: dates = pd.date_range('1/1/2000', periods=8)

In [2]: df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])

In [3]: df
Out[3]: 
                 A       B       C       D
2000-01-01  0.4691 -0.2829 -1.5091 -1.1356
2000-01-02  1.2121 -0.1732  0.1192 -1.0442
2000-01-03 -0.8618 -2.1046 -0.4949  1.0718
2000-01-04  0.7216 -0.7068 -1.0396  0.2719
2000-01-05 -0.4250  0.5670  0.2762 -1.0874
2000-01-06 -0.6737  0.1136 -1.4784  0.5250
2000-01-07  0.4047  0.5770 -1.7150 -1.0393
2000-01-08 -0.3706 -1.1579 -1.3443  0.8449

In [4]: panel = pd.Panel({'one' : df, 'two' : df - df.mean()})

In [5]: panel
Out[5]: 
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 8 (major_axis) x 4 (minor_axis)
Items axis: one to two
Major_axis axis: 2000-01-01 00:00:00 to 2000-01-08 00:00:00
Minor_axis axis: A to D

Note

None of the indexing functionality is time series specific unless specifically stated.
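
To illustrate the note, here is a minimal sketch of the same [] selection on a frame whose index holds plain string labels rather than dates. The frame people and its contents are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical frame indexed by string labels instead of timestamps.
people = pd.DataFrame(
    {"age": [31, 45], "height": [1.70, 1.82]},
    index=["ann", "bob"],
)

ages = people["age"]     # column selection returns a Series
ann_age = ages["ann"]    # label selection on the Series returns a scalar

print(ann_age)  # 31
```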

As the table above indicates, the most basic indexing uses []:

In [6]: s = df['A']

In [7]: s
Out[7]: 
2000-01-01    0.4691
2000-01-02    1.2121
2000-01-03   -0.8618
2000-01-04    0.7216
2000-01-05   -0.4250
2000-01-06   -0.6737
2000-01-07    0.4047
2000-01-08   -0.3706
Freq: D, Name: A, dtype: float64

In [8]: s[dates[5]]
Out[8]: -0.67368970808837059

In [9]: panel['two']
Out[9]: 
                 A       B       C       D
2000-01-01  0.4096  0.1131 -0.6108 -0.9365
2000-01-02  1.1526  0.2227  1.0174 -0.8451
2000-01-03 -0.9214 -1.7086  0.4033  1.2709
2000-01-04  0.6620 -0.3108 -0.1413  0.4710
2000-01-05 -0.4845  0.9630  1.1745 -0.8883
2000-01-06 -0.7332  0.5096 -0.5802  0.7241
2000-01-07  0.3452  0.9730 -0.8168 -0.8401
2000-01-08 -0.4302 -0.7619 -0.4461  1.0440

You can pass a list of columns to [] to select columns in that order. If a column is not contained in the DataFrame, an exception will be raised. Multiple columns can also be set in this manner:

In [10]: df
Out[10]: 
                 A       B       C       D
2000-01-01  0.4691 -0.2829 -1.5091 -1.1356
2000-01-02  1.2121 -0.1732  0.1192 -1.0442
2000-01-03 -0.8618 -2.1046 -0.4949  1.0718
2000-01-04  0.7216 -0.7068 -1.0396  0.2719
2000-01-05 -0.4250  0.5670  0.2762 -1.0874
2000-01-06 -0.6737  0.1136 -1.4784  0.5250
2000-01-07  0.4047  0.5770 -1.7150 -1.0393
2000-01-08 -0.3706 -1.1579 -1.3443  0.8449

In [11]: df[['B', 'A']] = df[['A', 'B']]

In [12]: df
Out[12]: 
                 A       B       C       D
2000-01-01 -0.2829  0.4691 -1.5091 -1.1356
2000-01-02 -0.1732  1.2121  0.1192 -1.0442
2000-01-03 -2.1046 -0.8618 -0.4949  1.0718
2000-01-04 -0.7068  0.7216 -1.0396  0.2719
2000-01-05  0.5670 -0.4250  0.2762 -1.0874
2000-01-06  0.1136 -0.6737 -1.4784  0.5250
2000-01-07  0.5770  0.4047 -1.7150 -1.0393
2000-01-08 -1.1579 -0.3706 -1.3443  0.8449

You may find this useful for applying a transform (in-place) to a subset of the columns.
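
The three behaviors described above (ordered selection, the exception for a missing column, and setting several columns at once) can be sketched on a small frame. The frame and the column name 'Z' are illustrative assumptions, and abs() stands in for whatever transform is needed.

```python
import pandas as pd

frame = pd.DataFrame({"A": [1.0, -2.0], "B": [-3.0, 4.0], "C": [5.0, -6.0]})

# A list of names selects columns in the order given.
reordered = frame[["C", "A"]]
print(list(reordered.columns))  # ['C', 'A']

# Asking for a column that does not exist raises KeyError.
try:
    frame[["A", "Z"]]
    missing_raised = False
except KeyError:
    missing_raised = True

# Transform a subset of the columns and write the result back to them.
frame[["A", "B"]] = frame[["A", "B"]].abs()
```

After the last statement, columns A and B hold absolute values while C is left untouched.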

Warning

pandas aligns all AXES when setting Series and DataFrame from .loc, .iloc and .ix.

The following assignment does not modify df, because the columns are aligned by label before the values are assigned:

In [13]: df[['A', 'B']]
Out[13]: 
                 A       B
2000-01-01 -0.2829  0.4691
2000-01-02 -0.1732  1.2121
2000-01-03 -2.1046 -0.8618
2000-01-04 -0.7068  0.7216
2000-01-05  0.5670 -0.4250
2000-01-06  0.1136 -0.6737
2000-01-07  0.5770  0.4047
2000-01-08 -1.1579 -0.3706

In [14]: df.loc[:,['B', 'A']] = df[['A', 'B']]

In [15]: df[['A', 'B']]
Out[15]: 
                 A       B
2000-01-01 -0.2829  0.4691
2000-01-02 -0.1732  1.2121
2000-01-03 -2.1046 -0.8618
2000-01-04 -0.7068  0.7216
2000-01-05  0.5670 -0.4250
2000-01-06  0.1136 -0.6737
2000-01-07  0.5770  0.4047
2000-01-08 -1.1579 -0.3706

The correct way to swap the column values is to assign the raw values:

In [16]: df.loc[:,['B', 'A']] = df[['A', 'B']].values

In [17]: df[['A', 'B']]
Out[17]: 
                 A       B
2000-01-01  0.4691 -0.2829
2000-01-02  1.2121 -0.1732
2000-01-03 -0.8618 -2.1046
2000-01-04  0.7216 -0.7068
2000-01-05 -0.4250  0.5670
2000-01-06 -0.6737  0.1136
2000-01-07  0.4047  0.5770
2000-01-08 -0.3706 -1.1579
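
The difference between the two assignments is easier to see on a frame with small integer values. The following is a minimal sketch using .loc only; the frame and the name after_aligned are illustrative.

```python
import pandas as pd

frame = pd.DataFrame({"A": [1, 2], "B": [10, 20]})

# Right-hand side is a DataFrame: columns are matched by label,
# so A receives A and B receives B, and nothing changes.
frame.loc[:, ["B", "A"]] = frame[["A", "B"]]
after_aligned = frame.copy()

# Right-hand side is a plain array: values are assigned by position,
# so B receives the old A values and A receives the old B values.
frame.loc[:, ["B", "A"]] = frame[["A", "B"]].values

print(after_aligned["A"].tolist())  # [1, 2]
print(frame["A"].tolist())          # [10, 20]
```

Taking .values discards the column labels, which is why the second assignment is positional and performs the swap.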