2.2 Basics
As mentioned when introducing the data structures in the last section, the primary function of indexing with [] (a.k.a. __getitem__ for those familiar with implementing class behavior in Python) is selecting lower-dimensional slices. The table below summarizes what each selection returns:
| Object Type | Selection | Return Value Type |
|---|---|---|
| Series | series[label] | scalar value |
| DataFrame | frame[colname] | Series corresponding to colname |
| Panel | panel[itemname] | DataFrame corresponding to the itemname |
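The first two rows of the table can be checked directly. This is a minimal sketch assuming a recent pandas release; pd.Panel is no longer part of current pandas, so the third row is not reproduced, and the labels and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Small illustrative frame; the labels and values are arbitrary.
frame = pd.DataFrame(np.arange(6.0).reshape(3, 2),
                     index=['x', 'y', 'z'], columns=['A', 'B'])

col = frame['A']      # DataFrame[colname] -> Series
value = col['y']      # Series[label] -> scalar

print(type(col).__name__)   # Series
print(value)                # 2.0
```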
Here we construct a simple time series data set to use for illustrating the indexing functionality:
In [1]: dates = pd.date_range('1/1/2000', periods=8)
In [2]: df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
In [3]: df
Out[3]:
A B C D
2000-01-01 0.4691 -0.2829 -1.5091 -1.1356
2000-01-02 1.2121 -0.1732 0.1192 -1.0442
2000-01-03 -0.8618 -2.1046 -0.4949 1.0718
2000-01-04 0.7216 -0.7068 -1.0396 0.2719
2000-01-05 -0.4250 0.5670 0.2762 -1.0874
2000-01-06 -0.6737 0.1136 -1.4784 0.5250
2000-01-07 0.4047 0.5770 -1.7150 -1.0393
2000-01-08 -0.3706 -1.1579 -1.3443 0.8449
In [4]: panel = pd.Panel({'one' : df, 'two' : df - df.mean()})
In [5]: panel
Out[5]:
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 8 (major_axis) x 4 (minor_axis)
Items axis: one to two
Major_axis axis: 2000-01-01 00:00:00 to 2000-01-08 00:00:00
Minor_axis axis: A to D
Note
None of the indexing functionality is time series specific unless specifically stated.
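To illustrate the note, the same [] selection works on a frame with a plain integer index instead of dates. The frame below is invented for the example.

```python
import pandas as pd

# A frame indexed by plain integers rather than a DatetimeIndex.
people = pd.DataFrame({'name': ['ann', 'bob', 'cy'], 'age': [31, 25, 40]},
                      index=[10, 20, 30])

ages = people['age']   # column selection, exactly as with dates
print(ages[20])        # label lookup on the integer index -> 25
```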
As the table above shows, the most basic form of indexing uses []:
In [6]: s = df['A']
In [7]: s
Out[7]:
2000-01-01 0.4691
2000-01-02 1.2121
2000-01-03 -0.8618
2000-01-04 0.7216
2000-01-05 -0.4250
2000-01-06 -0.6737
2000-01-07 0.4047
2000-01-08 -0.3706
Freq: D, Name: A, dtype: float64
In [8]: s[dates[5]]
Out[8]: -0.67368970808837059
In [9]: panel['two']
Out[9]:
A B C D
2000-01-01 0.4096 0.1131 -0.6108 -0.9365
2000-01-02 1.1526 0.2227 1.0174 -0.8451
2000-01-03 -0.9214 -1.7086 0.4033 1.2709
2000-01-04 0.6620 -0.3108 -0.1413 0.4710
2000-01-05 -0.4845 0.9630 1.1745 -0.8883
2000-01-06 -0.7332 0.5096 -0.5802 0.7241
2000-01-07 0.3452 0.9730 -0.8168 -0.8401
2000-01-08 -0.4302 -0.7619 -0.4461 1.0440
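pd.Panel was deprecated in pandas 0.20 and removed in later releases, so the panel example above does not run on current versions. One possible stand-in, sketched here as an assumption rather than an official pandas API, is a plain dict of DataFrames, which reproduces the panel[itemname] to DataFrame lookup; concatenating that dict into a single frame with MultiIndex columns is another option. The 'one'/'two' items mirror the example above.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('1/1/2000', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates,
                  columns=['A', 'B', 'C', 'D'])

# A dict keyed by item name stands in for the removed Panel.
items = {'one': df, 'two': df - df.mean()}

two = items['two']                        # item lookup returns a DataFrame
print(two.shape)                          # (8, 4)
print(two.mean().abs().max() < 1e-12)     # True: columns of 'two' are demeaned

# Alternatively, concatenate into one frame with MultiIndex columns;
# selecting the outer level again yields a DataFrame.
wide = pd.concat(items, axis=1)
print(np.allclose(wide['two'].to_numpy(), two.to_numpy()))   # True
```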
You can pass a list of columns to [] to select columns in that order.
If a column is not contained in the DataFrame, a KeyError is
raised. Multiple columns can also be set in this manner:
In [10]: df
Out[10]:
A B C D
2000-01-01 0.4691 -0.2829 -1.5091 -1.1356
2000-01-02 1.2121 -0.1732 0.1192 -1.0442
2000-01-03 -0.8618 -2.1046 -0.4949 1.0718
2000-01-04 0.7216 -0.7068 -1.0396 0.2719
2000-01-05 -0.4250 0.5670 0.2762 -1.0874
2000-01-06 -0.6737 0.1136 -1.4784 0.5250
2000-01-07 0.4047 0.5770 -1.7150 -1.0393
2000-01-08 -0.3706 -1.1579 -1.3443 0.8449
In [11]: df[['B', 'A']] = df[['A', 'B']]
In [12]: df
Out[12]:
A B C D
2000-01-01 -0.2829 0.4691 -1.5091 -1.1356
2000-01-02 -0.1732 1.2121 0.1192 -1.0442
2000-01-03 -2.1046 -0.8618 -0.4949 1.0718
2000-01-04 -0.7068 0.7216 -1.0396 0.2719
2000-01-05 0.5670 -0.4250 0.2762 -1.0874
2000-01-06 0.1136 -0.6737 -1.4784 0.5250
2000-01-07 0.5770 0.4047 -1.7150 -1.0393
2000-01-08 -1.1579 -0.3706 -1.3443 0.8449
You may find this useful for applying a transform (in-place) to a subset of the columns.
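The behavior described above (list order is respected, a missing column raises, and a list on the left of = sets several columns at once) can be summarized in a short sketch. It assumes a recent pandas; the frame and the doubling transform are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12.0).reshape(3, 4), columns=['A', 'B', 'C', 'D'])

# Columns come back in the order they are listed.
print(list(df[['C', 'A']].columns))   # ['C', 'A']

# Asking for a column that does not exist raises KeyError.
try:
    df[['A', 'Z']]
except KeyError as exc:
    print('missing column:', exc)

# Apply a transform to a subset of columns and write it back in place.
df[['A', 'B']] = df[['A', 'B']] * 2
print(df.loc[0].tolist())             # [0.0, 2.0, 2.0, 3.0]
```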
Warning
pandas aligns all axes when setting Series and DataFrame from .loc, .iloc and .ix.
The following assignment therefore does not modify df, because the columns are aligned before the values are assigned.
In [13]: df[['A', 'B']]
Out[13]:
A B
2000-01-01 -0.2829 0.4691
2000-01-02 -0.1732 1.2121
2000-01-03 -2.1046 -0.8618
2000-01-04 -0.7068 0.7216
2000-01-05 0.5670 -0.4250
2000-01-06 0.1136 -0.6737
2000-01-07 0.5770 0.4047
2000-01-08 -1.1579 -0.3706
In [14]: df.loc[:,['B', 'A']] = df[['A', 'B']]
In [15]: df[['A', 'B']]
Out[15]:
A B
2000-01-01 -0.2829 0.4691
2000-01-02 -0.1732 1.2121
2000-01-03 -2.1046 -0.8618
2000-01-04 -0.7068 0.7216
2000-01-05 0.5670 -0.4250
2000-01-06 0.1136 -0.6737
2000-01-07 0.5770 0.4047
2000-01-08 -1.1579 -0.3706
The correct way to swap the columns is to assign the raw values, which carry no labels to align:
In [16]: df.loc[:,['B', 'A']] = df[['A', 'B']].values
In [17]: df[['A', 'B']]
Out[17]:
A B
2000-01-01 0.4691 -0.2829
2000-01-02 1.2121 -0.1732
2000-01-03 -0.8618 -2.1046
2000-01-04 0.7216 -0.7068
2000-01-05 -0.4250 0.5670
2000-01-06 -0.6737 0.1136
2000-01-07 0.4047 0.5770
2000-01-08 -0.3706 -1.1579
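The warning can be reproduced on a small frame. This sketch assumes a recent pandas, where .to_numpy() is the preferred spelling of .values; the three-row frame is invented for the example.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

# .loc aligns the right-hand side on column labels first, so each column
# receives its own values back and nothing changes.
df.loc[:, ['B', 'A']] = df[['A', 'B']]
print(df['A'].tolist())   # [1, 2, 3]

# A plain NumPy array has no labels to align, so values are placed by
# position and the two columns are swapped.
df.loc[:, ['B', 'A']] = df[['A', 'B']].to_numpy()
print(df['A'].tolist())   # [10, 20, 30]
```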