2.2 Basics
As mentioned when introducing the data structures in the last section, the primary function of indexing with [] (a.k.a. __getitem__ for those familiar with implementing class behavior in Python) is selecting out lower-dimensional slices. The following table shows the return value type when indexing each pandas object with []:
Object Type | Selection | Return Value Type |
---|---|---|
Series | series[label] | scalar value |
DataFrame | frame[colname] | Series corresponding to colname |
Panel | panel[itemname] | DataFrame corresponding to the itemname |
Here we construct a simple time series data set to use for illustrating the indexing functionality:
In [1]: dates = pd.date_range('1/1/2000', periods=8)
In [2]: df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
In [3]: df
Out[3]:
A B C D
2000-01-01 0.4691 -0.2829 -1.5091 -1.1356
2000-01-02 1.2121 -0.1732 0.1192 -1.0442
2000-01-03 -0.8618 -2.1046 -0.4949 1.0718
2000-01-04 0.7216 -0.7068 -1.0396 0.2719
2000-01-05 -0.4250 0.5670 0.2762 -1.0874
2000-01-06 -0.6737 0.1136 -1.4784 0.5250
2000-01-07 0.4047 0.5770 -1.7150 -1.0393
2000-01-08 -0.3706 -1.1579 -1.3443 0.8449
In [4]: panel = pd.Panel({'one' : df, 'two' : df - df.mean()})
In [5]: panel
Out[5]:
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 8 (major_axis) x 4 (minor_axis)
Items axis: one to two
Major_axis axis: 2000-01-01 00:00:00 to 2000-01-08 00:00:00
Minor_axis axis: A to D
Note
None of the indexing functionality is time series specific unless explicitly stated.
Thus, as per the table above, the most basic form of indexing uses []:
In [6]: s = df['A']
In [7]: s
Out[7]:
2000-01-01 0.4691
2000-01-02 1.2121
2000-01-03 -0.8618
2000-01-04 0.7216
2000-01-05 -0.4250
2000-01-06 -0.6737
2000-01-07 0.4047
2000-01-08 -0.3706
Freq: D, Name: A, dtype: float64
In [8]: s[dates[5]]
Out[8]: -0.67368970808837059
In [9]: panel['two']
Out[9]:
A B C D
2000-01-01 0.4096 0.1131 -0.6108 -0.9365
2000-01-02 1.1526 0.2227 1.0174 -0.8451
2000-01-03 -0.9214 -1.7086 0.4033 1.2709
2000-01-04 0.6620 -0.3108 -0.1413 0.4710
2000-01-05 -0.4845 0.9630 1.1745 -0.8883
2000-01-06 -0.7332 0.5096 -0.5802 0.7241
2000-01-07 0.3452 0.9730 -0.8168 -0.8401
2000-01-08 -0.4302 -0.7619 -0.4461 1.0440
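The return types listed in the table can be checked directly. The following is a minimal sketch, not part of the original session: it assumes a recent pandas installation (so it skips the Panel case, which newer releases no longer provide) and uses a seeded NumPy generator, chosen here only so the data is repeatable.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2000-01-01", periods=8)
rng = np.random.default_rng(0)  # seeded generator (an assumption, for repeatability)
df = pd.DataFrame(rng.standard_normal((8, 4)), index=dates, columns=list("ABCD"))

s = df["A"]                  # DataFrame[colname] -> Series
same = df.__getitem__("A")   # the method that the [] syntax invokes
x = s[dates[5]]              # Series[label] -> scalar value

print(type(s).__name__, s.equals(same), np.isscalar(x))
```

Each [] lookup removes one dimension: the two-dimensional DataFrame yields a one-dimensional Series, and the Series yields a scalar.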
You can pass a list of columns to [] to select columns in that order. If a column is not contained in the DataFrame, an exception will be raised. Multiple columns can also be set in this manner:
In [10]: df
Out[10]:
A B C D
2000-01-01 0.4691 -0.2829 -1.5091 -1.1356
2000-01-02 1.2121 -0.1732 0.1192 -1.0442
2000-01-03 -0.8618 -2.1046 -0.4949 1.0718
2000-01-04 0.7216 -0.7068 -1.0396 0.2719
2000-01-05 -0.4250 0.5670 0.2762 -1.0874
2000-01-06 -0.6737 0.1136 -1.4784 0.5250
2000-01-07 0.4047 0.5770 -1.7150 -1.0393
2000-01-08 -0.3706 -1.1579 -1.3443 0.8449
In [11]: df[['B', 'A']] = df[['A', 'B']]
In [12]: df
Out[12]:
A B C D
2000-01-01 -0.2829 0.4691 -1.5091 -1.1356
2000-01-02 -0.1732 1.2121 0.1192 -1.0442
2000-01-03 -2.1046 -0.8618 -0.4949 1.0718
2000-01-04 -0.7068 0.7216 -1.0396 0.2719
2000-01-05 0.5670 -0.4250 0.2762 -1.0874
2000-01-06 0.1136 -0.6737 -1.4784 0.5250
2000-01-07 0.5770 0.4047 -1.7150 -1.0393
2000-01-08 -1.1579 -0.3706 -1.3443 0.8449
You may find this useful for applying a transform (in-place) to a subset of the columns.
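As a rough illustration of the points above, the sketch below selects columns in a chosen order, applies a transform to a subset of columns and writes it back, and shows the exception raised for a missing column. The small frame and the column name "Z" are made up for this example, and the exception type (KeyError) is what recent pandas versions raise.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0], "C": [7.0, 8.0, 9.0]}
)

# Columns come back in the order given in the list.
reordered = df[["C", "A"]]

# Transform a subset of the columns and assign the result back.
df[["A", "B"]] = df[["A", "B"]] * 10

# Asking for a column that does not exist raises an exception.
try:
    df[["A", "Z"]]  # "Z" is a hypothetical missing column
    missing_raised = False
except KeyError:
    missing_raised = True

print(list(reordered.columns), missing_raised)
print(df)
```

Column C is left untouched because it was not part of the list on the left-hand side of the assignment.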
Warning
pandas aligns all AXES when setting Series and DataFrame from .loc, .iloc and .ix.
The following assignment will not modify df because column alignment happens before value assignment.
In [13]: df[['A', 'B']]
Out[13]:
A B
2000-01-01 -0.2829 0.4691
2000-01-02 -0.1732 1.2121
2000-01-03 -2.1046 -0.8618
2000-01-04 -0.7068 0.7216
2000-01-05 0.5670 -0.4250
2000-01-06 0.1136 -0.6737
2000-01-07 0.5770 0.4047
2000-01-08 -1.1579 -0.3706
In [14]: df.loc[:,['B', 'A']] = df[['A', 'B']]
In [15]: df[['A', 'B']]
Out[15]:
A B
2000-01-01 -0.2829 0.4691
2000-01-02 -0.1732 1.2121
2000-01-03 -2.1046 -0.8618
2000-01-04 -0.7068 0.7216
2000-01-05 0.5670 -0.4250
2000-01-06 0.1136 -0.6737
2000-01-07 0.5770 0.4047
2000-01-08 -1.1579 -0.3706
The correct way to swap the column values is to assign the raw values:
In [16]: df.loc[:,['B', 'A']] = df[['A', 'B']].values
In [17]: df[['A', 'B']]
Out[17]:
A B
2000-01-01 0.4691 -0.2829
2000-01-02 1.2121 -0.1732
2000-01-03 -0.8618 -2.1046
2000-01-04 0.7216 -0.7068
2000-01-05 -0.4250 0.5670
2000-01-06 -0.6737 0.1136
2000-01-07 0.4047 0.5770
2000-01-08 -0.3706 -1.1579
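The difference between the two assignments can be reproduced on a small frame. This is a minimal sketch with made-up integer data, covering only the .loc case; it contrasts a label-aligned right-hand side with a raw array.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# Right-hand side is a DataFrame: columns are matched by label before
# assignment, so each column simply receives its own values back.
aligned = df.copy()
aligned.loc[:, ["B", "A"]] = aligned[["A", "B"]]

# Right-hand side is a plain array: there are no labels to align on, so
# values are assigned by position and the two columns are swapped.
swapped = df.copy()
swapped.loc[:, ["B", "A"]] = swapped[["A", "B"]].values

print(aligned)
print(swapped)
```

Dropping the labels with .values is what turns the assignment from label-based into positional.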