7 DatetimeIndex
One of the main uses for DatetimeIndex
is as an index for pandas objects.
The DatetimeIndex
class contains many timeseries related optimizations:
- A large range of dates for various offsets are pre-computed and cached under the hood in order to make generating subsequent date ranges very fast (just have to grab a slice)
- Fast shifting using the
shift
andtshift
method on pandas objects- Unioning of overlapping DatetimeIndex objects with the same frequency is very fast (important for fast data alignment)
- Quick access to date fields via properties such as
year
,month
, etc.- Regularization functions like
snap
and very fastasof
logic
DatetimeIndex objects has all the basic functionality of regular Index objects and a smorgasbord of advanced timeseries-specific methods for easy frequency processing.
See also
Note
While pandas does not force you to have a sorted date index, some of these methods may have unexpected or incorrect behavior if the dates are unsorted. So please be careful.
DatetimeIndex
can be used like a regular index and offers all of its
intelligent functionality like selection, slicing, etc.
In [1]: rng = pd.date_range(start, end, freq='BM')
In [2]: ts = pd.Series(np.random.randn(len(rng)), index=rng)
In [3]: ts.index
Out[3]:
DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31', '2011-04-29',
'2011-05-31', '2011-06-30', '2011-07-29', '2011-08-31',
'2011-09-30', '2011-10-31', '2011-11-30', '2011-12-30'],
dtype='datetime64[ns]', freq='BM')
In [4]: ts[:5].index
Out[4]:
DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31', '2011-04-29',
'2011-05-31'],
dtype='datetime64[ns]', freq='BM')
In [5]: ts[::2].index
Out[5]:
DatetimeIndex(['2011-01-31', '2011-03-31', '2011-05-31', '2011-07-29',
'2011-09-30', '2011-11-30'],
dtype='datetime64[ns]', freq='2BM')
7.1 DatetimeIndex Partial String Indexing
You can pass in dates and strings that parse to dates as indexing parameters:
In [6]: ts['1/31/2011']
Out[6]: 0.46911229990718628
In [7]: ts[datetime(2011, 12, 25):]
Out[7]:
2011-12-30 1.071804
Freq: BM, dtype: float64
In [8]: ts['10/31/2011':'12/31/2011']
Out[8]:
2011-10-31 -2.104569
2011-11-30 -0.494929
2011-12-30 1.071804
Freq: BM, dtype: float64
To provide convenience for accessing longer time series, you can also pass in the year or year and month as strings:
In [9]: ts['2011']
Out[9]:
2011-01-31 0.469112
2011-02-28 -0.282863
2011-03-31 -1.509059
2011-04-29 -1.135632
...
2011-09-30 -0.861849
2011-10-31 -2.104569
2011-11-30 -0.494929
2011-12-30 1.071804
Freq: BM, dtype: float64
In [10]: ts['2011-6']
Out[10]:
2011-06-30 -0.173215
Freq: BM, dtype: float64
This type of slicing will work on a DataFrame with a DateTimeIndex
as well. Since the
partial string selection is a form of label slicing, the endpoints will be included. This
would include matching times on an included date. Here’s an example:
In [11]: dft = pd.DataFrame(randn(100000,1),
....: columns=['A'],
....: index=pd.date_range('20130101',periods=100000,freq='T'))
....:
In [12]: dft
Out[12]:
A
2013-01-01 00:00:00 0.721555
2013-01-01 00:01:00 -0.706771
2013-01-01 00:02:00 -1.039575
2013-01-01 00:03:00 0.271860
... ...
2013-03-11 10:36:00 -0.777369
2013-03-11 10:37:00 0.293270
2013-03-11 10:38:00 1.076066
2013-03-11 10:39:00 -1.562855
[100000 rows x 1 columns]
In [13]: dft['2013']
Out[13]:
A
2013-01-01 00:00:00 0.721555
2013-01-01 00:01:00 -0.706771
2013-01-01 00:02:00 -1.039575
2013-01-01 00:03:00 0.271860
... ...
2013-03-11 10:36:00 -0.777369
2013-03-11 10:37:00 0.293270
2013-03-11 10:38:00 1.076066
2013-03-11 10:39:00 -1.562855
[100000 rows x 1 columns]
This starts on the very first time in the month, and includes the last date & time for the month
In [14]: dft['2013-1':'2013-2']
Out[14]:
A
2013-01-01 00:00:00 0.721555
2013-01-01 00:01:00 -0.706771
2013-01-01 00:02:00 -1.039575
2013-01-01 00:03:00 0.271860
... ...
2013-02-28 23:56:00 -0.969105
2013-02-28 23:57:00 0.367078
2013-02-28 23:58:00 0.069096
2013-02-28 23:59:00 0.096317
[84960 rows x 1 columns]
This specifies a stop time that includes all of the times on the last day
In [15]: dft['2013-1':'2013-2-28']
Out[15]:
A
2013-01-01 00:00:00 0.721555
2013-01-01 00:01:00 -0.706771
2013-01-01 00:02:00 -1.039575
2013-01-01 00:03:00 0.271860
... ...
2013-02-28 23:56:00 -0.969105
2013-02-28 23:57:00 0.367078
2013-02-28 23:58:00 0.069096
2013-02-28 23:59:00 0.096317
[84960 rows x 1 columns]
This specifies an exact stop time (and is not the same as the above)
In [16]: dft['2013-1':'2013-2-28 00:00:00']
Out[16]:
A
2013-01-01 00:00:00 0.721555
2013-01-01 00:01:00 -0.706771
2013-01-01 00:02:00 -1.039575
2013-01-01 00:03:00 0.271860
... ...
2013-02-27 23:57:00 -0.102232
2013-02-27 23:58:00 -0.186728
2013-02-27 23:59:00 0.694242
2013-02-28 00:00:00 1.604295
[83521 rows x 1 columns]
We are stopping on the included end-point as it is part of the index
In [17]: dft['2013-1-15':'2013-1-15 12:30:00']
Out[17]:
A
2013-01-15 00:00:00 0.276491
2013-01-15 00:01:00 -0.286093
2013-01-15 00:02:00 -0.457773
2013-01-15 00:03:00 0.133433
... ...
2013-01-15 12:27:00 1.327543
2013-01-15 12:28:00 -0.160075
2013-01-15 12:29:00 1.295527
2013-01-15 12:30:00 -0.850073
[751 rows x 1 columns]
Warning
The following selection will raise a KeyError
; otherwise this selection methodology
would be inconsistent with other selection methods in pandas (as this is not a slice, nor does it
resolve to one)
dft['2013-1-15 12:30:00']
To select a single row, use .loc
In [18]: dft.loc['2013-1-15 12:30:00']
Out[18]:
A -0.850073
Name: 2013-01-15 12:30:00, dtype: float64
New in version 0.18.0.
DatetimeIndex Partial String Indexing also works on DataFrames with a MultiIndex
. For example:
In [19]: dft2 = pd.DataFrame(np.random.randn(20, 1),
....: columns=['A'],
....: index=pd.MultiIndex.from_product([pd.date_range('20130101',
....: periods=10,
....: freq='12H'),
....: ['a', 'b']]))
....:
In [20]: dft2
Out[20]:
A
2013-01-01 00:00:00 a -0.776320
b -0.747967
2013-01-01 12:00:00 a -0.034523
b -0.201754
... ...
2013-01-05 00:00:00 a 0.488744
b 0.550788
2013-01-05 12:00:00 a 0.514308
b 0.054993
[20 rows x 1 columns]
In [21]: dft2.loc['2013-01-05']
Out[21]:
A
2013-01-05 00:00:00 a 0.488744
b 0.550788
2013-01-05 12:00:00 a 0.514308
b 0.054993
In [22]: idx = pd.IndexSlice
In [23]: dft2 = dft2.swaplevel(0, 1).sort_index()
In [24]: dft2.loc[idx[:, '2013-01-05'], :]
Out[24]:
A
a 2013-01-05 00:00:00 0.488744
2013-01-05 12:00:00 0.514308
b 2013-01-05 00:00:00 0.550788
2013-01-05 12:00:00 0.054993
7.2 Datetime Indexing
Indexing a DateTimeIndex
with a partial string depends on the “accuracy” of the period, in other words how specific the interval is in relation to the frequency of the index. In contrast, indexing with datetime objects is exact, because the objects have exact meaning. These also follow the semantics of including both endpoints.
These datetime
objects are specific hours, minutes,
and seconds
even though they were not explicitly specified (they are 0
).
In [25]: dft[datetime(2013, 1, 1):datetime(2013,2,28)]
Out[25]:
A
2013-01-01 00:00:00 0.721555
2013-01-01 00:01:00 -0.706771
2013-01-01 00:02:00 -1.039575
2013-01-01 00:03:00 0.271860
... ...
2013-02-27 23:57:00 -0.102232
2013-02-27 23:58:00 -0.186728
2013-02-27 23:59:00 0.694242
2013-02-28 00:00:00 1.604295
[83521 rows x 1 columns]
With no defaults.
In [26]: dft[datetime(2013, 1, 1, 10, 12, 0):datetime(2013, 2, 28, 10, 12, 0)]
Out[26]:
A
2013-01-01 10:12:00 0.074192
2013-01-01 10:13:00 -0.249712
2013-01-01 10:14:00 1.162248
2013-01-01 10:15:00 -0.288962
... ...
2013-02-28 10:09:00 0.309910
2013-02-28 10:10:00 -0.000479
2013-02-28 10:11:00 1.785966
2013-02-28 10:12:00 -1.393114
[83521 rows x 1 columns]
7.3 Truncating & Fancy Indexing
A truncate
convenience function is provided that is equivalent to slicing:
In [27]: ts.truncate(before='10/31/2011', after='12/31/2011')
Out[27]:
2011-10-31 -2.104569
2011-11-30 -0.494929
2011-12-30 1.071804
Freq: BM, dtype: float64
Even complicated fancy indexing that breaks the DatetimeIndex’s frequency
regularity will result in a DatetimeIndex
(but frequency is lost):
In [28]: ts[[0, 2, 6]].index
Out[28]: DatetimeIndex(['2011-01-31', '2011-03-31', '2011-07-29'], dtype='datetime64[ns]', freq=None)
7.4 Time/Date Components
There are several time/date properties that one can access from Timestamp
or a collection of timestamps like a DateTimeIndex
.
Property | Description |
---|---|
year | The year of the datetime |
month | The month of the datetime |
day | The days of the datetime |
hour | The hour of the datetime |
minute | The minutes of the datetime |
second | The seconds of the datetime |
microsecond | The microseconds of the datetime |
nanosecond | The nanoseconds of the datetime |
date | Returns datetime.date (does not contain timezone information) |
time | Returns datetime.time (does not contain timezone information) |
dayofyear | The ordinal day of year |
weekofyear | The week ordinal of the year |
week | The week ordinal of the year |
dayofweek | The numer of the day of the week with Monday=0, Sunday=6 |
weekday | The number of the day of the week with Monday=0, Sunday=6 |
weekday_name | The name of the day in a week (ex: Friday) |
quarter | Quarter of the date: Jan=Mar = 1, Apr-Jun = 2, etc. |
days_in_month | The number of days in the month of the datetime |
is_month_start | Logical indicating if first day of month (defined by frequency) |
is_month_end | Logical indicating if last day of month (defined by frequency) |
is_quarter_start | Logical indicating if first day of quarter (defined by frequency) |
is_quarter_end | Logical indicating if last day of quarter (defined by frequency) |
is_year_start | Logical indicating if first day of year (defined by frequency) |
is_year_end | Logical indicating if last day of year (defined by frequency) |
is_leap_year | Logical indicating if the date belongs to a leap year |
Furthermore, if you have a Series
with datetimelike values, then you can access these properties via the .dt
accessor, see the docs