7 DatetimeIndex

One of the main uses for DatetimeIndex is as an index for pandas objects. The DatetimeIndex class contains many timeseries related optimizations:

  • A large range of dates for various offsets are pre-computed and cached under the hood in order to make generating subsequent date ranges very fast (just have to grab a slice)
  • Fast shifting using the shift and tshift method on pandas objects
  • Unioning of overlapping DatetimeIndex objects with the same frequency is very fast (important for fast data alignment)
  • Quick access to date fields via properties such as year, month, etc.
  • Regularization functions like snap and very fast asof logic

DatetimeIndex objects has all the basic functionality of regular Index objects and a smorgasbord of advanced timeseries-specific methods for easy frequency processing.

Note

While pandas does not force you to have a sorted date index, some of these methods may have unexpected or incorrect behavior if the dates are unsorted. So please be careful.

DatetimeIndex can be used like a regular index and offers all of its intelligent functionality like selection, slicing, etc.

In [1]: rng = pd.date_range(start, end, freq='BM')

In [2]: ts = pd.Series(np.random.randn(len(rng)), index=rng)

In [3]: ts.index
Out[3]: 
DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31', '2011-04-29',
               '2011-05-31', '2011-06-30', '2011-07-29', '2011-08-31',
               '2011-09-30', '2011-10-31', '2011-11-30', '2011-12-30'],
              dtype='datetime64[ns]', freq='BM')

In [4]: ts[:5].index
Out[4]: 
DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31', '2011-04-29',
               '2011-05-31'],
              dtype='datetime64[ns]', freq='BM')

In [5]: ts[::2].index
Out[5]: 
DatetimeIndex(['2011-01-31', '2011-03-31', '2011-05-31', '2011-07-29',
               '2011-09-30', '2011-11-30'],
              dtype='datetime64[ns]', freq='2BM')

7.1 DatetimeIndex Partial String Indexing

You can pass in dates and strings that parse to dates as indexing parameters:

In [6]: ts['1/31/2011']
Out[6]: 0.46911229990718628

In [7]: ts[datetime(2011, 12, 25):]
Out[7]: 
2011-12-30    1.071804
Freq: BM, dtype: float64

In [8]: ts['10/31/2011':'12/31/2011']
Out[8]: 
2011-10-31   -2.104569
2011-11-30   -0.494929
2011-12-30    1.071804
Freq: BM, dtype: float64

To provide convenience for accessing longer time series, you can also pass in the year or year and month as strings:

In [9]: ts['2011']
Out[9]: 
2011-01-31    0.469112
2011-02-28   -0.282863
2011-03-31   -1.509059
2011-04-29   -1.135632
                ...   
2011-09-30   -0.861849
2011-10-31   -2.104569
2011-11-30   -0.494929
2011-12-30    1.071804
Freq: BM, dtype: float64

In [10]: ts['2011-6']
Out[10]: 
2011-06-30   -0.173215
Freq: BM, dtype: float64

This type of slicing will work on a DataFrame with a DateTimeIndex as well. Since the partial string selection is a form of label slicing, the endpoints will be included. This would include matching times on an included date. Here’s an example:

In [11]: dft = pd.DataFrame(randn(100000,1),
   ....:                    columns=['A'],
   ....:                    index=pd.date_range('20130101',periods=100000,freq='T'))
   ....: 

In [12]: dft
Out[12]: 
                            A
2013-01-01 00:00:00  0.721555
2013-01-01 00:01:00 -0.706771
2013-01-01 00:02:00 -1.039575
2013-01-01 00:03:00  0.271860
...                       ...
2013-03-11 10:36:00 -0.777369
2013-03-11 10:37:00  0.293270
2013-03-11 10:38:00  1.076066
2013-03-11 10:39:00 -1.562855

[100000 rows x 1 columns]

In [13]: dft['2013']
Out[13]: 
                            A
2013-01-01 00:00:00  0.721555
2013-01-01 00:01:00 -0.706771
2013-01-01 00:02:00 -1.039575
2013-01-01 00:03:00  0.271860
...                       ...
2013-03-11 10:36:00 -0.777369
2013-03-11 10:37:00  0.293270
2013-03-11 10:38:00  1.076066
2013-03-11 10:39:00 -1.562855

[100000 rows x 1 columns]

This starts on the very first time in the month, and includes the last date & time for the month

In [14]: dft['2013-1':'2013-2']
Out[14]: 
                            A
2013-01-01 00:00:00  0.721555
2013-01-01 00:01:00 -0.706771
2013-01-01 00:02:00 -1.039575
2013-01-01 00:03:00  0.271860
...                       ...
2013-02-28 23:56:00 -0.969105
2013-02-28 23:57:00  0.367078
2013-02-28 23:58:00  0.069096
2013-02-28 23:59:00  0.096317

[84960 rows x 1 columns]

This specifies a stop time that includes all of the times on the last day

In [15]: dft['2013-1':'2013-2-28']
Out[15]: 
                            A
2013-01-01 00:00:00  0.721555
2013-01-01 00:01:00 -0.706771
2013-01-01 00:02:00 -1.039575
2013-01-01 00:03:00  0.271860
...                       ...
2013-02-28 23:56:00 -0.969105
2013-02-28 23:57:00  0.367078
2013-02-28 23:58:00  0.069096
2013-02-28 23:59:00  0.096317

[84960 rows x 1 columns]

This specifies an exact stop time (and is not the same as the above)

In [16]: dft['2013-1':'2013-2-28 00:00:00']
Out[16]: 
                            A
2013-01-01 00:00:00  0.721555
2013-01-01 00:01:00 -0.706771
2013-01-01 00:02:00 -1.039575
2013-01-01 00:03:00  0.271860
...                       ...
2013-02-27 23:57:00 -0.102232
2013-02-27 23:58:00 -0.186728
2013-02-27 23:59:00  0.694242
2013-02-28 00:00:00  1.604295

[83521 rows x 1 columns]

We are stopping on the included end-point as it is part of the index

In [17]: dft['2013-1-15':'2013-1-15 12:30:00']
Out[17]: 
                            A
2013-01-15 00:00:00  0.276491
2013-01-15 00:01:00 -0.286093
2013-01-15 00:02:00 -0.457773
2013-01-15 00:03:00  0.133433
...                       ...
2013-01-15 12:27:00  1.327543
2013-01-15 12:28:00 -0.160075
2013-01-15 12:29:00  1.295527
2013-01-15 12:30:00 -0.850073

[751 rows x 1 columns]

Warning

The following selection will raise a KeyError; otherwise this selection methodology would be inconsistent with other selection methods in pandas (as this is not a slice, nor does it resolve to one)

dft['2013-1-15 12:30:00']

To select a single row, use .loc

In [18]: dft.loc['2013-1-15 12:30:00']
Out[18]: 
A   -0.850073
Name: 2013-01-15 12:30:00, dtype: float64

New in version 0.18.0.

DatetimeIndex Partial String Indexing also works on DataFrames with a MultiIndex. For example:

In [19]: dft2 = pd.DataFrame(np.random.randn(20, 1),
   ....:                     columns=['A'],
   ....:                     index=pd.MultiIndex.from_product([pd.date_range('20130101',
   ....:                                                                     periods=10,
   ....:                                                                     freq='12H'),
   ....:                                                      ['a', 'b']]))
   ....: 

In [20]: dft2
Out[20]: 
                              A
2013-01-01 00:00:00 a -0.776320
                    b -0.747967
2013-01-01 12:00:00 a -0.034523
                    b -0.201754
...                         ...
2013-01-05 00:00:00 a  0.488744
                    b  0.550788
2013-01-05 12:00:00 a  0.514308
                    b  0.054993

[20 rows x 1 columns]

In [21]: dft2.loc['2013-01-05']
Out[21]: 
                              A
2013-01-05 00:00:00 a  0.488744
                    b  0.550788
2013-01-05 12:00:00 a  0.514308
                    b  0.054993

In [22]: idx = pd.IndexSlice

In [23]: dft2 = dft2.swaplevel(0, 1).sort_index()

In [24]: dft2.loc[idx[:, '2013-01-05'], :]
Out[24]: 
                              A
a 2013-01-05 00:00:00  0.488744
  2013-01-05 12:00:00  0.514308
b 2013-01-05 00:00:00  0.550788
  2013-01-05 12:00:00  0.054993

7.2 Datetime Indexing

Indexing a DateTimeIndex with a partial string depends on the “accuracy” of the period, in other words how specific the interval is in relation to the frequency of the index. In contrast, indexing with datetime objects is exact, because the objects have exact meaning. These also follow the semantics of including both endpoints.

These datetime objects are specific hours, minutes, and seconds even though they were not explicitly specified (they are 0).

In [25]: dft[datetime(2013, 1, 1):datetime(2013,2,28)]
Out[25]: 
                            A
2013-01-01 00:00:00  0.721555
2013-01-01 00:01:00 -0.706771
2013-01-01 00:02:00 -1.039575
2013-01-01 00:03:00  0.271860
...                       ...
2013-02-27 23:57:00 -0.102232
2013-02-27 23:58:00 -0.186728
2013-02-27 23:59:00  0.694242
2013-02-28 00:00:00  1.604295

[83521 rows x 1 columns]

With no defaults.

In [26]: dft[datetime(2013, 1, 1, 10, 12, 0):datetime(2013, 2, 28, 10, 12, 0)]
Out[26]: 
                            A
2013-01-01 10:12:00  0.074192
2013-01-01 10:13:00 -0.249712
2013-01-01 10:14:00  1.162248
2013-01-01 10:15:00 -0.288962
...                       ...
2013-02-28 10:09:00  0.309910
2013-02-28 10:10:00 -0.000479
2013-02-28 10:11:00  1.785966
2013-02-28 10:12:00 -1.393114

[83521 rows x 1 columns]

7.3 Truncating & Fancy Indexing

A truncate convenience function is provided that is equivalent to slicing:

In [27]: ts.truncate(before='10/31/2011', after='12/31/2011')
Out[27]: 
2011-10-31   -2.104569
2011-11-30   -0.494929
2011-12-30    1.071804
Freq: BM, dtype: float64

Even complicated fancy indexing that breaks the DatetimeIndex’s frequency regularity will result in a DatetimeIndex (but frequency is lost):

In [28]: ts[[0, 2, 6]].index
Out[28]: DatetimeIndex(['2011-01-31', '2011-03-31', '2011-07-29'], dtype='datetime64[ns]', freq=None)

7.4 Time/Date Components

There are several time/date properties that one can access from Timestamp or a collection of timestamps like a DateTimeIndex.

Property Description
year The year of the datetime
month The month of the datetime
day The days of the datetime
hour The hour of the datetime
minute The minutes of the datetime
second The seconds of the datetime
microsecond The microseconds of the datetime
nanosecond The nanoseconds of the datetime
date Returns datetime.date (does not contain timezone information)
time Returns datetime.time (does not contain timezone information)
dayofyear The ordinal day of year
weekofyear The week ordinal of the year
week The week ordinal of the year
dayofweek The numer of the day of the week with Monday=0, Sunday=6
weekday The number of the day of the week with Monday=0, Sunday=6
weekday_name The name of the day in a week (ex: Friday)
quarter Quarter of the date: Jan=Mar = 1, Apr-Jun = 2, etc.
days_in_month The number of days in the month of the datetime
is_month_start Logical indicating if first day of month (defined by frequency)
is_month_end Logical indicating if last day of month (defined by frequency)
is_quarter_start Logical indicating if first day of quarter (defined by frequency)
is_quarter_end Logical indicating if last day of quarter (defined by frequency)
is_year_start Logical indicating if first day of year (defined by frequency)
is_year_end Logical indicating if last day of year (defined by frequency)
is_leap_year Logical indicating if the date belongs to a leap year

Furthermore, if you have a Series with datetimelike values, then you can access these properties via the .dt accessor, see the docs