8 DateOffset objects
In the preceding examples, we created DatetimeIndex objects at various
frequencies by passing in frequency strings
like ‘M’, ‘W’, and ‘BM to the freq
keyword. Under the hood, these frequency
strings are being translated into an instance of pandas DateOffset
,
which represents a regular frequency increment. Specific offset logic like
“month”, “business day”, or “one hour” is represented in its various subclasses.
Class name | Description |
---|---|
DateOffset | Generic offset class, defaults to 1 calendar day |
BDay | business day (weekday) |
CDay | custom business day (experimental) |
Week | one week, optionally anchored on a day of the week |
WeekOfMonth | the x-th day of the y-th week of each month |
LastWeekOfMonth | the x-th day of the last week of each month |
MonthEnd | calendar month end |
MonthBegin | calendar month begin |
BMonthEnd | business month end |
BMonthBegin | business month begin |
CBMonthEnd | custom business month end |
CBMonthBegin | custom business month begin |
SemiMonthEnd | 15th (or other day_of_month) and calendar month end |
SemiMonthBegin | 15th (or other day_of_month) and calendar month begin |
QuarterEnd | calendar quarter end |
QuarterBegin | calendar quarter begin |
BQuarterEnd | business quarter end |
BQuarterBegin | business quarter begin |
FY5253Quarter | retail (aka 52-53 week) quarter |
YearEnd | calendar year end |
YearBegin | calendar year begin |
BYearEnd | business year end |
BYearBegin | business year begin |
FY5253 | retail (aka 52-53 week) year |
BusinessHour | business hour |
CustomBusinessHour | custom business hour |
Hour | one hour |
Minute | one minute |
Second | one second |
Milli | one millisecond |
Micro | one microsecond |
Nano | one nanosecond |
The basic DateOffset
takes the same arguments as
dateutil.relativedelta
, which works like:
In [1]: d = datetime(2008, 8, 18, 9, 0)
In [2]: d + relativedelta(months=4, days=5)
Out[2]: datetime.datetime(2008, 12, 23, 9, 0)
We could have done the same thing with DateOffset
:
In [3]: from pandas.tseries.offsets import *
In [4]: d + DateOffset(months=4, days=5)
Out[4]: Timestamp('2008-12-23 09:00:00')
The key features of a DateOffset
object are:
- it can be added / subtracted to/from a datetime object to obtain a shifted date
- it can be multiplied by an integer (positive or negative) so that the increment will be applied multiple times
- it has
rollforward
androllback
methods for moving a date forward or backward to the next or previous “offset date”
Subclasses of DateOffset
define the apply
function which dictates
custom date increment logic, such as adding business days:
class BDay(DateOffset):
"""DateOffset increments between business days"""
def apply(self, other):
...
In [5]: d - 5 * BDay()
Out[5]: Timestamp('2008-08-11 09:00:00')
In [6]: d + BMonthEnd()
Out[6]: Timestamp('2008-08-29 09:00:00')
The rollforward
and rollback
methods do exactly what you would expect:
In [7]: d
Out[7]: datetime.datetime(2008, 8, 18, 9, 0)
In [8]: offset = BMonthEnd()
In [9]: offset.rollforward(d)
Out[9]: Timestamp('2008-08-29 09:00:00')
In [10]: offset.rollback(d)
Out[10]: Timestamp('2008-07-31 09:00:00')
It’s definitely worth exploring the pandas.tseries.offsets
module and the
various docstrings for the classes.
These operations (apply
, rollforward
and rollback
) preserves time (hour, minute, etc) information by default. To reset time, use normalize=True
keyword when creating the offset instance. If normalize=True
, result is normalized after the function is applied.
In [11]: day = Day()
In [12]: day.apply(pd.Timestamp('2014-01-01 09:00'))
Out[12]: Timestamp('2014-01-02 09:00:00')
In [13]: day = Day(normalize=True)
In [14]: day.apply(pd.Timestamp('2014-01-01 09:00'))
Out[14]: Timestamp('2014-01-02 00:00:00')
In [15]: hour = Hour()
In [16]: hour.apply(pd.Timestamp('2014-01-01 22:00'))
Out[16]: Timestamp('2014-01-01 23:00:00')
In [17]: hour = Hour(normalize=True)
In [18]: hour.apply(pd.Timestamp('2014-01-01 22:00'))
Out[18]: Timestamp('2014-01-01 00:00:00')
In [19]: hour.apply(pd.Timestamp('2014-01-01 23:00'))
Out[19]: Timestamp('2014-01-02 00:00:00')
8.1 Parametric offsets
Some of the offsets can be “parameterized” when created to result in different
behaviors. For example, the Week
offset for generating weekly data accepts a
weekday
parameter which results in the generated dates always lying on a
particular day of the week:
In [20]: d
Out[20]: datetime.datetime(2008, 8, 18, 9, 0)
In [21]: d + Week()
Out[21]: Timestamp('2008-08-25 09:00:00')
In [22]: d + Week(weekday=4)
Out[22]: Timestamp('2008-08-22 09:00:00')
In [23]: (d + Week(weekday=4)).weekday()
Out[23]: 4
In [24]: d - Week()
Out[24]: Timestamp('2008-08-11 09:00:00')
normalize
option will be effective for addition and subtraction.
In [25]: d + Week(normalize=True)
Out[25]: Timestamp('2008-08-25 00:00:00')
In [26]: d - Week(normalize=True)
Out[26]: Timestamp('2008-08-11 00:00:00')
Another example is parameterizing YearEnd
with the specific ending month:
In [27]: d + YearEnd()
Out[27]: Timestamp('2008-12-31 09:00:00')
In [28]: d + YearEnd(month=6)
Out[28]: Timestamp('2009-06-30 09:00:00')
8.2 Using offsets with Series
/ DatetimeIndex
Offsets can be used with either a Series
or DatetimeIndex
to
apply the offset to each element.
In [29]: rng = pd.date_range('2012-01-01', '2012-01-03')
In [30]: s = pd.Series(rng)
In [31]: rng
Out[31]: DatetimeIndex(['2012-01-01', '2012-01-02', '2012-01-03'], dtype='datetime64[ns]', freq='D')
In [32]: rng + DateOffset(months=2)
Out[32]: DatetimeIndex(['2012-03-01', '2012-03-02', '2012-03-03'], dtype='datetime64[ns]', freq='D')
In [33]: s + DateOffset(months=2)
Out[33]:
0 2012-03-01
1 2012-03-02
2 2012-03-03
dtype: datetime64[ns]
In [34]: s - DateOffset(months=2)
Out[34]:
0 2011-11-01
1 2011-11-02
2 2011-11-03
dtype: datetime64[ns]
If the offset class maps directly to a Timedelta
(Day
, Hour
,
Minute
, Second
, Micro
, Milli
, Nano
) it can be
used exactly like a Timedelta
- see the
Timedelta section for more examples.
In [35]: s - Day(2)
Out[35]:
0 2011-12-30
1 2011-12-31
2 2012-01-01
dtype: datetime64[ns]
In [36]: td = s - pd.Series(pd.date_range('2011-12-29', '2011-12-31'))
In [37]: td
Out[37]:
0 3 days
1 3 days
2 3 days
dtype: timedelta64[ns]
In [38]: td + Minute(15)
Out[38]:
0 3 days 00:15:00
1 3 days 00:15:00
2 3 days 00:15:00
dtype: timedelta64[ns]
Note that some offsets (such as BQuarterEnd
) do not have a
vectorized implementation. They can still be used but may
calculate significantly slower and will raise a PerformanceWarning
In [39]: rng + BQuarterEnd()
Out[39]: DatetimeIndex(['2012-03-30', '2012-03-30', '2012-03-30'], dtype='datetime64[ns]', freq=None)
8.3 Custom Business Days (Experimental)
The CDay
or CustomBusinessDay
class provides a parametric
BusinessDay
class which can be used to create customized business day
calendars which account for local holidays and local weekend conventions.
As an interesting example, let’s look at Egypt where a Friday-Saturday weekend is observed.
In [40]: from pandas.tseries.offsets import CustomBusinessDay
In [41]: weekmask_egypt = 'Sun Mon Tue Wed Thu'
# They also observe International Workers' Day so let's
# add that for a couple of years
In [42]: holidays = ['2012-05-01', datetime(2013, 5, 1), np.datetime64('2014-05-01')]
In [43]: bday_egypt = CustomBusinessDay(holidays=holidays, weekmask=weekmask_egypt)
In [44]: dt = datetime(2013, 4, 30)
In [45]: dt + 2 * bday_egypt
Out[45]: Timestamp('2013-05-05 00:00:00')
Let’s map to the weekday names
In [46]: dts = pd.date_range(dt, periods=5, freq=bday_egypt)
In [47]: pd.Series(dts.weekday, dts).map(pd.Series('Mon Tue Wed Thu Fri Sat Sun'.split()))
Out[47]:
2013-04-30 Tue
2013-05-02 Thu
2013-05-05 Sun
2013-05-06 Mon
2013-05-07 Tue
Freq: C, dtype: object
As of v0.14 holiday calendars can be used to provide the list of holidays. See the holiday calendar section for more information.
In [48]: from pandas.tseries.holiday import USFederalHolidayCalendar
In [49]: bday_us = CustomBusinessDay(calendar=USFederalHolidayCalendar())
# Friday before MLK Day
In [50]: dt = datetime(2014, 1, 17)
# Tuesday after MLK Day (Monday is skipped because it's a holiday)
In [51]: dt + bday_us
Out[51]: Timestamp('2014-01-21 00:00:00')
Monthly offsets that respect a certain holiday calendar can be defined in the usual way.
In [52]: from pandas.tseries.offsets import CustomBusinessMonthBegin
In [53]: bmth_us = CustomBusinessMonthBegin(calendar=USFederalHolidayCalendar())
# Skip new years
In [54]: dt = datetime(2013, 12, 17)
In [55]: dt + bmth_us
Out[55]: Timestamp('2014-01-02 00:00:00')
# Define date index with custom offset
In [56]: pd.DatetimeIndex(start='20100101',end='20120101',freq=bmth_us)
Out[56]:
DatetimeIndex(['2010-01-04', '2010-02-01', '2010-03-01', '2010-04-01',
'2010-05-03', '2010-06-01', '2010-07-01', '2010-08-02',
'2010-09-01', '2010-10-01', '2010-11-01', '2010-12-01',
'2011-01-03', '2011-02-01', '2011-03-01', '2011-04-01',
'2011-05-02', '2011-06-01', '2011-07-01', '2011-08-01',
'2011-09-01', '2011-10-03', '2011-11-01', '2011-12-01'],
dtype='datetime64[ns]', freq='CBMS')
Note
The frequency string ‘C’ is used to indicate that a CustomBusinessDay DateOffset is used, it is important to note that since CustomBusinessDay is a parameterised type, instances of CustomBusinessDay may differ and this is not detectable from the ‘C’ frequency string. The user therefore needs to ensure that the ‘C’ frequency string is used consistently within the user’s application.
8.4 Business Hour
The BusinessHour
class provides a business hour representation on BusinessDay
,
allowing to use specific start and end times.
By default, BusinessHour
uses 9:00 - 17:00 as business hours.
Adding BusinessHour
will increment Timestamp
by hourly.
If target Timestamp
is out of business hours, move to the next business hour then increment it.
If the result exceeds the business hours end, remaining is added to the next business day.
In [57]: bh = BusinessHour()
In [58]: bh
Out[58]: <BusinessHour: BH=09:00-17:00>
# 2014-08-01 is Friday
In [59]: pd.Timestamp('2014-08-01 10:00').weekday()
Out[59]: 4
In [60]: pd.Timestamp('2014-08-01 10:00') + bh
Out[60]: Timestamp('2014-08-01 11:00:00')
# Below example is the same as: pd.Timestamp('2014-08-01 09:00') + bh
In [61]: pd.Timestamp('2014-08-01 08:00') + bh
Out[61]: Timestamp('2014-08-01 10:00:00')
# If the results is on the end time, move to the next business day
In [62]: pd.Timestamp('2014-08-01 16:00') + bh
Out[62]: Timestamp('2014-08-04 09:00:00')
# Remainings are added to the next day
In [63]: pd.Timestamp('2014-08-01 16:30') + bh
Out[63]: Timestamp('2014-08-04 09:30:00')
# Adding 2 business hours
In [64]: pd.Timestamp('2014-08-01 10:00') + BusinessHour(2)
Out[64]: Timestamp('2014-08-01 12:00:00')
# Subtracting 3 business hours
In [65]: pd.Timestamp('2014-08-01 10:00') + BusinessHour(-3)
Out[65]: Timestamp('2014-07-31 15:00:00')
Also, you can specify start
and end
time by keywords.
Argument must be str
which has hour:minute
representation or datetime.time
instance.
Specifying seconds, microseconds and nanoseconds as business hour results in ValueError
.
In [66]: bh = BusinessHour(start='11:00', end=time(20, 0))
In [67]: bh
Out[67]: <BusinessHour: BH=11:00-20:00>
In [68]: pd.Timestamp('2014-08-01 13:00') + bh
Out[68]: Timestamp('2014-08-01 14:00:00')
In [69]: pd.Timestamp('2014-08-01 09:00') + bh
Out[69]: Timestamp('2014-08-01 12:00:00')
In [70]: pd.Timestamp('2014-08-01 18:00') + bh
Out[70]: Timestamp('2014-08-01 19:00:00')
Passing start
time later than end
represents midnight business hour.
In this case, business hour exceeds midnight and overlap to the next day.
Valid business hours are distinguished by whether it started from valid BusinessDay
.
In [71]: bh = BusinessHour(start='17:00', end='09:00')
In [72]: bh
Out[72]: <BusinessHour: BH=17:00-09:00>
In [73]: pd.Timestamp('2014-08-01 17:00') + bh
Out[73]: Timestamp('2014-08-01 18:00:00')
In [74]: pd.Timestamp('2014-08-01 23:00') + bh
Out[74]: Timestamp('2014-08-02 00:00:00')
# Although 2014-08-02 is Satuaday,
# it is valid because it starts from 08-01 (Friday).
In [75]: pd.Timestamp('2014-08-02 04:00') + bh
Out[75]: Timestamp('2014-08-02 05:00:00')
# Although 2014-08-04 is Monday,
# it is out of business hours because it starts from 08-03 (Sunday).
In [76]: pd.Timestamp('2014-08-04 04:00') + bh
Out[76]: Timestamp('2014-08-04 18:00:00')
Applying BusinessHour.rollforward
and rollback
to out of business hours results in
the next business hour start or previous day’s end. Different from other offsets, BusinessHour.rollforward
may output different results from apply
by definition.
This is because one day’s business hour end is equal to next day’s business hour start. For example,
under the default business hours (9:00 - 17:00), there is no gap (0 minutes) between 2014-08-01 17:00
and
2014-08-04 09:00
.
# This adjusts a Timestamp to business hour edge
In [77]: BusinessHour().rollback(pd.Timestamp('2014-08-02 15:00'))
Out[77]: Timestamp('2014-08-01 17:00:00')
In [78]: BusinessHour().rollforward(pd.Timestamp('2014-08-02 15:00'))
Out[78]: Timestamp('2014-08-04 09:00:00')
# It is the same as BusinessHour().apply(pd.Timestamp('2014-08-01 17:00')).
# And it is the same as BusinessHour().apply(pd.Timestamp('2014-08-04 09:00'))
In [79]: BusinessHour().apply(pd.Timestamp('2014-08-02 15:00'))
Out[79]: Timestamp('2014-08-04 10:00:00')
# BusinessDay results (for reference)
In [80]: BusinessHour().rollforward(pd.Timestamp('2014-08-02'))
Out[80]: Timestamp('2014-08-04 09:00:00')
# It is the same as BusinessDay().apply(pd.Timestamp('2014-08-01'))
# The result is the same as rollworward because BusinessDay never overlap.
In [81]: BusinessHour().apply(pd.Timestamp('2014-08-02'))
Out[81]: Timestamp('2014-08-04 10:00:00')
BusinessHour
regards Saturday and Sunday as holidays. To use arbitrary holidays,
you can use CustomBusinessHour
offset, see Custom Business Hour:
8.5 Custom Business Hour
New in version 0.18.1.
The CustomBusinessHour
is a mixture of BusinessHour
and CustomBusinessDay
which
allows you to specify arbitrary holidays. CustomBusinessHour
works as the same
as BusinessHour
except that it skips specified custom holidays.
In [82]: from pandas.tseries.holiday import USFederalHolidayCalendar
In [83]: bhour_us = CustomBusinessHour(calendar=USFederalHolidayCalendar())
# Friday before MLK Day
In [84]: dt = datetime(2014, 1, 17, 15)
In [85]: dt + bhour_us
Out[85]: Timestamp('2014-01-17 16:00:00')
# Tuesday after MLK Day (Monday is skipped because it's a holiday)
In [86]: dt + bhour_us * 2
Out[86]: Timestamp('2014-01-21 09:00:00')
You can use keyword arguments suported by either BusinessHour
and CustomBusinessDay
.
In [87]: bhour_mon = CustomBusinessHour(start='10:00', weekmask='Tue Wed Thu Fri')
# Monday is skipped because it's a holiday, business hour starts from 10:00
In [88]: dt + bhour_mon * 2
Out[88]: Timestamp('2014-01-21 10:00:00')
8.6 Offset Aliases
A number of string aliases are given to useful common time series frequencies. We will refer to these aliases as offset aliases (referred to as time rules prior to v0.8.0).
Alias | Description |
---|---|
B | business day frequency |
C | custom business day frequency (experimental) |
D | calendar day frequency |
W | weekly frequency |
M | month end frequency |
SM | semi-month end frequency (15th and end of month) |
BM | business month end frequency |
CBM | custom business month end frequency |
MS | month start frequency |
SMS | semi-month start frequency (1st and 15th) |
BMS | business month start frequency |
CBMS | custom business month start frequency |
Q | quarter end frequency |
BQ | business quarter endfrequency |
QS | quarter start frequency |
BQS | business quarter start frequency |
A | year end frequency |
BA | business year end frequency |
AS | year start frequency |
BAS | business year start frequency |
BH | business hour frequency |
H | hourly frequency |
T, min | minutely frequency |
S | secondly frequency |
L, ms | milliseconds |
U, us | microseconds |
N | nanoseconds |
8.7 Combining Aliases
As we have seen previously, the alias and the offset instance are fungible in most functions:
In [89]: pd.date_range(start, periods=5, freq='B')
Out[89]:
DatetimeIndex(['2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06',
'2011-01-07'],
dtype='datetime64[ns]', freq='B')
In [90]: pd.date_range(start, periods=5, freq=BDay())
Out[90]:
DatetimeIndex(['2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06',
'2011-01-07'],
dtype='datetime64[ns]', freq='B')
You can combine together day and intraday offsets:
In [91]: pd.date_range(start, periods=10, freq='2h20min')
Out[91]:
DatetimeIndex(['2011-01-01 00:00:00', '2011-01-01 02:20:00',
'2011-01-01 04:40:00', '2011-01-01 07:00:00',
'2011-01-01 09:20:00', '2011-01-01 11:40:00',
'2011-01-01 14:00:00', '2011-01-01 16:20:00',
'2011-01-01 18:40:00', '2011-01-01 21:00:00'],
dtype='datetime64[ns]', freq='140T')
In [92]: pd.date_range(start, periods=10, freq='1D10U')
Out[92]:
DatetimeIndex([ '2011-01-01 00:00:00', '2011-01-02 00:00:00.000010',
'2011-01-03 00:00:00.000020', '2011-01-04 00:00:00.000030',
'2011-01-05 00:00:00.000040', '2011-01-06 00:00:00.000050',
'2011-01-07 00:00:00.000060', '2011-01-08 00:00:00.000070',
'2011-01-09 00:00:00.000080', '2011-01-10 00:00:00.000090'],
dtype='datetime64[ns]', freq='86400000010U')
8.8 Anchored Offsets
For some frequencies you can specify an anchoring suffix:
Alias | Description |
---|---|
W-SUN | weekly frequency (sundays). Same as ‘W’ |
W-MON | weekly frequency (mondays) |
W-TUE | weekly frequency (tuesdays) |
W-WED | weekly frequency (wednesdays) |
W-THU | weekly frequency (thursdays) |
W-FRI | weekly frequency (fridays) |
W-SAT | weekly frequency (saturdays) |
(B)Q(S)-DEC | quarterly frequency, year ends in December. Same as ‘Q’ |
(B)Q(S)-JAN | quarterly frequency, year ends in January |
(B)Q(S)-FEB | quarterly frequency, year ends in February |
(B)Q(S)-MAR | quarterly frequency, year ends in March |
(B)Q(S)-APR | quarterly frequency, year ends in April |
(B)Q(S)-MAY | quarterly frequency, year ends in May |
(B)Q(S)-JUN | quarterly frequency, year ends in June |
(B)Q(S)-JUL | quarterly frequency, year ends in July |
(B)Q(S)-AUG | quarterly frequency, year ends in August |
(B)Q(S)-SEP | quarterly frequency, year ends in September |
(B)Q(S)-OCT | quarterly frequency, year ends in October |
(B)Q(S)-NOV | quarterly frequency, year ends in November |
(B)A(S)-DEC | annual frequency, anchored end of December. Same as ‘A’ |
(B)A(S)-JAN | annual frequency, anchored end of January |
(B)A(S)-FEB | annual frequency, anchored end of February |
(B)A(S)-MAR | annual frequency, anchored end of March |
(B)A(S)-APR | annual frequency, anchored end of April |
(B)A(S)-MAY | annual frequency, anchored end of May |
(B)A(S)-JUN | annual frequency, anchored end of June |
(B)A(S)-JUL | annual frequency, anchored end of July |
(B)A(S)-AUG | annual frequency, anchored end of August |
(B)A(S)-SEP | annual frequency, anchored end of September |
(B)A(S)-OCT | annual frequency, anchored end of October |
(B)A(S)-NOV | annual frequency, anchored end of November |
These can be used as arguments to date_range
, bdate_range
, constructors
for DatetimeIndex
, as well as various other timeseries-related functions
in pandas.
8.9 Anchored Offset Semantics
For those offsets that are anchored to the start or end of specific
frequency (MonthEnd
, MonthBegin
, WeekEnd
, etc) the following
rules apply to rolling forward and backwards.
When n
is not 0, if the given date is not on an anchor point, it snapped to the next(previous)
anchor point, and moved |n|-1
additional steps forwards or backwards.
In [93]: pd.Timestamp('2014-01-02') + MonthBegin(n=1)
Out[93]: Timestamp('2014-02-01 00:00:00')
In [94]: pd.Timestamp('2014-01-02') + MonthEnd(n=1)
Out[94]: Timestamp('2014-01-31 00:00:00')
In [95]: pd.Timestamp('2014-01-02') - MonthBegin(n=1)
Out[95]: Timestamp('2014-01-01 00:00:00')
In [96]: pd.Timestamp('2014-01-02') - MonthEnd(n=1)
Out[96]: Timestamp('2013-12-31 00:00:00')
In [97]: pd.Timestamp('2014-01-02') + MonthBegin(n=4)
Out[97]: Timestamp('2014-05-01 00:00:00')
In [98]: pd.Timestamp('2014-01-02') - MonthBegin(n=4)
Out[98]: Timestamp('2013-10-01 00:00:00')
If the given date is on an anchor point, it is moved |n|
points forwards
or backwards.
In [99]: pd.Timestamp('2014-01-01') + MonthBegin(n=1)
Out[99]: Timestamp('2014-02-01 00:00:00')
In [100]: pd.Timestamp('2014-01-31') + MonthEnd(n=1)
Out[100]: Timestamp('2014-02-28 00:00:00')
In [101]: pd.Timestamp('2014-01-01') - MonthBegin(n=1)
Out[101]: Timestamp('2013-12-01 00:00:00')
In [102]: pd.Timestamp('2014-01-31') - MonthEnd(n=1)
Out[102]: Timestamp('2013-12-31 00:00:00')
In [103]: pd.Timestamp('2014-01-01') + MonthBegin(n=4)
Out[103]: Timestamp('2014-05-01 00:00:00')
In [104]: pd.Timestamp('2014-01-31') - MonthBegin(n=4)
Out[104]: Timestamp('2013-10-01 00:00:00')
For the case when n=0
, the date is not moved if on an anchor point, otherwise
it is rolled forward to the next anchor point.
In [105]: pd.Timestamp('2014-01-02') + MonthBegin(n=0)
Out[105]: Timestamp('2014-02-01 00:00:00')
In [106]: pd.Timestamp('2014-01-02') + MonthEnd(n=0)
Out[106]: Timestamp('2014-01-31 00:00:00')
In [107]: pd.Timestamp('2014-01-01') + MonthBegin(n=0)
Out[107]: Timestamp('2014-01-01 00:00:00')
In [108]: pd.Timestamp('2014-01-31') + MonthEnd(n=0)
Out[108]: Timestamp('2014-01-31 00:00:00')
8.10 Holidays / Holiday Calendars
Holidays and calendars provide a simple way to define holiday rules to be used
with CustomBusinessDay
or in other analysis that requires a predefined
set of holidays. The AbstractHolidayCalendar
class provides all the necessary
methods to return a list of holidays and only rules
need to be defined
in a specific holiday calendar class. Further, start_date
and end_date
class attributes determine over what date range holidays are generated. These
should be overwritten on the AbstractHolidayCalendar
class to have the range
apply to all calendar subclasses. USFederalHolidayCalendar
is the
only calendar that exists and primarily serves as an example for developing
other calendars.
For holidays that occur on fixed dates (e.g., US Memorial Day or July 4th) an observance rule determines when that holiday is observed if it falls on a weekend or some other non-observed day. Defined observance rules are:
Rule | Description |
---|---|
nearest_workday | move Saturday to Friday and Sunday to Monday |
sunday_to_monday | move Sunday to following Monday |
next_monday_or_tuesday | move Saturday to Monday and Sunday/Monday to Tuesday |
previous_friday | move Saturday and Sunday to previous Friday” |
next_monday | move Saturday and Sunday to following Monday |
An example of how holidays and holiday calendars are defined:
In [109]: from pandas.tseries.holiday import Holiday, USMemorialDay,\
.....: AbstractHolidayCalendar, nearest_workday, MO
.....:
In [110]: class ExampleCalendar(AbstractHolidayCalendar):
.....: rules = [
.....: USMemorialDay,
.....: Holiday('July 4th', month=7, day=4, observance=nearest_workday),
.....: Holiday('Columbus Day', month=10, day=1,
.....: offset=DateOffset(weekday=MO(2))), #same as 2*Week(weekday=2)
.....: ]
.....:
In [111]: cal = ExampleCalendar()
In [112]: cal.holidays(datetime(2012, 1, 1), datetime(2012, 12, 31))
Out[112]: DatetimeIndex(['2012-05-28', '2012-07-04', '2012-10-08'], dtype='datetime64[ns]', freq=None)
Using this calendar, creating an index or doing offset arithmetic skips weekends
and holidays (i.e., Memorial Day/July 4th). For example, the below defines
a custom business day offset using the ExampleCalendar
. Like any other offset,
it can be used to create a DatetimeIndex
or added to datetime
or Timestamp
objects.
In [113]: from pandas.tseries.offsets import CDay
In [114]: pd.DatetimeIndex(start='7/1/2012', end='7/10/2012',
.....: freq=CDay(calendar=cal)).to_pydatetime()
.....:
Out[114]:
array([datetime.datetime(2012, 7, 2, 0, 0),
datetime.datetime(2012, 7, 3, 0, 0),
datetime.datetime(2012, 7, 5, 0, 0),
datetime.datetime(2012, 7, 6, 0, 0),
datetime.datetime(2012, 7, 9, 0, 0),
datetime.datetime(2012, 7, 10, 0, 0)], dtype=object)
In [115]: offset = CustomBusinessDay(calendar=cal)
In [116]: datetime(2012, 5, 25) + offset
Out[116]: Timestamp('2012-05-29 00:00:00')
In [117]: datetime(2012, 7, 3) + offset
Out[117]: Timestamp('2012-07-05 00:00:00')
In [118]: datetime(2012, 7, 3) + 2 * offset
Out[118]: Timestamp('2012-07-06 00:00:00')
In [119]: datetime(2012, 7, 6) + offset
Out[119]: Timestamp('2012-07-09 00:00:00')
Ranges are defined by the start_date
and end_date
class attributes
of AbstractHolidayCalendar
. The defaults are below.
In [120]: AbstractHolidayCalendar.start_date
Out[120]: Timestamp('1970-01-01 00:00:00')
In [121]: AbstractHolidayCalendar.end_date
Out[121]: Timestamp('2030-12-31 00:00:00')
These dates can be overwritten by setting the attributes as datetime/Timestamp/string.
In [122]: AbstractHolidayCalendar.start_date = datetime(2012, 1, 1)
In [123]: AbstractHolidayCalendar.end_date = datetime(2012, 12, 31)
In [124]: cal.holidays()
Out[124]: DatetimeIndex(['2012-05-28', '2012-07-04', '2012-10-08'], dtype='datetime64[ns]', freq=None)
Every calendar class is accessible by name using the get_calendar
function
which returns a holiday class instance. Any imported calendar class will
automatically be available by this function. Also, HolidayCalendarFactory
provides an easy interface to create calendars that are combinations of calendars
or calendars with additional rules.
In [125]: from pandas.tseries.holiday import get_calendar, HolidayCalendarFactory,\
.....: USLaborDay
.....:
In [126]: cal = get_calendar('ExampleCalendar')
In [127]: cal.rules
Out[127]:
[Holiday: MemorialDay (month=5, day=31, offset=<DateOffset: kwds={'weekday': MO(-1)}>),
Holiday: July 4th (month=7, day=4, observance=<function nearest_workday at 0x2b6c3c198e60>),
Holiday: Columbus Day (month=10, day=1, offset=<DateOffset: kwds={'weekday': MO(+2)}>)]
In [128]: new_cal = HolidayCalendarFactory('NewExampleCalendar', cal, USLaborDay)
In [129]: new_cal.rules
Out[129]:
[Holiday: Labor Day (month=9, day=1, offset=<DateOffset: kwds={'weekday': MO(+1)}>),
Holiday: Columbus Day (month=10, day=1, offset=<DateOffset: kwds={'weekday': MO(+2)}>),
Holiday: July 4th (month=7, day=4, observance=<function nearest_workday at 0x2b6c3c198e60>),
Holiday: MemorialDay (month=5, day=31, offset=<DateOffset: kwds={'weekday': MO(-1)}>)]