11 Time Span Representation
Regular intervals of time are represented by Period objects in pandas, while sequences of Period objects are collected in a PeriodIndex, which can be created with the convenience function period_range.
11.1 Period
A Period represents a span of time (e.g., a day, a month, or a quarter). You can specify the span via the freq keyword using a frequency alias, as shown below. Because freq represents the span of a Period, it cannot be negative (e.g., “-3D”).
In [1]: pd.Period('2012', freq='A-DEC')
Out[1]: Period('2012', 'A-DEC')
In [2]: pd.Period('2012-1-1', freq='D')
Out[2]: Period('2012-01-01', 'D')
In [3]: pd.Period('2012-1-1 19:00', freq='H')
Out[3]: Period('2012-01-01 19:00', 'H')
In [4]: pd.Period('2012-1-1 19:00', freq='5H')
Out[4]: Period('2012-01-01 19:00', '5H')
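To make the "span" idea concrete, the sketch below (assuming a recent pandas release) inspects the first and last instants covered by a monthly Period through its start_time and end_time attributes, and checks that a negative frequency such as “-3D” is rejected with a ValueError.

```python
import pandas as pd

# A monthly Period covers the whole month, not a single instant.
p = pd.Period("2012-01", freq="M")
print(p.start_time)  # 2012-01-01 00:00:00
print(p.end_time)    # 2012-01-31 23:59:59.999999999

# A negative multiple cannot describe a span, so construction fails.
negative_rejected = False
try:
    pd.Period("2012-01-01", freq="-3D")
except ValueError as err:
    negative_rejected = True
    print("rejected:", err)
```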
Adding an integer to, or subtracting one from, a period shifts it by that many units of its own frequency. Arithmetic is not allowed between Period objects with different freq (span).
In [5]: p = pd.Period('2012', freq='A-DEC')
In [6]: p + 1
Out[6]: Period('2013', 'A-DEC')
In [7]: p - 3
Out[7]: Period('2009', 'A-DEC')
In [8]: p = pd.Period('2012-01', freq='2M')
In [9]: p + 2
Out[9]: Period('2012-05', '2M')
In [10]: p - 1
Out[10]: Period('2011-11', '2M')
In [11]: p == pd.Period('2012-01', freq='3M')
---------------------------------------------------------------------------
IncompatibleFrequency Traceback (most recent call last)
...
IncompatibleFrequency: Input has different freq=3M from Period(freq=2M)
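A short sketch of the shifting rule, assuming a recent pandas release: integer offsets roll over month and year boundaries in units of the period's own frequency, and subtracting two periods whose spans differ raises IncompatibleFrequency, which pandas defines as a subclass of ValueError.

```python
import pandas as pd

# Integers count in units of the period's own frequency,
# so the result rolls over month and year boundaries.
nov = pd.Period("2012-11", freq="M")
print(nov + 3)  # 2013-02

q4 = pd.Period("2012Q4", freq="Q-DEC")
print(q4 + 1)   # 2013Q1

# Arithmetic between different spans (2M vs 3M) is rejected.
mixed_rejected = False
try:
    pd.Period("2012-01", freq="2M") - pd.Period("2012-01", freq="3M")
except ValueError as err:  # IncompatibleFrequency subclasses ValueError
    mixed_rejected = True
    print("rejected:", err)
```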
If a Period's freq is daily or higher (D, H, T, S, L, U, N), offsets and timedelta-like objects can be added as long as the result can keep the same freq. Otherwise, a ValueError is raised.
In [12]: p = pd.Period('2014-07-01 09:00', freq='H')
In [13]: p + Hour(2)
Out[13]: Period('2014-07-01 11:00', 'H')
In [14]: p + timedelta(minutes=120)
Out[14]: Period('2014-07-01 11:00', 'H')
In [15]: p + np.timedelta64(7200, 's')
Out[15]: Period('2014-07-01 11:00', 'H')
In [1]: p + Minute(5)
Traceback
...
ValueError: Input has different freq from Period(freq=H)
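The same rule can be checked on a minutely Period. This sketch assumes a recent pandas release and uses the "min" alias, which recent versions prefer over "T" for minutely frequency: whole numbers of minutes succeed, while a 30-second delta cannot keep the result minutely and is rejected.

```python
import numpy as np
import pandas as pd

# Minutely Period ("min" is the minute alias; older code often uses "T").
p = pd.Period("2014-07-01 09:00", freq="min")

# Whole numbers of minutes keep the result minutely, so these succeed.
later = p + pd.Timedelta(minutes=90)
also_later = p + np.timedelta64(120, "s")
print(later)       # 2014-07-01 10:30
print(also_later)  # 2014-07-01 09:02

# 30 seconds is not a whole number of minutes, so pandas refuses.
sub_minute_rejected = False
try:
    p + pd.Timedelta(seconds=30)
except ValueError as err:  # raised as IncompatibleFrequency, a ValueError subclass
    sub_minute_rejected = True
    print("rejected:", err)
```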
If a Period has any other freq, only offsets of that same frequency can be added. Otherwise, a ValueError is raised.
In [16]: p = pd.Period('2014-07', freq='M')
In [17]: p + MonthEnd(3)
Out[17]: Period('2014-10', 'M')
In [1]: p + MonthBegin(3)
Traceback
...
ValueError: Input has different freq from Period(freq=M)
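As a sketch of the monthly case, assuming a recent pandas release: MonthEnd is the offset that backs freq="M", so it shifts the period, while MonthBegin and a fixed-length Timedelta are both rejected (months have no fixed length, so a timedelta cannot be converted to whole months).

```python
import pandas as pd

p = pd.Period("2014-07", freq="M")

# MonthEnd matches freq="M", so adding it shifts the period.
shifted = p + pd.offsets.MonthEnd(3)
print(shifted)  # 2014-10

# A different offset family, or a fixed-length timedelta, is rejected.
rejected = []
for other in (pd.offsets.MonthBegin(3), pd.Timedelta(days=30)):
    try:
        p + other
    except ValueError:
        rejected.append(type(other).__name__)
print(rejected)  # ['MonthBegin', 'Timedelta']
```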
Taking the difference of Period instances with the same frequency returns the number of frequency units between them:
In [18]: pd.Period('2012', freq='A-DEC') - pd.Period('2002', freq='A-DEC')
Out[18]: 10
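The plain integer above reflects the pandas version this page was built with; more recent releases return a DateOffset (for monthly periods, a multiple of MonthEnd) whose n attribute holds the count. The helper below, periods_between, is a hypothetical convenience that reads n when present and otherwise returns the raw result, so it works either way.

```python
import pandas as pd

def periods_between(later, earlier):
    """Hypothetical helper: count frequency units between same-freq Periods.

    Recent pandas returns a DateOffset from Period subtraction, older
    releases a plain int; reading ``.n`` when present handles both.
    """
    diff = later - earlier
    return getattr(diff, "n", diff)

months = periods_between(pd.Period("2012-06", freq="M"),
                         pd.Period("2011-01", freq="M"))
print(months)  # 17
```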
11.2 PeriodIndex and period_range
Regular sequences of Period objects can be collected in a PeriodIndex, which can be constructed using the period_range convenience function:
In [19]: prng = pd.period_range('1/1/2011', '1/1/2012', freq='M')
In [20]: prng
Out[20]:
PeriodIndex(['2011-01', '2011-02', '2011-03', '2011-04', '2011-05', '2011-06',
'2011-07', '2011-08', '2011-09', '2011-10', '2011-11', '2011-12',
'2012-01'],
dtype='period[M]', freq='M')
The PeriodIndex constructor can also be used directly:
In [21]: pd.PeriodIndex(['2011-1', '2011-2', '2011-3'], freq='M')
Out[21]: PeriodIndex(['2011-01', '2011-02', '2011-03'], dtype='period[M]', freq='M')
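period_range accepts either both endpoints or a start plus a number of periods. The sketch below, assuming a recent pandas release, checks that the two forms build the same 13-element monthly index and that each element is a Period.

```python
import pandas as pd

# Both endpoints: every month from Jan 2011 through Jan 2012 inclusive.
by_bounds = pd.period_range("2011-01", "2012-01", freq="M")

# Start plus a count of periods yields the same index.
by_count = pd.period_range(start="2011-01", periods=13, freq="M")

print(len(by_bounds))               # 13
print(by_bounds.equals(by_count))   # True
print(type(by_bounds[0]).__name__)  # Period
```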
Passing a multiplied frequency produces a sequence of Period objects, each with the multiplied span.
In [22]: pd.PeriodIndex(start='2014-01', freq='3M', periods=4)
Out[22]: PeriodIndex(['2014-01', '2014-04', '2014-07', '2014-10'], dtype='period[3M]', freq='3M')
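The same multiplied-span index can also be built with period_range, which is the form this sketch uses (assuming a recent pandas release). It shows that consecutive elements start three months apart and that the first element covers January through March.

```python
import pandas as pd

# Each element of a "3M" index spans three months.
idx = pd.period_range("2014-01", periods=4, freq="3M")
labels = [str(p) for p in idx]
print(labels)  # ['2014-01', '2014-04', '2014-07', '2014-10']

# The first element runs from Jan 1 to the last instant of Mar 31.
first = idx[0]
print(first.start_time.date(), first.end_time.date())  # 2014-01-01 2014-03-31
```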
Just like DatetimeIndex, a PeriodIndex can also be used to index pandas objects:
In [23]: ps = pd.Series(np.random.randn(len(prng)), prng)
In [24]: ps
Out[24]:
2011-01 0.469112
2011-02 -0.282863
2011-03 -1.509059
2011-04 -1.135632
...
2011-10 -2.104569
2011-11 -0.494929
2011-12 1.071804
2012-01 0.721555
Freq: M, dtype: float64
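With deterministic values instead of random ones, the sketch below (assuming a recent pandas release) shows that a row can be looked up with either a Period object or an equivalent string of the same resolution, and that the index keeps its frequency.

```python
import pandas as pd

prng = pd.period_range("2011-01", "2012-01", freq="M")
ps = pd.Series(range(len(prng)), index=prng)

# Look up a row by Period object or by an equivalent month string.
by_period = ps[pd.Period("2011-03", freq="M")]
by_string = ps.loc["2011-03"]
print(by_period, by_string)  # 2 2

# The index remembers its frequency.
print(ps.index.freqstr)      # M
```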
PeriodIndex supports addition and subtraction with the same rules as Period.
In [25]: idx = pd.period_range('2014-07-01 09:00', periods=5, freq='H')
In [26]: idx
Out[26]:
PeriodIndex(['2014-07-01 09:00', '2014-07-01 10:00', '2014-07-01 11:00',
'2014-07-01 12:00', '2014-07-01 13:00'],
dtype='period[H]', freq='H')
In [27]: idx + Hour(2)
Out[27]:
PeriodIndex(['2014-07-01 11:00', '2014-07-01 12:00', '2014-07-01 13:00',
'2014-07-01 14:00', '2014-07-01 15:00'],
dtype='period[H]', freq='H')
In [28]: idx = pd.period_range('2014-07', periods=5, freq='M')
In [29]: idx
Out[29]: PeriodIndex(['2014-07', '2014-08', '2014-09', '2014-10', '2014-11'], dtype='period[M]', freq='M')
In [30]: idx + MonthEnd(3)
Out[30]: PeriodIndex(['2014-10', '2014-11', '2014-12', '2015-01', '2015-02'], dtype='period[M]', freq='M')
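A sketch of the element-wise rules on a whole index, assuming a recent pandas release: adding an integer shifts every element (the same result as PeriodIndex.shift), and a mismatched offset such as MonthBegin is rejected for the index just as it is for a single Period.

```python
import pandas as pd

idx = pd.period_range("2014-07", periods=5, freq="M")

# Integers shift every element by that many months.
moved = idx + 2
print([str(p) for p in moved])  # ['2014-09', '2014-10', '2014-11', '2014-12', '2015-01']

# shift() expresses the same operation as a method.
print(moved.equals(idx.shift(2)))  # True

# A mismatched offset fails for the whole index.
index_rejected = False
try:
    idx + pd.offsets.MonthBegin(1)
except ValueError as err:
    index_rejected = True
    print("rejected:", err)
```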
PeriodIndex has its own dtype named period; refer to Period Dtypes.
11.3 Period Dtypes
New in version 0.19.0.
PeriodIndex has a custom period dtype. This is a pandas extension dtype similar to the timezone-aware dtype (datetime64[ns, tz]).
The period dtype holds the freq attribute and is represented as period[freq], such as period[D] or period[M], using frequency strings.
In [31]: pi = pd.period_range('2016-01-01', periods=3, freq='M')
In [32]: pi
Out[32]: PeriodIndex(['2016-01', '2016-02', '2016-03'], dtype='period[M]', freq='M')
In [33]: pi.dtype
Out[33]: period[M]
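The dtype can also be inspected and constructed programmatically. This sketch assumes a recent pandas release, where the dtype class is exposed as pd.PeriodDtype; it checks that the index dtype is a PeriodDtype and that it equals one built from the same frequency string.

```python
import pandas as pd

pi = pd.period_range("2016-01-01", periods=3, freq="M")

# The index dtype is a PeriodDtype that carries the frequency.
print(pi.dtype)                              # period[M]
print(isinstance(pi.dtype, pd.PeriodDtype))  # True

# The same dtype can be built directly from a frequency string.
print(pi.dtype == pd.PeriodDtype(freq="M"))  # True
```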
The period dtype can be used in .astype(...). It allows one to change the freq of a PeriodIndex like .asfreq() and to convert a DatetimeIndex to a PeriodIndex like to_period():
# change monthly freq to daily freq
pi.astype('period[D]')
# convert to DatetimeIndex
pi.astype('datetime64[ns]')
# convert to PeriodIndex
In [34]: dti = pd.date_range('2011-01-01', freq='M', periods=3)
In [35]: dti
Out[35]: DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31'], dtype='datetime64[ns]', freq='M')
dti.astype('period[M]')
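The equivalences described above can be checked directly. This sketch assumes a recent pandas release; it builds the DatetimeIndex from explicit month-end dates rather than a date_range alias, and confirms that astype to period[D] matches asfreq with how="end" (its default), and that astype to period[M] on a DatetimeIndex matches to_period("M").

```python
import pandas as pd

pi = pd.period_range("2016-01", periods=3, freq="M")

# astype to a finer period dtype lands on the last day of each month.
daily = pi.astype("period[D]")
print([str(p) for p in daily])                  # ['2016-01-31', '2016-02-29', '2016-03-31']
print(daily.equals(pi.asfreq("D", how="end")))  # True

# astype("period[M]") on a DatetimeIndex matches to_period("M").
dti = pd.DatetimeIndex(["2011-01-31", "2011-02-28", "2011-03-31"])
as_periods = dti.astype("period[M]")
print(as_periods.equals(dti.to_period("M")))    # True
```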
11.4 PeriodIndex Partial String Indexing
You can pass in dates and strings to Series and DataFrame with PeriodIndex, in the same manner as DatetimeIndex. For details, refer to DatetimeIndex Partial String Indexing.
In [36]: ps['2011-01']
Out[36]: 0.46911229990718628
In [37]: ps[datetime(2011, 12, 25):]
Out[37]:
2011-12 1.071804
2012-01 0.721555
Freq: M, dtype: float64
In [38]: ps['10/31/2011':'12/31/2011']
Out[38]:
2011-10 -2.104569
2011-11 -0.494929
2011-12 1.071804
Freq: M, dtype: float64
Passing a string representing a lower frequency than the PeriodIndex returns partially sliced data.
In [39]: ps['2011']
Out[39]:
2011-01 0.469112
2011-02 -0.282863
2011-03 -1.509059
2011-04 -1.135632
...
2011-09 -0.861849
2011-10 -2.104569
2011-11 -0.494929
2011-12 1.071804
Freq: M, dtype: float64
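The sketch below, assuming a recent pandas release, applies the same idea to a daily PeriodIndex: a month string selects every day of that month, and month strings used as slice bounds include the whole of the end month (October plus November is 31 + 30 = 61 days).

```python
import pandas as pd

days = pd.period_range("2011-10-01", "2011-12-31", freq="D")
s = pd.Series(range(len(days)), index=days)

# A month string selects every day in that month.
november = s.loc["2011-11"]
print(len(november))  # 30

# Month strings as slice bounds include the whole end month.
oct_to_nov = s.loc["2011-10":"2011-11"]
print(len(oct_to_nov))  # 61
```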
In [40]: dfp = pd.DataFrame(np.random.randn(600,1),
....: columns=['A'],
....: index=pd.period_range('2013-01-01 9:00', periods=600, freq='T'))
....:
In [41]: dfp
Out[41]:
A
2013-01-01 09:00 -0.706771
2013-01-01 09:01 -1.039575
2013-01-01 09:02 0.271860
2013-01-01 09:03 -0.424972
... ...
2013-01-01 18:56 0.578223
2013-01-01 18:57 0.242697
2013-01-01 18:58 0.208129
2013-01-01 18:59 -0.636588
[600 rows x 1 columns]
In [42]: dfp['2013-01-01 10H']
Out[42]:
A
2013-01-01 10:00 -0.078638
2013-01-01 10:01 0.545952
2013-01-01 10:02 -1.219217
2013-01-01 10:03 -1.226825
... ...
2013-01-01 10:56 0.299368
2013-01-01 10:57 -0.863838
2013-01-01 10:58 0.408204
2013-01-01 10:59 -1.048089
[60 rows x 1 columns]
As with DatetimeIndex, the endpoints will be included in the result. The example below slices data from 10:00 through 11:59.
In [43]: dfp['2013-01-01 10H':'2013-01-01 11H']
Out[43]:
A
2013-01-01 10:00 -0.078638
2013-01-01 10:01 0.545952
2013-01-01 10:02 -1.219217
2013-01-01 10:03 -1.226825
... ...
2013-01-01 11:56 -1.461665
2013-01-01 11:57 -1.137707
2013-01-01 11:58 -0.891060
2013-01-01 11:59 -0.693921
[120 rows x 1 columns]
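Endpoint inclusion is easy to verify with a small deterministic frame. This sketch assumes a recent pandas release and uses the "min" alias for minutely periods; with A holding the row position, the 10:00 to 10:04 slice returns five rows and the 10:04 label maps to position 64 (sixty-four minutes after 09:00).

```python
import pandas as pd

idx = pd.period_range("2013-01-01 09:00", periods=600, freq="min")
df = pd.DataFrame({"A": range(600)}, index=idx)

# Label slicing includes both ends: 10:00 through 10:04 is five rows.
window = df.loc["2013-01-01 10:00":"2013-01-01 10:04"]
print(len(window))  # 5

# A same-resolution string selects a single row.
print(df.loc["2013-01-01 10:04", "A"])  # 64
```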
11.5 Frequency Conversion and Resampling with PeriodIndex
The frequency of Period and PeriodIndex can be converted via the asfreq method. Let’s start with the fiscal year 2011, ending in December:
In [44]: p = pd.Period('2011', freq='A-DEC')
In [45]: p
Out[45]: Period('2011', 'A-DEC')
We can convert it to a monthly frequency. Using the how parameter, we can specify whether to return the starting or ending month:
In [46]: p.asfreq('M', how='start')
Out[46]: Period('2011-01', 'M')
In [47]: p.asfreq('M', how='end')
Out[47]: Period('2011-12', 'M')
The shorthands ‘s’ and ‘e’ are provided for convenience:
In [48]: p.asfreq('M', 's')
Out[48]: Period('2011-01', 'M')
In [49]: p.asfreq('M', 'e')
Out[49]: Period('2011-12', 'M')
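asfreq works element-wise on a PeriodIndex as well. Assuming a recent pandas release, the sketch below converts the four calendar quarters of 2011 to their starting and ending months.

```python
import pandas as pd

quarters = pd.period_range("2011Q1", periods=4, freq="Q-DEC")

# how="start" picks each quarter's first month, how="end" its last.
starts = quarters.asfreq("M", how="start")
ends = quarters.asfreq("M", how="end")
print([str(p) for p in starts])  # ['2011-01', '2011-04', '2011-07', '2011-10']
print([str(p) for p in ends])    # ['2011-03', '2011-06', '2011-09', '2011-12']
```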
Converting to a “super-period” (e.g., annual frequency is a super-period of quarterly frequency) automatically returns the super-period that includes the input period:
In [50]: p = pd.Period('2011-12', freq='M')
In [51]: p.asfreq('A-NOV')
Out[51]: Period('2012', 'A-NOV')
Note that since we converted to an annual frequency that ends the year in November, the monthly period of December 2011 is actually in the 2012 A-NOV period.
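The same super-period logic applies to quarters. In this sketch (assuming a recent pandas release), December 2011 is the last month of calendar quarter 2011Q4, but under Q-NOV the fiscal year ends in November, so December opens fiscal year 2012 and falls in 2012Q1.

```python
import pandas as pd

dec = pd.Period("2011-12", freq="M")

# Calendar quarters: December closes 2011Q4.
calendar_q = dec.asfreq("Q-DEC")
print(calendar_q)  # 2011Q4

# Fiscal years ending in November start in December,
# so the same month opens fiscal year 2012.
fiscal_q = dec.asfreq("Q-NOV")
print(fiscal_q)    # 2012Q1
```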
Period conversions with anchored frequencies are particularly useful for working with the various kinds of quarterly data common in economics, business, and other fields. Many organizations define quarters relative to the month in which their fiscal year starts and ends. Thus, the first quarter of 2011 could start in 2010 or a few months into 2011. Via anchored frequencies, pandas works for all quarterly frequencies Q-JAN through Q-DEC.
Q-DEC defines regular calendar quarters:
In [52]: p = pd.Period('2012Q1', freq='Q-DEC')
In [53]: p.asfreq('D', 's')
Out[53]: Period('2012-01-01', 'D')
In [54]: p.asfreq('D', 'e')
Out[54]: Period('2012-03-31', 'D')
Q-MAR defines a fiscal year ending in March:
In [55]: p = pd.Period('2011Q4', freq='Q-MAR')
In [56]: p.asfreq('D', 's')
Out[56]: Period('2011-01-01', 'D')
In [57]: p.asfreq('D', 'e')
Out[57]: Period('2011-03-31', 'D')
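Extending the Q-MAR example to a whole fiscal year, the sketch below (assuming a recent pandas release) lists the first and last day of each quarter of fiscal year 2011, which runs from April 2010 through March 2011; the final row matches the 2011Q4 boundaries shown above.

```python
import pandas as pd

fy2011 = pd.period_range("2011Q1", "2011Q4", freq="Q-MAR")

# First and last day of each fiscal quarter.
rows = [(str(q), str(q.asfreq("D", "s")), str(q.asfreq("D", "e"))) for q in fy2011]
for row in rows:
    print(*row)
# 2011Q1 2010-04-01 2010-06-30
# 2011Q2 2010-07-01 2010-09-30
# 2011Q3 2010-10-01 2010-12-31
# 2011Q4 2011-01-01 2011-03-31
```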