11 Time Span Representation

Regular intervals of time are represented by Period objects in pandas, while sequences of Period objects are collected in a PeriodIndex, which can be created with the convenience function period_range.

11.1 Period

A Period represents a span of time (e.g., a day, a month, a quarter, etc.). You can specify the span via the freq keyword using a frequency alias, as in the examples below. Because freq represents the span of the Period, it cannot be negative (for example, “-3D”).

In [1]: pd.Period('2012', freq='A-DEC')
Out[1]: Period('2012', 'A-DEC')

In [2]: pd.Period('2012-1-1', freq='D')
Out[2]: Period('2012-01-01', 'D')

In [3]: pd.Period('2012-1-1 19:00', freq='H')
Out[3]: Period('2012-01-01 19:00', 'H')

In [4]: pd.Period('2012-1-1 19:00', freq='5H')
Out[4]: Period('2012-01-01 19:00', '5H')
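
As a rough illustration of what "span" means here, the sketch below inspects the boundaries of a monthly Period through its start_time and end_time attributes. These attributes are not otherwise covered in this section, and the variable names are purely illustrative.

```python
import pandas as pd

# A monthly period covers the whole of January 2012.
p = pd.Period('2012-01', freq='M')

# start_time and end_time are the first and last instants inside the span.
start = p.start_time
end = p.end_time
print(p, start, end)

# The span begins at midnight on the 1st and ends before February starts.
covers_whole_month = (
    start == pd.Timestamp('2012-01-01')
    and pd.Timestamp('2012-01-31') <= end < pd.Timestamp('2012-02-01')
)
print(covers_whole_month)
```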

Adding or subtracting an integer shifts a Period by that many multiples of its own frequency. Arithmetic between Period objects with different freq (span) is not allowed.

In [5]: p = pd.Period('2012', freq='A-DEC')

In [6]: p + 1
Out[6]: Period('2013', 'A-DEC')

In [7]: p - 3
Out[7]: Period('2009', 'A-DEC')

In [8]: p = pd.Period('2012-01', freq='2M')

In [9]: p + 2
Out[9]: Period('2012-05', '2M')

In [10]: p - 1
Out[10]: Period('2011-11', '2M')

In [11]: p == pd.Period('2012-01', freq='3M')
---------------------------------------------------------------------------
IncompatibleFrequency                     Traceback (most recent call last)
<ipython-input-11-ff54ce3238f5> in <module>()
----> 1 p == pd.Period('2012-01', freq='3M')

pandas/src/period.pyx in pandas._period._Period.__richcmp__ (pandas/src/period.c:11376)()

IncompatibleFrequency: Input has different freq=3M from Period(freq=2M)

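If the Period freq is daily or higher (D, H, T, S, L, U, N), offsets and timedelta-like objects can be added, provided the result can be expressed in the same freq. Otherwise, a ValueError is raised. The examples below assume that Hour and Minute are imported from pandas.tseries.offsets, timedelta from datetime, and numpy as np.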

In [12]: p = pd.Period('2014-07-01 09:00', freq='H')

In [13]: p + Hour(2)
Out[13]: Period('2014-07-01 11:00', 'H')

In [14]: p + timedelta(minutes=120)
Out[14]: Period('2014-07-01 11:00', 'H')

In [15]: p + np.timedelta64(7200, 's')
Out[15]: Period('2014-07-01 11:00', 'H')

In [1]: p + Minute(5)
Traceback
   ...
ValueError: Input has different freq from Period(freq=H)
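
The same rule can be sketched as a standalone script. This version uses a daily Period rather than the hourly one above, and catches the error instead of letting it propagate; the variable names are illustrative only.

```python
from datetime import timedelta

import numpy as np
import pandas as pd

p = pd.Period('2014-07-01', freq='D')

# Whole-day increments fit a daily frequency, so both additions succeed.
two_days_later = p + timedelta(days=2)
also_two_days_later = p + np.timedelta64(2, 'D')

# Five hours is not a whole number of days, so the addition is rejected.
try:
    p + timedelta(hours=5)
    rejected = False
except ValueError as exc:
    rejected = True
    print('rejected:', exc)

print(two_days_later, also_two_days_later, rejected)
```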

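For Period objects with any other freq, only offsets that match the Period's own frequency can be added. Otherwise, a ValueError is raised. MonthEnd and MonthBegin below are imported from pandas.tseries.offsets.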

In [16]: p = pd.Period('2014-07', freq='M')

In [17]: p + MonthEnd(3)
Out[17]: Period('2014-10', 'M')

In [1]: p + MonthBegin(3)
Traceback
   ...
ValueError: Input has different freq from Period(freq=M)
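
A minimal script version of the monthly case, with the imports written out and the mismatch caught rather than raised. The flag name mismatch_rejected is illustrative and not part of pandas.

```python
import pandas as pd
from pandas.tseries.offsets import MonthBegin, MonthEnd

p = pd.Period('2014-07', freq='M')

# MonthEnd matches the 'M' frequency, so the period moves forward three months.
shifted = p + MonthEnd(3)

# MonthBegin is a different offset, so the addition is refused.
try:
    p + MonthBegin(3)
    mismatch_rejected = False
except ValueError as exc:
    mismatch_rejected = True
    print('rejected:', exc)

print(shifted, mismatch_rejected)
```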

Taking the difference of Period instances with the same frequency will return the number of frequency units between them:

In [18]: pd.Period('2012', freq='A-DEC') - pd.Period('2002', freq='A-DEC')
Out[18]: 10
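
The sketch below repeats the subtraction with monthly periods. One caveat, stated as an assumption about the installed release: the output above shows a plain integer, whereas more recent pandas releases return an offset object whose multiple is exposed as .n. The getattr fallback is one way to handle both cases.

```python
import pandas as pd

later = pd.Period('2012-03', freq='M')
earlier = pd.Period('2011-01', freq='M')

diff = later - earlier

# Assumption: if diff is an offset object, its count is in .n;
# if it is already an integer, use it unchanged.
months_apart = getattr(diff, 'n', diff)
print(months_apart)
```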

11.2 PeriodIndex and period_range

Regular sequences of Period objects can be collected in a PeriodIndex, which can be constructed using the period_range convenience function:

In [19]: prng = pd.period_range('1/1/2011', '1/1/2012', freq='M')

In [20]: prng
Out[20]: 
PeriodIndex(['2011-01', '2011-02', '2011-03', '2011-04', '2011-05', '2011-06',
             '2011-07', '2011-08', '2011-09', '2011-10', '2011-11', '2011-12',
             '2012-01'],
            dtype='int64', freq='M')

The PeriodIndex constructor can also be used directly:

In [21]: pd.PeriodIndex(['2011-1', '2011-2', '2011-3'], freq='M')
Out[21]: PeriodIndex(['2011-01', '2011-02', '2011-03'], dtype='int64', freq='M')

Passing a multiplied frequency produces a sequence of Period objects, each spanning that multiple of the base frequency.

In [22]: pd.PeriodIndex(start='2014-01', freq='3M', periods=4)
Out[22]: PeriodIndex(['2014-01', '2014-04', '2014-07', '2014-10'], dtype='int64', freq='3M')
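
The same sequence can also be built with period_range, the convenience function introduced at the start of this section. This may be the safer spelling, since the start and periods keywords of the PeriodIndex constructor are not available in every pandas release.

```python
import pandas as pd

# Four periods, each spanning three months, starting in January 2014.
idx = pd.period_range(start='2014-01', periods=4, freq='3M')
labels = [str(p) for p in idx]
print(labels)

# Adding 1 to an element moves it by one full three-month span.
shifted = idx[0] + 1
print(shifted)
```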

Just like DatetimeIndex, a PeriodIndex can also be used to index pandas objects:

In [23]: ps = pd.Series(np.random.randn(len(prng)), prng)

In [24]: ps
Out[24]: 
2011-01    0.469112
2011-02   -0.282863
2011-03   -1.509059
2011-04   -1.135632
             ...   
2011-10   -2.104569
2011-11   -0.494929
2011-12    1.071804
2012-01    0.721555
Freq: M, dtype: float64

PeriodIndex supports addition and subtraction under the same rules as Period.

In [25]: idx = pd.period_range('2014-07-01 09:00', periods=5, freq='H')

In [26]: idx
Out[26]: 
PeriodIndex(['2014-07-01 09:00', '2014-07-01 10:00', '2014-07-01 11:00',
             '2014-07-01 12:00', '2014-07-01 13:00'],
            dtype='int64', freq='H')

In [27]: idx + Hour(2)
Out[27]: 
PeriodIndex(['2014-07-01 11:00', '2014-07-01 12:00', '2014-07-01 13:00',
             '2014-07-01 14:00', '2014-07-01 15:00'],
            dtype='int64', freq='H')

In [28]: idx = pd.period_range('2014-07', periods=5, freq='M')

In [29]: idx
Out[29]: PeriodIndex(['2014-07', '2014-08', '2014-09', '2014-10', '2014-11'], dtype='int64', freq='M')

In [30]: idx + MonthEnd(3)
Out[30]: PeriodIndex(['2014-10', '2014-11', '2014-12', '2015-01', '2015-02'], dtype='int64', freq='M')

PeriodIndex has its own dtype named period; refer to Period Dtypes.

11.3 Period Dtypes

New in version 0.19.0.

PeriodIndex has a custom period dtype. This is a pandas extension dtype similar to the timezone aware dtype (datetime64[ns, tz]).

The period dtype holds the freq attribute and is represented with period[freq] like period[D] or period[M], using frequency strings.

In [31]: pi = pd.period_range('2016-01-01', periods=3, freq='M')

In [32]: pi
Out[32]: PeriodIndex(['2016-01', '2016-02', '2016-03'], dtype='int64', freq='M')

In [33]: pi.dtype
Out[33]: dtype('int64')

The period dtype can be used in .astype(...). It allows one to change the freq of a PeriodIndex, like .asfreq(), and to convert a DatetimeIndex to a PeriodIndex, like to_period():

# change monthly freq to daily freq
# pi.astype('period[D]')  # raises TypeError: data type "period[D]" not understood

# convert to DatetimeIndex
# pi.astype('datetime64[ns]')  # raises ValueError: Cannot cast PeriodIndex to dtype datetime64[ns]

# convert to PeriodIndex
In [34]: dti = pd.date_range('2011-01-01', freq='M', periods=3)

In [35]: dti
Out[35]: DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31'], dtype='datetime64[ns]', freq='M')

# dti.astype('period[M]')  # raises TypeError: data type "period[M]" not understood
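
The errors recorded in the comments above, together with the dtype='int64' outputs, are consistent with a pandas installation that does not yet provide the period dtype. The sketch below assumes a release that does (0.19.0 or later, per the note at the top of this subsection). It uses asfreq, to_timestamp, and to_period alongside astype, and builds the DatetimeIndex from explicit dates rather than with date_range.

```python
import pandas as pd

pi = pd.period_range('2016-01-01', periods=3, freq='M')
dtype_name = str(pi.dtype)
print(dtype_name)

# Casting to a daily period dtype is expected to match asfreq('D').
daily = pi.astype('period[D]')
same_as_asfreq = daily.equals(pi.asfreq('D'))
print(daily, same_as_asfreq)

# to_timestamp converts each period to the timestamp at its start.
stamps = pi.to_timestamp()
print(stamps)

# to_period goes the other way, from timestamps to monthly periods.
dti = pd.DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31'])
monthly = dti.to_period('M')
print(monthly)
```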

11.4 PeriodIndex Partial String Indexing

You can pass dates and strings to a Series or DataFrame with a PeriodIndex, in the same manner as with a DatetimeIndex. For details, refer to DatetimeIndex Partial String Indexing. In the examples below, datetime refers to datetime.datetime.

In [36]: ps['2011-01']
Out[36]: 0.46911229990718628

In [37]: ps[datetime(2011, 12, 25):]
Out[37]: 
2011-12    1.071804
2012-01    0.721555
Freq: M, dtype: float64

In [38]: ps['10/31/2011':'12/31/2011']
Out[38]: 
2011-10   -2.104569
2011-11   -0.494929
2011-12    1.071804
Freq: M, dtype: float64

Passing a string that represents a lower frequency than that of the PeriodIndex returns a slice containing every period that falls within it.

In [39]: ps['2011']
Out[39]: 
2011-01    0.469112
2011-02   -0.282863
2011-03   -1.509059
2011-04   -1.135632
             ...   
2011-09   -0.861849
2011-10   -2.104569
2011-11   -0.494929
2011-12    1.071804
Freq: M, dtype: float64

In [40]: dfp = pd.DataFrame(np.random.randn(600,1),
   ....:                    columns=['A'],
   ....:                    index=pd.period_range('2013-01-01 9:00', periods=600, freq='T'))
   ....: 

In [41]: dfp
Out[41]: 
                         A
2013-01-01 09:00 -0.706771
2013-01-01 09:01 -1.039575
2013-01-01 09:02  0.271860
2013-01-01 09:03 -0.424972
...                    ...
2013-01-01 18:56  0.578223
2013-01-01 18:57  0.242697
2013-01-01 18:58  0.208129
2013-01-01 18:59 -0.636588

[600 rows x 1 columns]

In [42]: dfp['2013-01-01 10H']
Out[42]: 
                         A
2013-01-01 10:00 -0.078638
2013-01-01 10:01  0.545952
2013-01-01 10:02 -1.219217
2013-01-01 10:03 -1.226825
...                    ...
2013-01-01 10:56  0.299368
2013-01-01 10:57 -0.863838
2013-01-01 10:58  0.408204
2013-01-01 10:59 -1.048089

[60 rows x 1 columns]

As with DatetimeIndex, the endpoints will be included in the result. The example below slices data starting from 10:00 to 11:59.

In [43]: dfp['2013-01-01 10H':'2013-01-01 11H']
Out[43]: 
                         A
2013-01-01 10:00 -0.078638
2013-01-01 10:01  0.545952
2013-01-01 10:02 -1.219217
2013-01-01 10:03 -1.226825
...                    ...
2013-01-01 11:56 -1.461665
2013-01-01 11:57 -1.137707
2013-01-01 11:58 -0.891060
2013-01-01 11:59 -0.693921

[120 rows x 1 columns]
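
A compact sketch of both behaviours on a small monthly Series follows. It uses deterministic values instead of random ones so the selection is easy to check, and it selects through .loc rather than plain brackets; that choice is an assumption made here for explicitness, not something the text above prescribes.

```python
import pandas as pd

prng = pd.period_range('2011-01', '2012-01', freq='M')
ps = pd.Series(range(len(prng)), index=prng)

# A year string selects every monthly period inside that year.
year_2011 = ps.loc['2011']
print(len(year_2011))

# Both endpoints of a string slice are included in the result.
last_quarter = ps.loc['2011-10':'2011-12']
print(last_quarter)
```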

11.5 Frequency Conversion and Resampling with PeriodIndex

The frequency of Period and PeriodIndex can be converted via the asfreq method. Let’s start with the fiscal year 2011, ending in December:

In [44]: p = pd.Period('2011', freq='A-DEC')

In [45]: p
Out[45]: Period('2011', 'A-DEC')

We can convert it to a monthly frequency. Using the how parameter, we can specify whether to return the starting or ending month:

In [46]: p.asfreq('M', how='start')
Out[46]: Period('2011-01', 'M')

In [47]: p.asfreq('M', how='end')
Out[47]: Period('2011-12', 'M')

The shorthands ‘s’ and ‘e’ are provided for convenience:

In [48]: p.asfreq('M', 's')
Out[48]: Period('2011-01', 'M')

In [49]: p.asfreq('M', 'e')
Out[49]: Period('2011-12', 'M')

Converting to a “super-period” (e.g., annual frequency is a super-period of quarterly frequency) automatically returns the super-period that includes the input period:

In [50]: p = pd.Period('2011-12', freq='M')

In [51]: p.asfreq('A-NOV')
Out[51]: Period('2012', 'A-NOV')

Note that since we converted to an annual frequency that ends the year in November, the monthly period of December 2011 is actually in the 2012 A-NOV period.
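
The same effect can be seen with quarterly super-periods. In this sketch, December 2011 is converted once to calendar quarters (Q-DEC) and once to quarters of a fiscal year ending in November (Q-NOV); the variable names are illustrative.

```python
import pandas as pd

p = pd.Period('2011-12', freq='M')

# With calendar quarters, December closes the fourth quarter of 2011.
calendar_quarter = p.asfreq('Q-DEC')

# With a fiscal year ending in November, December opens fiscal 2012.
fiscal_quarter = p.asfreq('Q-NOV')

print(calendar_quarter, fiscal_quarter)
```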

Period conversions with anchored frequencies are particularly useful for working with the various kinds of quarterly data common in economics, business, and other fields. Many organizations define quarters relative to the month in which their fiscal year starts and ends. Thus, the first quarter of 2011 could start in 2010 or a few months into 2011. Via anchored frequencies, pandas supports all quarterly frequencies from Q-JAN through Q-DEC.

Q-DEC defines regular calendar quarters:

In [52]: p = pd.Period('2012Q1', freq='Q-DEC')

In [53]: p.asfreq('D', 's')
Out[53]: Period('2012-01-01', 'D')

In [54]: p.asfreq('D', 'e')
Out[54]: Period('2012-03-31', 'D')

Q-MAR defines a fiscal year ending in March:

In [55]: p = pd.Period('2011Q4', freq='Q-MAR')

In [56]: p.asfreq('D', 's')
Out[56]: Period('2011-01-01', 'D')

In [57]: p.asfreq('D', 'e')
Out[57]: Period('2011-03-31', 'D')
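
To see how a whole fiscal year lines up with the calendar, the sketch below lists the first and last day of each Q-MAR quarter of fiscal 2011. It relies only on period_range and asfreq as described in this section; the bounds list is an illustrative helper.

```python
import pandas as pd

# The four quarters of the fiscal year that ends in March 2011.
quarters = pd.period_range('2011Q1', '2011Q4', freq='Q-MAR')

# For each quarter, record its label and its first and last calendar day.
bounds = [
    (str(q), str(q.asfreq('D', 's')), str(q.asfreq('D', 'e')))
    for q in quarters
]

for label, first_day, last_day in bounds:
    print(label, first_day, last_day)
```

The first quarter of fiscal 2011 begins in April 2010, which matches the remark above that a fiscal quarter may start in the previous calendar year.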