14 Time Zone Handling
Pandas provides rich support for working with timestamps in different time zones using the pytz and dateutil libraries. dateutil support is new in 0.14.1 and is currently limited to fixed-offset and tzfile zones. The default library is pytz. Support for dateutil is provided for compatibility with other applications, e.g. if you use dateutil in other Python packages.
14.1 Working with Time Zones
By default, pandas objects are time zone unaware:
In [1]: rng = pd.date_range('3/6/2012 00:00', periods=15, freq='D')
In [2]: rng.tz is None
Out[2]: True
To supply the time zone, you can use the tz keyword to date_range and other functions. dateutil time zone strings are distinguished from pytz time zones by starting with dateutil/.
- In pytz you can find a list of common (and less common) time zones using from pytz import common_timezones, all_timezones. dateutil uses the OS time zones, so there isn’t a fixed list available. For common zones, the names are the same as in pytz.
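As a quick check of the name lists mentioned above, the following sketch assumes pytz is installed and uses only the two lists it exports. The variable names are illustrative.

```python
from pytz import all_timezones, common_timezones

# Both objects behave like lists of zone-name strings.
london_is_common = "Europe/London" in common_timezones

# Every common zone should also appear in the full list.
common_subset_of_all = set(common_timezones) <= set(all_timezones)

print(len(common_timezones), len(all_timezones))
print(london_is_common, common_subset_of_all)
```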
# pytz
In [3]: rng_pytz = pd.date_range('3/6/2012 00:00', periods=10, freq='D',
...: tz='Europe/London')
...:
In [4]: rng_pytz.tz
Out[4]: <DstTzInfo 'Europe/London' LMT-1 day, 23:59:00 STD>
# dateutil
In [5]: rng_dateutil = pd.date_range('3/6/2012 00:00', periods=10, freq='D',
...: tz='dateutil/Europe/London')
...:
In [6]: rng_dateutil.tz
Out[6]: tzfile('/usr/share/zoneinfo/Europe/London')
# dateutil - utc special case
In [7]: rng_utc = pd.date_range('3/6/2012 00:00', periods=10, freq='D',
...: tz=dateutil.tz.tzutc())
...:
In [8]: rng_utc.tz
Out[8]: tzutc()
Note that the UTC timezone is a special case in dateutil and should be constructed explicitly as an instance of dateutil.tz.tzutc. You can also construct other timezones explicitly first, which gives you more control over which time zone is used:
# pytz
In [9]: tz_pytz = pytz.timezone('Europe/London')
In [10]: rng_pytz = pd.date_range('3/6/2012 00:00', periods=10, freq='D',
....: tz=tz_pytz)
....:
In [11]: rng_pytz.tz == tz_pytz
Out[11]: True
# dateutil
In [12]: tz_dateutil = dateutil.tz.gettz('Europe/London')
In [13]: rng_dateutil = pd.date_range('3/6/2012 00:00', periods=10, freq='D',
....: tz=tz_dateutil)
....:
In [14]: rng_dateutil.tz == tz_dateutil
Out[14]: True
Timestamps, like Python’s datetime.datetime objects, can be either time zone naive or time zone aware. Naive time series and DatetimeIndex objects can be localized using tz_localize:
In [15]: ts = pd.Series(np.random.randn(len(rng)), rng)
In [16]: ts_utc = ts.tz_localize('UTC')
In [17]: ts_utc
Out[17]:
2012-03-06 00:00:00+00:00 0.469112
2012-03-07 00:00:00+00:00 -0.282863
2012-03-08 00:00:00+00:00 -1.509059
2012-03-09 00:00:00+00:00 -1.135632
...
2012-03-17 00:00:00+00:00 1.071804
2012-03-18 00:00:00+00:00 0.721555
2012-03-19 00:00:00+00:00 -0.706771
2012-03-20 00:00:00+00:00 -1.039575
Freq: D, dtype: float64
Again, you can explicitly construct the timezone object first.
You can use the tz_convert method to convert tz-aware pandas objects to another time zone:
In [18]: ts_utc.tz_convert('US/Eastern')
Out[18]:
2012-03-05 19:00:00-05:00 0.469112
2012-03-06 19:00:00-05:00 -0.282863
2012-03-07 19:00:00-05:00 -1.509059
2012-03-08 19:00:00-05:00 -1.135632
...
2012-03-16 20:00:00-04:00 1.071804
2012-03-17 20:00:00-04:00 0.721555
2012-03-18 20:00:00-04:00 -0.706771
2012-03-19 20:00:00-04:00 -1.039575
Freq: D, dtype: float64
Warning
Be wary of conversions between libraries. For some zones, pytz and dateutil have different definitions of the zone. This is more of a problem for unusual timezones than for ‘standard’ zones like US/Eastern.
Warning
Be aware that the definition of the same timezone may not compare equal across versions of the timezone libraries. This may cause problems when working with stored data that was localized using one version and is operated on with a different version. See here for how to handle such a situation.
Warning
It is incorrect to pass a timezone directly into the datetime.datetime constructor (e.g., datetime.datetime(2011, 1, 1, tzinfo=timezone('US/Eastern'))). Instead, the datetime needs to be localized using the localize method on the timezone.
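The difference can be seen with a minimal sketch, assuming pytz is installed. The names constructed and localized are illustrative only.

```python
from datetime import datetime, timedelta

import pytz

eastern = pytz.timezone("US/Eastern")

# Passing the pytz zone to the constructor attaches the zone's default
# offset (typically a local-mean-time value), not the offset in force
# on the given date.
constructed = datetime(2011, 1, 1, tzinfo=eastern)

# localize() looks up the offset that applies to this wall-clock time.
localized = eastern.localize(datetime(2011, 1, 1))

print(constructed.utcoffset())
print(localized.utcoffset())
```

Only the localized value carries the five-hour offset from UTC that US/Eastern observes in January.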
Under the hood, all timestamps are stored in UTC. Scalar values from a DatetimeIndex with a time zone will have their fields (day, hour, minute) localized to the time zone. However, timestamps with the same UTC value are still considered to be equal even if they are in different time zones:
In [19]: rng_eastern = rng_utc.tz_convert('US/Eastern')
In [20]: rng_berlin = rng_utc.tz_convert('Europe/Berlin')
In [21]: rng_eastern[5]
Out[21]: Timestamp('2012-03-10 19:00:00-0500', tz='US/Eastern', freq='D')
In [22]: rng_berlin[5]
Out[22]: Timestamp('2012-03-11 01:00:00+0100', tz='Europe/Berlin', freq='D')
In [23]: rng_eastern[5] == rng_berlin[5]
Out[23]: True
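The same point can be sketched with a single scalar, assuming only pandas. The Timestamp.value attribute used here exposes the underlying integer count of nanoseconds since the epoch.

```python
import pandas as pd

utc = pd.Timestamp("2012-03-11 00:00", tz="UTC")
eastern = utc.tz_convert("US/Eastern")
berlin = utc.tz_convert("Europe/Berlin")

# The wall-clock fields differ by zone...
print(eastern.hour, berlin.hour)

# ...but both refer to the same instant, so the comparison and the
# underlying integer value agree.
same_instant = eastern == berlin
same_value = eastern.value == berlin.value
print(same_instant, same_value)
```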
Like Series, DataFrame, and DatetimeIndex, Timestamps can be converted to other time zones using tz_convert:
In [24]: rng_eastern[5]
Out[24]: Timestamp('2012-03-10 19:00:00-0500', tz='US/Eastern', freq='D')
In [25]: rng_berlin[5]
Out[25]: Timestamp('2012-03-11 01:00:00+0100', tz='Europe/Berlin', freq='D')
In [26]: rng_eastern[5].tz_convert('Europe/Berlin')
Out[26]: Timestamp('2012-03-11 01:00:00+0100', tz='Europe/Berlin')
Localization of a Timestamp works just like it does for DatetimeIndex and Series:
In [27]: rng[5]
Out[27]: Timestamp('2012-03-11 00:00:00', freq='D')
In [28]: rng[5].tz_localize('Asia/Shanghai')
Out[28]: Timestamp('2012-03-11 00:00:00+0800', tz='Asia/Shanghai')
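To make the distinction between the two methods concrete, here is a small sketch assuming only pandas: tz_localize attaches a zone to a naive value, while tz_convert moves an aware value to another zone.

```python
import pandas as pd

naive = pd.Timestamp("2012-03-11 00:00")

# tz_localize attaches a zone and keeps the wall-clock fields.
shanghai = naive.tz_localize("Asia/Shanghai")

# tz_convert keeps the instant and recomputes the wall-clock fields.
as_utc = shanghai.tz_convert("UTC")

print(shanghai)
print(as_utc)
```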
Operations between Series in different time zones will yield UTC Series, aligning the data on the UTC timestamps:
In [29]: eastern = ts_utc.tz_convert('US/Eastern')
In [30]: berlin = ts_utc.tz_convert('Europe/Berlin')
In [31]: result = eastern + berlin
In [32]: result
Out[32]:
2012-03-06 00:00:00+00:00 0.938225
2012-03-07 00:00:00+00:00 -0.565727
2012-03-08 00:00:00+00:00 -3.018117
2012-03-09 00:00:00+00:00 -2.271265
...
2012-03-17 00:00:00+00:00 2.143608
2012-03-18 00:00:00+00:00 1.443110
2012-03-19 00:00:00+00:00 -1.413542
2012-03-20 00:00:00+00:00 -2.079150
Freq: D, dtype: float64
In [33]: result.index
Out[33]:
DatetimeIndex(['2012-03-06', '2012-03-07', '2012-03-08', '2012-03-09',
'2012-03-10', '2012-03-11', '2012-03-12', '2012-03-13',
'2012-03-14', '2012-03-15', '2012-03-16', '2012-03-17',
'2012-03-18', '2012-03-19', '2012-03-20'],
dtype='datetime64[ns, UTC]', freq='D')
To remove the timezone from a tz-aware DatetimeIndex, use tz_localize(None) or tz_convert(None). tz_localize(None) removes the timezone and keeps the local time representation. tz_convert(None) removes the timezone after converting to UTC.
In [34]: didx = pd.date_range(start='2014-08-01 09:00', freq='H', periods=10, tz='US/Eastern')
In [35]: didx
Out[35]:
DatetimeIndex(['2014-08-01 09:00:00-04:00', '2014-08-01 10:00:00-04:00',
'2014-08-01 11:00:00-04:00', '2014-08-01 12:00:00-04:00',
'2014-08-01 13:00:00-04:00', '2014-08-01 14:00:00-04:00',
'2014-08-01 15:00:00-04:00', '2014-08-01 16:00:00-04:00',
'2014-08-01 17:00:00-04:00', '2014-08-01 18:00:00-04:00'],
dtype='datetime64[ns, US/Eastern]', freq='H')
In [36]: didx.tz_localize(None)
Out[36]:
DatetimeIndex(['2014-08-01 09:00:00', '2014-08-01 10:00:00',
'2014-08-01 11:00:00', '2014-08-01 12:00:00',
'2014-08-01 13:00:00', '2014-08-01 14:00:00',
'2014-08-01 15:00:00', '2014-08-01 16:00:00',
'2014-08-01 17:00:00', '2014-08-01 18:00:00'],
dtype='datetime64[ns]', freq='H')
In [37]: didx.tz_convert(None)
Out[37]:
DatetimeIndex(['2014-08-01 13:00:00', '2014-08-01 14:00:00',
'2014-08-01 15:00:00', '2014-08-01 16:00:00',
'2014-08-01 17:00:00', '2014-08-01 18:00:00',
'2014-08-01 19:00:00', '2014-08-01 20:00:00',
'2014-08-01 21:00:00', '2014-08-01 22:00:00'],
dtype='datetime64[ns]', freq='H')
# tz_convert(None) is equivalent to tz_convert('UTC').tz_localize(None)
In [38]: didx.tz_convert('UTC').tz_localize(None)
Out[38]:
DatetimeIndex(['2014-08-01 13:00:00', '2014-08-01 14:00:00',
'2014-08-01 15:00:00', '2014-08-01 16:00:00',
'2014-08-01 17:00:00', '2014-08-01 18:00:00',
'2014-08-01 19:00:00', '2014-08-01 20:00:00',
'2014-08-01 21:00:00', '2014-08-01 22:00:00'],
dtype='datetime64[ns]', freq='H')
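The two ways of dropping the zone can also be compared programmatically. This is a sketch assuming only pandas; the hourly frequency is passed as a Timedelta rather than a string alias, which is a choice made here for portability and not something the text above requires.

```python
import pandas as pd

didx = pd.date_range("2014-08-01 09:00", periods=3,
                     freq=pd.Timedelta(hours=1), tz="US/Eastern")

wall_clock = didx.tz_localize(None)   # drops the zone, keeps local times
utc_naive = didx.tz_convert(None)     # converts to UTC, then drops the zone
via_utc = didx.tz_convert("UTC").tz_localize(None)

print(wall_clock[0], utc_naive[0])
print(utc_naive.equals(via_utc))
```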
14.2 Ambiguous Times when Localizing
In some cases, tz_localize cannot determine the DST and non-DST hours when there are duplicates. This often happens when reading files or database records that simply duplicate the hours. Passing ambiguous='infer' (the infer_dst argument in prior releases) into tz_localize will attempt to determine the right offset. Of the two examples below, the first fails because it contains ambiguous times, and the second infers the right offset.
In [39]: rng_hourly = pd.DatetimeIndex(['11/06/2011 00:00', '11/06/2011 01:00',
....: '11/06/2011 01:00', '11/06/2011 02:00',
....: '11/06/2011 03:00'])
....:
This will fail as there are ambiguous times:
In [2]: rng_hourly.tz_localize('US/Eastern')
AmbiguousTimeError: Cannot infer dst time from Timestamp('2011-11-06 01:00:00'), try using the 'ambiguous' argument
Infer the ambiguous times:
In [40]: rng_hourly_eastern = rng_hourly.tz_localize('US/Eastern', ambiguous='infer')
In [41]: rng_hourly_eastern.tolist()
Out[41]:
[Timestamp('2011-11-06 00:00:00-0400', tz='US/Eastern'),
Timestamp('2011-11-06 01:00:00-0400', tz='US/Eastern'),
Timestamp('2011-11-06 01:00:00-0500', tz='US/Eastern'),
Timestamp('2011-11-06 02:00:00-0500', tz='US/Eastern'),
Timestamp('2011-11-06 03:00:00-0500', tz='US/Eastern')]
In addition to ‘infer’, several other arguments are supported. Passing an array-like of bools or 0s/1s, where True represents a DST hour and False a non-DST hour, allows for distinguishing more than one DST transition (e.g., if you have multiple records in a database, each with its own DST transition). Passing ‘NaT’ will fill in ambiguous times with not-a-time values. These options are available in the DatetimeIndex constructor as well as tz_localize.
In [42]: rng_hourly_dst = np.array([1, 1, 0, 0, 0])
In [43]: rng_hourly.tz_localize('US/Eastern', ambiguous=rng_hourly_dst).tolist()
Out[43]:
[Timestamp('2011-11-06 00:00:00-0400', tz='US/Eastern'),
Timestamp('2011-11-06 01:00:00-0400', tz='US/Eastern'),
Timestamp('2011-11-06 01:00:00-0500', tz='US/Eastern'),
Timestamp('2011-11-06 02:00:00-0500', tz='US/Eastern'),
Timestamp('2011-11-06 03:00:00-0500', tz='US/Eastern')]
In [44]: rng_hourly.tz_localize('US/Eastern', ambiguous='NaT').tolist()
Out[44]:
[Timestamp('2011-11-06 00:00:00-0400', tz='US/Eastern'),
NaT,
NaT,
Timestamp('2011-11-06 02:00:00-0500', tz='US/Eastern'),
Timestamp('2011-11-06 03:00:00-0500', tz='US/Eastern')]
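The effect of the boolean flags can be checked by converting the result to UTC. The sketch below assumes pandas and NumPy, and uses an illustrative two-element index holding the same repeated wall-clock hour.

```python
import numpy as np
import pandas as pd

# One wall-clock time that occurs twice on 2011-11-06 in US/Eastern.
repeated = pd.DatetimeIndex(["2011-11-06 01:00", "2011-11-06 01:00"])

# True marks the DST (first) occurrence, False the non-DST (second) one.
localized = repeated.tz_localize("US/Eastern",
                                 ambiguous=np.array([True, False]))

as_utc = localized.tz_convert("UTC")
gap = as_utc[1] - as_utc[0]
print(as_utc[0], as_utc[1], gap)
```

Although the two inputs are textually identical, the flags place them one hour apart in UTC.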
14.3 TZ aware Dtypes
New in version 0.17.0.
A Series or DatetimeIndex with timezone-naive values is represented with a dtype of datetime64[ns].
In [50]: s_naive = pd.Series(pd.date_range('20130101',periods=3))
In [51]: s_naive
Out[51]:
0 2013-01-01
1 2013-01-02
2 2013-01-03
dtype: datetime64[ns]
A Series or DatetimeIndex with timezone-aware values is represented with a dtype of datetime64[ns, tz].
In [52]: s_aware = pd.Series(pd.date_range('20130101',periods=3,tz='US/Eastern'))
In [53]: s_aware
Out[53]:
0 2013-01-01 00:00:00-05:00
1 2013-01-02 00:00:00-05:00
2 2013-01-03 00:00:00-05:00
dtype: datetime64[ns, US/Eastern]
Both of these Series can be manipulated via the .dt accessor, see here.
For example, to localize a naive stamp and convert it to another timezone:
In [54]: s_naive.dt.tz_localize('UTC').dt.tz_convert('US/Eastern')
Out[54]:
0 2012-12-31 19:00:00-05:00
1 2013-01-01 19:00:00-05:00
2 2013-01-02 19:00:00-05:00
dtype: datetime64[ns, US/Eastern]
Furthermore, you can use .astype(...) on timezone-aware (and naive) data. This operation is effectively a localize and convert on a naive stamp, and a convert on an aware stamp.
# localize and convert a naive timezone
In [55]: s_naive.astype('datetime64[ns, US/Eastern]')
Out[55]:
0 2012-12-31 19:00:00-05:00
1 2013-01-01 19:00:00-05:00
2 2013-01-02 19:00:00-05:00
dtype: datetime64[ns, US/Eastern]
# make an aware tz naive
In [56]: s_aware.astype('datetime64[ns]')
Out[56]:
0 2013-01-01 05:00:00
1 2013-01-02 05:00:00
2 2013-01-03 05:00:00
dtype: datetime64[ns]
# convert to a new timezone
In [57]: s_aware.astype('datetime64[ns, CET]')
Out[57]:
0 2013-01-01 06:00:00+01:00
1 2013-01-02 06:00:00+01:00
2 2013-01-03 06:00:00+01:00
dtype: datetime64[ns, CET]
Note
Using the .values accessor on a Series returns a NumPy array of the data. These values are converted to UTC, as NumPy does not currently support timezones (even though it is printing in the local timezone!).
In [58]: s_naive.values
Out[58]:
array(['2013-01-01T00:00:00.000000000', '2013-01-02T00:00:00.000000000',
'2013-01-03T00:00:00.000000000'], dtype='datetime64[ns]')
In [59]: s_aware.values
Out[59]:
array(['2013-01-01T05:00:00.000000000', '2013-01-02T05:00:00.000000000',
'2013-01-03T05:00:00.000000000'], dtype='datetime64[ns]')
Further note that once converted to a NumPy array, these values lose their timezone information:
In [60]: pd.Series(s_aware.values)
Out[60]:
0 2013-01-01 05:00:00
1 2013-01-02 05:00:00
2 2013-01-03 05:00:00
dtype: datetime64[ns]
However, these can be easily converted back:
In [61]: pd.Series(s_aware.values).dt.tz_localize('UTC').dt.tz_convert('US/Eastern')
Out[61]:
0 2013-01-01 00:00:00-05:00
1 2013-01-02 00:00:00-05:00
2 2013-01-03 00:00:00-05:00
dtype: datetime64[ns, US/Eastern]
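A round trip through .values can be verified with a short sketch, assuming only pandas. The names raw, restored and round_trip_ok are illustrative.

```python
import pandas as pd

s_aware = pd.Series(pd.date_range("20130101", periods=3, tz="US/Eastern"))

# .values drops the zone and yields naive UTC datetime64 values.
raw = s_aware.values

# Re-attach UTC, then convert back to the original zone.
restored = pd.Series(raw).dt.tz_localize("UTC").dt.tz_convert("US/Eastern")

round_trip_ok = bool((restored == s_aware).all())
print(restored.dtype, round_trip_ok)
```

The zone name itself is not stored in the NumPy array, so it has to be known separately in order to restore the original Series.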