4 Converting to Timestamps
To convert a Series or list-like object of date-like objects, e.g. strings,
epochs, or a mixture, you can use the to_datetime function. When passed
a Series, this returns a Series (with the same index), while a list-like
is converted to a DatetimeIndex:
In [1]: pd.to_datetime(pd.Series(['Jul 31, 2009', '2010-01-10', None]))
Out[1]:
0 2009-07-31
1 2010-01-10
2 NaT
dtype: datetime64[ns]
In [2]: pd.to_datetime(['2005/11/23', '2010.12.31'])
Out[2]: DatetimeIndex(['2005-11-23', '2010-12-31'], dtype='datetime64[ns]', freq=None)
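The difference in return type can be checked directly. This is a minimal sketch: the index labels 'a', 'b', 'c' are made up for illustration, and ISO-style strings are used throughout because newer pandas releases may reject a list that mixes formats.

```python
import pandas as pd

# A Series input keeps its index and comes back as a Series.
s = pd.Series(['2009-07-31', '2010-01-10', None], index=['a', 'b', 'c'])
as_series = pd.to_datetime(s)

# A plain list comes back as a DatetimeIndex.
as_index = pd.to_datetime(['2005-11-23', '2010-12-31'])

print(type(as_series).__name__)  # Series
print(type(as_index).__name__)   # DatetimeIndex
print(as_series.index.tolist())  # ['a', 'b', 'c']
```

The None entry becomes NaT in the result, so missing values do not need to be filtered out before conversion.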
If you use dates which start with the day first (i.e. European style),
you can pass the dayfirst flag:
In [3]: pd.to_datetime(['04-01-2012 10:00'], dayfirst=True)
Out[3]: DatetimeIndex(['2012-01-04 10:00:00'], dtype='datetime64[ns]', freq=None)
In [4]: pd.to_datetime(['14-01-2012', '01-14-2012'], dayfirst=True)
Out[4]: DatetimeIndex(['2012-01-14', '2012-01-14'], dtype='datetime64[ns]', freq=None)
Warning
As the above example shows, dayfirst isn’t strict: if a date can’t be
parsed with the day first, it will be parsed as if dayfirst were False.
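When silent fallback is not acceptable, one option is to give an explicit format instead of dayfirst. The sketch below assumes the input is always day-month-year separated by hyphens; the variable names are illustrative only.

```python
import pandas as pd

# dayfirst=True reads the ambiguous string as 4 January.
lenient = pd.to_datetime('04-01-2012', dayfirst=True)

# An explicit format is strict: day-first input parses...
strict_ok = pd.to_datetime('14-01-2012', format='%d-%m-%Y')

# ...but month-first input is rejected rather than silently reinterpreted,
# because 14 is not a valid month.
try:
    pd.to_datetime('01-14-2012', format='%d-%m-%Y')
    rejected = False
except ValueError:
    rejected = True

print(lenient, strict_ok, rejected)
```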
Note
Specifying a format argument can speed up the conversion considerably,
and on versions later than 0.13.0 explicitly specifying a format string
of ‘%Y%m%d’ takes a faster path still.
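As a small illustration of the format argument, the sketch below parses compact year-month-day strings. The sample values are invented, and no timing is shown since any speedup depends on the pandas version and the data.

```python
import pandas as pd

# Compact dates with no separators; the format states the layout explicitly.
compact = ['20100131', '20100228', '20100331']
parsed = pd.to_datetime(compact, format='%Y%m%d')

print(parsed)
```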
If you pass a single string to to_datetime, it returns a single Timestamp.
Timestamp can also accept string input, but it doesn’t accept string parsing
options like dayfirst or format; use to_datetime if these are required.
In [5]: pd.to_datetime('2010/11/12')
Out[5]: Timestamp('2010-11-12 00:00:00')
In [6]: pd.Timestamp('2010/11/12')
Out[6]: Timestamp('2010-11-12 00:00:00')
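For a single string that needs a parsing option, to_datetime is the route to take, and the result is still an ordinary Timestamp. A brief sketch, using a made-up day-first date:

```python
import pandas as pd

# to_datetime accepts parsing options for a single string...
ts = pd.to_datetime('12/11/2010', dayfirst=True)

# ...and the result is an ordinary Timestamp (12 November 2010).
print(type(ts).__name__)                 # Timestamp
print(ts == pd.Timestamp('2010-11-12'))  # True
```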
New in version 0.18.1.
You can also pass a DataFrame of integer or string columns to assemble
into a Series of Timestamps.
In [7]: df = pd.DataFrame({'year': [2015, 2016],
...: 'month': [2, 3],
...: 'day': [4, 5],
...: 'hour': [2, 3]})
...:
In [8]: pd.to_datetime(df)
Out[8]:
0 2015-02-04 02:00:00
1 2016-03-05 03:00:00
dtype: datetime64[ns]
You can pass only the columns that you need to assemble.
In [9]: pd.to_datetime(df[['year', 'month', 'day']])
Out[9]:
0 2015-02-04
1 2016-03-05
dtype: datetime64[ns]
pd.to_datetime looks for standard designations of the datetime component in the column names, including:
- required: year, month, day
- optional: hour, minute, second, millisecond, microsecond, nanosecond
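The required and optional column names can be exercised together. In this sketch the frame name parts and its values are invented; it adds minute and second columns to the earlier example and shows that leaving out a required column raises an error.

```python
import pandas as pd

parts = pd.DataFrame({'year': [2015, 2016],
                      'month': [2, 3],
                      'day': [4, 5],
                      'hour': [2, 3],
                      'minute': [30, 45],
                      'second': [15, 0]})

# All recognised columns are combined into one datetime per row.
assembled = pd.to_datetime(parts)
print(assembled)

# Dropping a required column ('day' here) is an error.
try:
    pd.to_datetime(parts[['year', 'month']])
    missing_day_rejected = False
except ValueError:
    missing_day_rejected = True
```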
4.1 Invalid Data
Note
As of version 0.17.0, the default for to_datetime is errors='raise',
rather than errors='ignore'. This means that invalid parsing will raise
rather than return the original input as in previous versions.
Pass errors='coerce' to convert invalid data to NaT (not a time):
Raise when unparseable, this is the default
In [2]: pd.to_datetime(['2009/07/31', 'asd'], errors='raise')
ValueError: Unknown string format
Return the original input when unparseable
In [4]: pd.to_datetime(['2009/07/31', 'asd'], errors='ignore')
Out[4]: array(['2009/07/31', 'asd'], dtype=object)
Return NaT for input when unparseable
In [6]: pd.to_datetime(['2009/07/31', 'asd'], errors='coerce')
Out[6]: DatetimeIndex(['2009-07-31', 'NaT'], dtype='datetime64[ns]', freq=None)
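A common follow-up to errors='coerce' is to find which inputs failed to parse. The sketch below is one way to do that, assuming the raw values live in a Series; the names raw, bad_rows and clean are illustrative.

```python
import pandas as pd

raw = pd.Series(['2009/07/31', 'asd', '2010/01/10'])

# Unparseable entries become NaT instead of raising.
parsed = pd.to_datetime(raw, errors='coerce')

# Use the NaT positions to recover the offending inputs,
# and drop them to keep only the valid datetimes.
bad_rows = raw[parsed.isna()]
clean = parsed.dropna()

print(bad_rows.tolist())  # ['asd']
print(clean)
```

Note that this approach cannot tell an unparseable string apart from a value that was already missing in the input; both end up as NaT.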
4.2 Epoch Timestamps
It’s also possible to convert integer or float epoch times. The default unit
for these is nanoseconds (since this is how Timestamps are stored). However,
epochs are often stored in another unit, which can be specified:
Typical epoch stored units
In [10]: pd.to_datetime([1349720105, 1349806505, 1349892905,
....: 1349979305, 1350065705], unit='s')
....:
Out[10]:
DatetimeIndex(['2012-10-08 18:15:05', '2012-10-09 18:15:05',
'2012-10-10 18:15:05', '2012-10-11 18:15:05',
'2012-10-12 18:15:05'],
dtype='datetime64[ns]', freq=None)
In [11]: pd.to_datetime([1349720105100, 1349720105200, 1349720105300,
....: 1349720105400, 1349720105500 ], unit='ms')
....:
Out[11]:
DatetimeIndex(['2012-10-08 18:15:05.100000', '2012-10-08 18:15:05.200000',
'2012-10-08 18:15:05.300000', '2012-10-08 18:15:05.400000',
'2012-10-08 18:15:05.500000'],
dtype='datetime64[ns]', freq=None)
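The conversion can also be reversed. The following sketch takes the first epoch value from the example above, converts it with unit='s', and recovers the integer by dividing the offset from the Unix epoch by a one-second Timedelta; the variable names are illustrative.

```python
import pandas as pd

# Epoch seconds -> Timestamp.
ts = pd.to_datetime(1349720105, unit='s')
print(ts)  # 2012-10-08 18:15:05

# Timestamp -> epoch seconds: subtract the epoch, then floor-divide
# by the unit of interest.
epoch = pd.Timestamp('1970-01-01')
seconds = (ts - epoch) // pd.Timedelta('1s')
print(seconds)  # 1349720105
```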
These work, but the results may be unexpected.
In [12]: pd.to_datetime([1])
Out[12]: DatetimeIndex(['1970-01-01 00:00:00.000000001'], dtype='datetime64[ns]', freq=None)
In [13]: pd.to_datetime([1, 3.14], unit='s')
Out[13]: DatetimeIndex(['1970-01-01 00:00:01', '1970-01-01 00:00:03'], dtype='datetime64[ns]', freq=None)
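The surprise in the first case comes from the default unit. A minimal sketch contrasting the same integer with and without an explicit unit:

```python
import pandas as pd

# No unit: the integer is read as nanoseconds since the epoch.
as_ns = pd.to_datetime(1)

# Explicit unit: the same integer is read as seconds since the epoch.
as_s = pd.to_datetime(1, unit='s')

print(as_ns.nanosecond)  # 1
print(as_s.second)       # 1
```

Passing unit explicitly whenever the input is numeric avoids this ambiguity.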
Note
Epoch times will be rounded to the nearest nanosecond.