4 Converting to Timestamps

To convert a Series or list-like object of date-like objects (e.g. strings, epochs, or a mixture), use the to_datetime function. When passed a Series, it returns a Series (with the same index); a list-like is converted to a DatetimeIndex:

In [1]: pd.to_datetime(pd.Series(['Jul 31, 2009', '2010-01-10', None]))
Out[1]: 
0   2009-07-31
1   2010-01-10
2          NaT
dtype: datetime64[ns]

In [2]: pd.to_datetime(['2005/11/23', '2010.12.31'])
Out[2]: DatetimeIndex(['2005-11-23', '2010-12-31'], dtype='datetime64[ns]', freq=None)

If you use dates which start with the day first (i.e. European style), you can pass the dayfirst flag:

In [3]: pd.to_datetime(['04-01-2012 10:00'], dayfirst=True)
Out[3]: DatetimeIndex(['2012-01-04 10:00:00'], dtype='datetime64[ns]', freq=None)

In [4]: pd.to_datetime(['14-01-2012', '01-14-2012'], dayfirst=True)
Out[4]: DatetimeIndex(['2012-01-14', '2012-01-14'], dtype='datetime64[ns]', freq=None)

Warning

As the example above shows, dayfirst is not strict: if a date cannot be parsed with the day first (such as '01-14-2012'), it is parsed as if dayfirst were False.

Note

Specifying a format argument can speed up the conversion considerably. On versions later than 0.13.0, explicitly specifying a format string of '%Y%m%d' takes a faster path still.
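As a minimal sketch of the format argument, the snippet below parses compact '%Y%m%d' strings and a day-first layout with an explicit format string. The variable names and sample values are illustrative only and are not taken from the pandas documentation.

```python
import pandas as pd

# Compact year-month-day strings, parsed with an explicit format.
compact = pd.to_datetime(['20100131', '20100201'], format='%Y%m%d')

# Day-first strings, parsed with an exact layout rather than dayfirst=True.
day_first = pd.to_datetime(['14-01-2012', '04-01-2012'], format='%d-%m-%Y')

print(compact)
print(day_first)
```

Unlike dayfirst, an explicit format is strict: with the default error handling, a string such as '01-14-2012' does not match '%d-%m-%Y' and raises a ValueError instead of silently being read month-first.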

If you pass a single string to to_datetime, it returns a single Timestamp. Timestamp can also accept string input directly. Note that Timestamp doesn’t accept string parsing options such as dayfirst or format; use to_datetime if these are required.

In [5]: pd.to_datetime('2010/11/12')
Out[5]: Timestamp('2010-11-12 00:00:00')

In [6]: pd.Timestamp('2010/11/12')
Out[6]: Timestamp('2010-11-12 00:00:00')
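The difference matters for ambiguous strings. The sketch below uses a made-up sample string that can be read either way, and compares Timestamp with to_datetime and dayfirst=True.

```python
import pandas as pd

text = '04-01-2012'  # ambiguous: 4 January or 1 April?

# Timestamp has no dayfirst option; here the string is read month-first.
month_first = pd.Timestamp(text)

# to_datetime accepts dayfirst (or format) for a single string too.
day_first = pd.to_datetime(text, dayfirst=True)

print(month_first, day_first)
```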

New in version 0.18.1.

You can also pass a DataFrame of integer or string columns to assemble into a Series of Timestamps.

In [7]: df = pd.DataFrame({'year': [2015, 2016],
   ...:                    'month': [2, 3],
   ...:                    'day': [4, 5],
   ...:                    'hour': [2, 3]})
   ...: 

In [8]: pd.to_datetime(df)
Out[8]: 
0   2015-02-04 02:00:00
1   2016-03-05 03:00:00
dtype: datetime64[ns]

You can pass only the columns that you need to assemble.

In [9]: pd.to_datetime(df[['year', 'month', 'day']])
Out[9]: 
0   2015-02-04
1   2016-03-05
dtype: datetime64[ns]

pd.to_datetime looks for standard designations of the datetime component in the column names, including:

  • required: year, month, day
  • optional: hour, minute, second, millisecond, microsecond, nanosecond
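When the columns of a frame carry other names, one option is to rename them to the designations listed above before assembling. The column names 'yr', 'mo' and 'dy' below are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Hypothetical input whose column names are not the ones to_datetime expects.
raw = pd.DataFrame({'yr': [2015, 2016],
                    'mo': [2, 3],
                    'dy': [4, 5]})

# Rename to the required designations (year, month, day), then assemble.
parts = raw.rename(columns={'yr': 'year', 'mo': 'month', 'dy': 'day'})
stamps = pd.to_datetime(parts)

print(stamps)
```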

4.1 Invalid Data

Note

As of version 0.17.0, the default for to_datetime is errors='raise' rather than errors='ignore'. Unparseable input therefore raises an exception instead of being returned unchanged, as it was in previous versions.

Pass errors='coerce' to convert invalid data to NaT (not a time):

Raise when unparseable (this is the default)

In [2]: pd.to_datetime(['2009/07/31', 'asd'], errors='raise')
ValueError: Unknown string format

Return the original input when unparseable

In [4]: pd.to_datetime(['2009/07/31', 'asd'], errors='ignore')
Out[4]: array(['2009/07/31', 'asd'], dtype=object)

Return NaT for unparseable input

In [6]: pd.to_datetime(['2009/07/31', 'asd'], errors='coerce')
Out[6]: DatetimeIndex(['2009-07-31', 'NaT'], dtype='datetime64[ns]', freq=None)
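A common follow-up to errors='coerce' is to find out which inputs failed. The sketch below is one way to do that on a Series, using made-up sample data: it compares the parsed result with the raw strings and keeps the rows that came out as NaT.

```python
import pandas as pd

raw = pd.Series(['2009/07/31', 'asd', '2010/01/10'])

# Unparseable entries become NaT instead of raising.
parsed = pd.to_datetime(raw, errors='coerce')

# Select the original strings whose parsed value is missing.
bad = raw[parsed.isnull()]

print(bad)
```

Entries that were already missing in the input (such as None) also come out as NaT, so this sketch does not distinguish them from genuinely invalid strings.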

4.2 Epoch Timestamps

It’s also possible to convert integer or float epoch times. The default unit for these is nanoseconds (since this is how Timestamps are stored). However, epochs are often stored in another unit, which can be specified with the unit argument:

Typical units in which epochs are stored

In [10]: pd.to_datetime([1349720105, 1349806505, 1349892905,
   ....:                 1349979305, 1350065705], unit='s')
   ....: 
Out[10]: 
DatetimeIndex(['2012-10-08 18:15:05', '2012-10-09 18:15:05',
               '2012-10-10 18:15:05', '2012-10-11 18:15:05',
               '2012-10-12 18:15:05'],
              dtype='datetime64[ns]', freq=None)

In [11]: pd.to_datetime([1349720105100, 1349720105200, 1349720105300,
   ....:                 1349720105400, 1349720105500], unit='ms')
   ....: 
Out[11]: 
DatetimeIndex(['2012-10-08 18:15:05.100000', '2012-10-08 18:15:05.200000',
               '2012-10-08 18:15:05.300000', '2012-10-08 18:15:05.400000',
               '2012-10-08 18:15:05.500000'],
              dtype='datetime64[ns]', freq=None)

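These work, but the results may be unexpected: without a unit, the integer 1 is read as one nanosecond, and the fractional part of 3.14 does not appear in the output shown.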

In [12]: pd.to_datetime([1])
Out[12]: DatetimeIndex(['1970-01-01 00:00:00.000000001'], dtype='datetime64[ns]', freq=None)

In [13]: pd.to_datetime([1, 3.14], unit='s')
Out[13]: DatetimeIndex(['1970-01-01 00:00:01', '1970-01-01 00:00:03'], dtype='datetime64[ns]', freq=None)

Note

Epoch times will be rounded to the nearest nanosecond.
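One way to check that the chosen unit is right is to convert the result back to integers and compare with the input. The following is a sketch only; it assumes a pandas version that supports floor-dividing a TimedeltaIndex by a Timedelta, and the variable names are illustrative.

```python
import pandas as pd

seconds = [1349720105, 1349806505]
stamps = pd.to_datetime(seconds, unit='s')

# Elapsed time since the epoch, floor-divided by one unit, gives integers back.
epoch = pd.Timestamp('1970-01-01')
round_trip = (stamps - epoch) // pd.Timedelta('1s')

print(list(round_trip))
```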