.. currentmodule:: pandas .. ipython:: python :suppress: import numpy as np np.random.seed(123456) import pandas as pd pd.options.display.max_rows=15 randn = np.random.randn np.set_printoptions(precision=4, suppress=True) import matplotlib.pyplot as plt plt.close('all') import pandas.util.doctools as doctools p = doctools.TablePlotter() .. _merging.time_series: Timeseries friendly merging --------------------------- .. _merging.merge_ordered: Merging Ordered Data ~~~~~~~~~~~~~~~~~~~~ A :func:`merge_ordered` function allows combining time series and other ordered data. In particular it has an optional ``fill_method`` keyword to fill/interpolate missing data: .. ipython:: python left = pd.DataFrame({'k': ['K0', 'K1', 'K1', 'K2'], 'lv': [1, 2, 3, 4], 's': ['a', 'b', 'c', 'd']}) right = pd.DataFrame({'k': ['K1', 'K2', 'K4'], 'rv': [1, 2, 3]}) pd.merge_ordered(left, right, fill_method='ffill', left_by='s') .. _merging.merge_asof: Merging AsOf ~~~~~~~~~~~~ .. versionadded:: 0.19.0 A :func:`merge_asof` is similar to an ordered left-join except that we match on nearest key rather than equal keys. For each row in the ``left`` DataFrame, we select the last row in the ``right`` DataFrame whose ``on`` key is less than the left's key. Both DataFrames must be sorted by the key. Optionally an asof merge can perform a group-wise merge. This matches the ``by`` key equally, in addition to the nearest match on the ``on`` key. For example; we might have ``trades`` and ``quotes`` and we want to ``asof`` merge them. .. ipython:: python trades = pd.DataFrame({ 'time': pd.to_datetime(['20160525 13:30:00.023', '20160525 13:30:00.038', '20160525 13:30:00.048', '20160525 13:30:00.048', '20160525 13:30:00.048']), 'ticker': ['MSFT', 'MSFT', 'GOOG', 'GOOG', 'AAPL'], 'price': [51.95, 51.95, 720.77, 720.92, 98.00], 'quantity': [75, 155, 100, 100, 100]}, columns=['time', 'ticker', 'price', 'quantity']) quotes = pd.DataFrame({ 'time': pd.to_datetime(['20160525 13:30:00.023', '20160525 13:30:00.023', '20160525 13:30:00.030', '20160525 13:30:00.041', '20160525 13:30:00.048', '20160525 13:30:00.049', '20160525 13:30:00.072', '20160525 13:30:00.075']), 'ticker': ['GOOG', 'MSFT', 'MSFT', 'MSFT', 'GOOG', 'AAPL', 'GOOG', 'MSFT'], 'bid': [720.50, 51.95, 51.97, 51.99, 720.50, 97.99, 720.50, 52.01], 'ask': [720.93, 51.96, 51.98, 52.00, 720.93, 98.01, 720.88, 52.03]}, columns=['time', 'ticker', 'bid', 'ask']) .. ipython:: python trades quotes By default we are taking the asof of the quotes. .. ipython:: python pd.merge_asof(trades, quotes, on='time', by='ticker') We only asof within ``2ms`` betwen the quote time and the trade time. .. ipython:: python pd.merge_asof(trades, quotes, on='time', by='ticker', tolerance=pd.Timedelta('2ms')) We only asof within ``10ms`` betwen the quote time and the trade time and we exclude exact matches on time. Note that though we exclude the exact matches (of the quotes), prior quotes DO propogate to that point in time. .. ipython:: python pd.merge_asof(trades, quotes, on='time', by='ticker', tolerance=pd.Timedelta('10ms'), allow_exact_matches=False)