pandas.merge_asof

pandas.merge_asof(left, right, on=None, left_on=None, right_on=None, by=None, suffixes=('_x', '_y'), tolerance=None, allow_exact_matches=True, check_duplicates=True)[source]

Perform an asof merge. This is similar to a left-join except that we match on nearest key rather than equal keys.

For each row in the left DataFrame, we select the last row in the right DataFrame whose ‘on’ key is less than or equal to the left’s key. Both DataFrames must be sorted by the key.

Optionally perform group-wise merge. This searches for the nearest match on the ‘on’ key within the same group according to ‘by’.

New in version 0.19.0.

Parameters:

left : DataFrame

right : DataFrame

on : label or list

Field names to join on. Must be found in both DataFrames. The data MUST be ordered. Furthermore this must be a numeric column, typically a datetimelike or integer. On or left_on/right_on must be given.

left_on : label or list, or array-like

Field names to join on in left DataFrame. Can be a vector or list of vectors of the length of the DataFrame to use a particular vector as the join key instead of columns

right_on : label or list, or array-like

Field names to join on in right DataFrame or vector/list of vectors per left_on docs

by : column name or list of column names

Group both the left and right DataFrames by the group columns; perform the merge operation on these pieces and recombine.

suffixes : 2-length sequence (tuple, list, ...)

Suffix to apply to overlapping column names in the left and right side, respectively

tolerance : integer or Timedelta, optional, default None

select asof tolerance within this range; must be compatible to the merge index.

allow_exact_matches : boolean, default True

  • If True, allow matching the same ‘on’ value (i.e. less-than-or-equal-to)
  • If False, don’t match the same ‘on’ value (i.e., stricly less-than)

check_duplicates : boolean, default True

  • If True, check and remove duplicates for the right DataFrame, on the [by, on] combination, keeping the last value.
  • If False, no check for duplicates. If you know that you don’t have duplicates, then turning off the check for duplicates can be more performant.
Returns:

merged : DataFrame

See also

merge, merge_ordered

Examples

>>> left
    a left_val
0   1        a
1   5        b
2  10        c
>>> right
   a  right_val
0  1          1
1  2          2
2  3          3
3  6          6
4  7          7
>>> pd.merge_asof(left, right, on='a')
    a left_val  right_val
0   1        a          1
1   5        b          3
2  10        c          7
>>> pd.merge_asof(left, right, on='a', allow_exact_matches=False)
    a left_val  right_val
0   1        a        NaN
1   5        b        3.0
2  10        c        7.0

For this example, we can achieve a similar result thru pd.merge_ordered(), though its not nearly as performant.

>>> (pd.merge_ordered(left, right, on='a')
...    .ffill()
...    .drop_duplicates(['left_val'])
... )
    a left_val  right_val
0   1        a        1.0
3   5        b        3.0
6  10        c        7.0

Here is a real-world times-series example

>>> quotes
                     time ticker     bid     ask
0 2016-05-25 13:30:00.023   GOOG  720.50  720.93
1 2016-05-25 13:30:00.023   MSFT   51.95   51.96
2 2016-05-25 13:30:00.030   MSFT   51.97   51.98
3 2016-05-25 13:30:00.041   MSFT   51.99   52.00
4 2016-05-25 13:30:00.048   GOOG  720.50  720.93
5 2016-05-25 13:30:00.049   AAPL   97.99   98.01
6 2016-05-25 13:30:00.072   GOOG  720.50  720.88
7 2016-05-25 13:30:00.075   MSFT   52.01   52.03
>>> trades
                     time ticker   price  quantity
0 2016-05-25 13:30:00.023   MSFT   51.95        75
1 2016-05-25 13:30:00.038   MSFT   51.95       155
2 2016-05-25 13:30:00.048   GOOG  720.77       100
3 2016-05-25 13:30:00.048   GOOG  720.92       100
4 2016-05-25 13:30:00.048   AAPL   98.00       100

By default we are taking the asof of the quotes

>>> pd.asof_merge(trades, quotes,
...                       on='time',
...                       by='ticker')
                     time ticker   price  quantity     bid     ask
0 2016-05-25 13:30:00.023   MSFT   51.95        75   51.95   51.96
1 2016-05-25 13:30:00.038   MSFT   51.95       155   51.97   51.98
2 2016-05-25 13:30:00.048   GOOG  720.77       100  720.50  720.93
3 2016-05-25 13:30:00.048   GOOG  720.92       100  720.50  720.93
4 2016-05-25 13:30:00.048   AAPL   98.00       100     NaN     NaN

We only asof within 2ms betwen the quote time and the trade time

>>> pd.asof_merge(trades, quotes,
...                       on='time',
...                       by='ticker',
...                       tolerance=pd.Timedelta('2ms'))
                     time ticker   price  quantity     bid     ask
0 2016-05-25 13:30:00.023   MSFT   51.95        75   51.95   51.96
1 2016-05-25 13:30:00.038   MSFT   51.95       155     NaN     NaN
2 2016-05-25 13:30:00.048   GOOG  720.77       100  720.50  720.93
3 2016-05-25 13:30:00.048   GOOG  720.92       100  720.50  720.93
4 2016-05-25 13:30:00.048   AAPL   98.00       100     NaN     NaN

We only asof within 10ms betwen the quote time and the trade time and we exclude exact matches on time. However prior data will propogate forward

>>> pd.asof_merge(trades, quotes,
...                       on='time',
...                       by='ticker',
...                       tolerance=pd.Timedelta('10ms'),
...                       allow_exact_matches=False)
                     time ticker   price  quantity     bid     ask
0 2016-05-25 13:30:00.023   MSFT   51.95        75     NaN     NaN
1 2016-05-25 13:30:00.038   MSFT   51.95       155   51.97   51.98
2 2016-05-25 13:30:00.048   GOOG  720.77       100  720.50  720.93
3 2016-05-25 13:30:00.048   GOOG  720.92       100  720.50  720.93
4 2016-05-25 13:30:00.048   AAPL   98.00       100     NaN     NaN