2.7 Merge

The Concat docs. The Join docs.

Append two dataframes with overlapping index (emulate R rbind)

In [1]: rng = pd.date_range('2000-01-01', periods=6)

In [2]: df1 = pd.DataFrame(np.random.randn(6, 3), index=rng, columns=['A', 'B', 'C'])

In [3]: df2 = df1.copy()

ignore_index is needed in pandas < v0.13, and depending on df construction

In [4]: df = df1.append(df2,ignore_index=True); df
Out[4]: 
           A         B         C
0   0.469112 -0.282863 -1.509059
1  -1.135632  1.212112 -0.173215
2   0.119209 -1.044236 -0.861849
3  -2.104569 -0.494929  1.071804
..       ...       ...       ...
8   0.119209 -1.044236 -0.861849
9  -2.104569 -0.494929  1.071804
10  0.721555 -0.706771 -1.039575
11  0.271860 -0.424972  0.567020

[12 rows x 3 columns]

Self Join of a DataFrame

In [5]: df = pd.DataFrame(data={'Area' : ['A'] * 5 + ['C'] * 2,
   ...:                         'Bins' : [110] * 2 + [160] * 3 + [40] * 2,
   ...:                         'Test_0' : [0, 1, 0, 1, 2, 0, 1],
   ...:                         'Data' : np.random.randn(7)});df
   ...: 
Out[5]: 
  Area  Bins      Data  Test_0
0    A   110  0.276232       0
1    A   110 -1.087401       1
2    A   160 -0.673690       0
3    A   160  0.113648       1
4    A   160 -1.478427       2
5    C    40  0.524988       0
6    C    40  0.404705       1

In [6]: df['Test_1'] = df['Test_0'] - 1

In [7]: pd.merge(df, df, left_on=['Bins', 'Area','Test_0'], right_on=['Bins', 'Area','Test_1'],suffixes=('_L','_R'))
Out[7]: 
  Area  Bins    Data_L  Test_0_L  Test_1_L    Data_R  Test_0_R  Test_1_R
0    A   110  0.276232         0        -1 -1.087401         1         0
1    A   160 -0.673690         0        -1  0.113648         1         0
2    A   160  0.113648         1         0 -1.478427         2         1
3    C    40  0.524988         0        -1  0.404705         1         0

How to set the index and join

KDB like asof join

Join with a criteria based on the values

Using searchsorted to merge based on values inside a range