6 Merge
6.1 Concat
pandas provides various facilities for easily combining Series, DataFrame, and Panel objects, with various kinds of set logic for the indexes and relational algebra functionality for join / merge-type operations.
See the Merging section.
Concatenating pandas objects together with concat():
In [1]: df = pd.DataFrame(np.random.randn(10, 4))
In [2]: df
Out[2]:
0 1 2 3
0 0.469112 -0.282863 -1.509059 -1.135632
1 1.212112 -0.173215 0.119209 -1.044236
2 -0.861849 -2.104569 -0.494929 1.071804
3 0.721555 -0.706771 -1.039575 0.271860
.. ... ... ... ...
6 0.404705 0.577046 -1.715002 -1.039268
7 -0.370647 -1.157892 -1.344312 0.844885
8 1.075770 -0.109050 1.643563 -1.469388
9 0.357021 -0.674600 -1.776904 -0.968914
[10 rows x 4 columns]
# break it into pieces
In [3]: pieces = [df[:3], df[3:7], df[7:]]
In [4]: pd.concat(pieces)
Out[4]:
0 1 2 3
0 0.469112 -0.282863 -1.509059 -1.135632
1 1.212112 -0.173215 0.119209 -1.044236
2 -0.861849 -2.104569 -0.494929 1.071804
3 0.721555 -0.706771 -1.039575 0.271860
.. ... ... ... ...
6 0.404705 0.577046 -1.715002 -1.039268
7 -0.370647 -1.157892 -1.344312 0.844885
8 1.075770 -0.109050 1.643563 -1.469388
9 0.357021 -0.674600 -1.776904 -0.968914
[10 rows x 4 columns]
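As a self-contained check of the example above, the following sketch splits a frame into pieces and confirms that concat() reassembles it with the original row labels. The variable names `rng`, `rebuilt`, `same`, and `renumbered` are illustrative assumptions, not part of the original example, and a seeded generator replaces `np.random.randn` so that runs are repeatable.

```python
import numpy as np
import pandas as pd

# Seeded generator (an assumption made here for repeatability)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 4)))

# Break the frame into three row slices, then glue them back together
pieces = [df[:3], df[3:7], df[7:]]
rebuilt = pd.concat(pieces)

# Row-wise concatenation keeps each piece's own index labels,
# so the result can be compared directly with the original frame
same = rebuilt.equals(df)
print(same)

# When pieces arrive out of order, ignore_index=True discards the
# old labels and renumbers the rows from 0
renumbered = pd.concat([df[7:], df[:3]], ignore_index=True)
print(list(renumbered.index))
```

Keeping the original labels is the default; `ignore_index=True` is worth reaching for only when the old labels carry no meaning.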
6.2 Join
SQL-style merges. See the Database style joining section.
In [5]: left = pd.DataFrame({'key': ['foo', 'foo'], 'lval': [1, 2]})
In [6]: right = pd.DataFrame({'key': ['foo', 'foo'], 'rval': [4, 5]})
In [7]: left
Out[7]:
key lval
0 foo 1
1 foo 2
In [8]: right
Out[8]:
key rval
0 foo 4
1 foo 5
In [9]: pd.merge(left, right, on='key')
Out[9]:
key lval rval
0 foo 1 4
1 foo 1 5
2 foo 2 4
3 foo 2 5
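The result has four rows because every left row with key 'foo' is paired with every right row with the same key (2 x 2 matches). The sketch below reproduces that count and, as an optional safeguard, uses the `validate` argument of pd.merge to raise an error when the keys are not unique. The names `merged` and `raised` are illustrative assumptions.

```python
import pandas as pd

left = pd.DataFrame({'key': ['foo', 'foo'], 'lval': [1, 2]})
right = pd.DataFrame({'key': ['foo', 'foo'], 'rval': [4, 5]})

# Each of the 2 left 'foo' rows matches each of the 2 right 'foo' rows
merged = pd.merge(left, right, on='key')
print(len(merged))

# validate='one_to_one' asks pandas to check that the merge keys are
# unique on both sides and to raise MergeError otherwise
raised = False
try:
    pd.merge(left, right, on='key', validate='one_to_one')
except pd.errors.MergeError as exc:
    raised = True
    print('keys are not unique:', exc)
```

Passing `validate` is a cheap way to catch an unintended row blow-up when a key column was expected to be unique.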
Another example, this time with unique keys:
In [10]: left = pd.DataFrame({'key': ['foo', 'bar'], 'lval': [1, 2]})
In [11]: right = pd.DataFrame({'key': ['foo', 'bar'], 'rval': [4, 5]})
In [12]: left
Out[12]:
key lval
0 foo 1
1 bar 2
In [13]: right
Out[13]:
key rval
0 foo 4
1 bar 5
In [14]: pd.merge(left, right, on='key')
Out[14]:
key lval rval
0 foo 1 4
1 bar 2 5
6.3 Append
Append rows to a DataFrame. See the Appending section.
In [15]: df = pd.DataFrame(np.random.randn(8, 4), columns=['A','B','C','D'])
In [16]: df
Out[16]:
A B C D
0 -1.294524 0.413738 0.276662 -0.472035
1 -0.013960 -0.362543 -0.006154 -0.923061
2 0.895717 0.805244 -1.206412 2.565646
3 1.431256 1.340309 -1.170299 -0.226169
4 0.410835 0.813850 0.132003 -0.827317
5 -0.076467 -1.187678 1.130127 -1.436737
6 -1.413681 1.607920 1.024180 0.569605
7 0.875906 -2.211372 0.974466 -2.006747
In [17]: s = df.iloc[3]
In [18]: df.append(s, ignore_index=True)
Out[18]:
A B C D
0 -1.294524 0.413738 0.276662 -0.472035
1 -0.013960 -0.362543 -0.006154 -0.923061
2 0.895717 0.805244 -1.206412 2.565646
3 1.431256 1.340309 -1.170299 -0.226169
.. ... ... ... ...
5 -0.076467 -1.187678 1.130127 -1.436737
6 -1.413681 1.607920 1.024180 0.569605
7 0.875906 -2.211372 0.974466 -2.006747
8 1.431256 1.340309 -1.170299 -0.226169
[9 rows x 4 columns]
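Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the call above only runs on older releases. A minimal sketch of the same operation using pd.concat, assuming a recent pandas version, is shown below; the name `appended` is an illustrative assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])

# Take row 3 as a Series whose index is the column labels A..D
s = df.iloc[3]

# to_frame() turns the Series into a one-column frame; .T transposes it
# into a one-row frame whose columns line up with df
appended = pd.concat([df, s.to_frame().T], ignore_index=True)

print(appended.shape)
```

As with the original call, `ignore_index=True` renumbers the rows so that the copied row receives label 8 instead of keeping its old label 3.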