5.5 UNION
UNION ALL can be performed using concat()
.
In [1]: df1 = pd.DataFrame({'city': ['Chicago', 'San Francisco', 'New York City'],
...: 'rank': range(1, 4)})
...:
In [2]: df2 = pd.DataFrame({'city': ['Chicago', 'Boston', 'Los Angeles'],
...: 'rank': [1, 4, 5]})
...:
In [3]: df1
Out[3]:
city rank
0 Chicago 1
1 San Francisco 2
2 New York City 3
In [4]: df2
Out[4]:
city rank
0 Chicago 1
1 Boston 4
2 Los Angeles 5
SELECT city, rank
FROM df1
UNION ALL
SELECT city, rank
FROM df2;
/*
city rank
Chicago 1
San Francisco 2
New York City 3
Chicago 1
Boston 4
Los Angeles 5
*/
In [5]: pd.concat([df1, df2])
Out[5]:
city rank
0 Chicago 1
1 San Francisco 2
2 New York City 3
0 Chicago 1
1 Boston 4
2 Los Angeles 5
SQL’s UNION is similar to UNION ALL, however UNION will remove duplicate rows.
SELECT city, rank
FROM df1
UNION
SELECT city, rank
FROM df2;
-- notice that there is only one Chicago record this time
/*
city rank
Chicago 1
San Francisco 2
New York City 3
Boston 4
Los Angeles 5
*/
In pandas, you can use concat()
in conjunction with
drop_duplicates()
.
In [6]: pd.concat([df1, df2]).drop_duplicates()
Out[6]:
city rank
0 Chicago 1
1 San Francisco 2
2 New York City 3
1 Boston 4
2 Los Angeles 5