7.3 Reshaping by Melt
The melt()
function is useful to massage a
DataFrame into a format where one or more columns are identifier variables,
while all other columns, considered measured variables, are “unpivoted” to the
row axis, leaving just two non-identifier columns, “variable” and “value”. The
names of those columns can be customized by supplying the var_name
and
value_name
parameters.
For instance,
In [1]: cheese = pd.DataFrame({'first' : ['John', 'Mary'],
...: 'last' : ['Doe', 'Bo'],
...: 'height' : [5.5, 6.0],
...: 'weight' : [130, 150]})
...:
In [2]: cheese
Out[2]:
first height last weight
0 John 5.5 Doe 130
1 Mary 6.0 Bo 150
In [3]: pd.melt(cheese, id_vars=['first', 'last'])
Out[3]:
first last variable value
0 John Doe height 5.5
1 Mary Bo height 6.0
2 John Doe weight 130.0
3 Mary Bo weight 150.0
In [4]: pd.melt(cheese, id_vars=['first', 'last'], var_name='quantity')
Out[4]:
first last quantity value
0 John Doe height 5.5
1 Mary Bo height 6.0
2 John Doe weight 130.0
3 Mary Bo weight 150.0
Another way to transform is to use the wide_to_long
panel data convenience function.
In [5]: dft = pd.DataFrame({"A1970" : {0 : "a", 1 : "b", 2 : "c"},
...: "A1980" : {0 : "d", 1 : "e", 2 : "f"},
...: "B1970" : {0 : 2.5, 1 : 1.2, 2 : .7},
...: "B1980" : {0 : 3.2, 1 : 1.3, 2 : .1},
...: "X" : dict(zip(range(3), np.random.randn(3)))
...: })
...:
In [6]: dft["id"] = dft.index
In [7]: dft
Out[7]:
A1970 A1980 B1970 B1980 X id
0 a d 2.5 3.2 0.469112 0
1 b e 1.2 1.3 -0.282863 1
2 c f 0.7 0.1 -1.509059 2
In [8]: pd.wide_to_long(dft, ["A", "B"], i="id", j="year")
Out[8]:
X A B
id year
0 1970 0.469112 a 2.5
1 1970 -0.282863 b 1.2
2 1970 -1.509059 c 0.7
0 1980 0.469112 d 3.2
1 1980 -0.282863 e 1.3
2 1980 -1.509059 f 0.1