10.8 msgpack (experimental)
New in version 0.13.0.
Starting in 0.13.0, pandas supports the msgpack format for object serialization. This is a lightweight, portable binary format, similar to binary JSON, that is highly space efficient and provides good performance both when writing (serialization) and when reading (deserialization).
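To make "similar to binary JSON" concrete, here is a minimal sketch of a MessagePack encoder for a few plain Python types, written against the public MessagePack specification. It is not the encoder pandas uses (which must also handle DataFrame, Series and NumPy arrays), and packb is a hypothetical helper named for this sketch, not an import from any library. Every value starts with a type byte, and small integers, short strings and small containers fit their value or length into that same byte, which is one source of the space savings over JSON text.

```python
import json
import struct


def packb(obj):
    """Encode None, bool, int, float, str, list/tuple and dict as MessagePack.

    Hypothetical helper for illustration only; it follows the public
    MessagePack spec but is not the encoder pandas uses.
    """
    if obj is None:
        return b"\xc0"                      # nil
    if isinstance(obj, bool):               # check bool before int (bool subclasses int)
        return b"\xc3" if obj else b"\xc2"  # true / false
    if isinstance(obj, int):
        if 0 <= obj <= 0x7F:
            return bytes([obj])             # positive fixint: the value is the byte
        if -32 <= obj < 0:
            return struct.pack("b", obj)    # negative fixint
        if obj >= 0:
            return b"\xcf" + struct.pack(">Q", obj)  # uint 64, big-endian
        return b"\xd3" + struct.pack(">q", obj)      # int 64, big-endian
    if isinstance(obj, float):
        return b"\xcb" + struct.pack(">d", obj)      # float 64, big-endian
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        n = len(data)
        if n < 32:
            header = bytes([0xA0 | n])               # fixstr: length in low 5 bits
        elif n <= 0xFF:
            header = b"\xd9" + bytes([n])            # str 8
        elif n <= 0xFFFF:
            header = b"\xda" + struct.pack(">H", n)  # str 16
        else:
            header = b"\xdb" + struct.pack(">I", n)  # str 32
        return header + data
    if isinstance(obj, (list, tuple)):
        n = len(obj)
        if n < 16:
            header = bytes([0x90 | n])               # fixarray: count in low 4 bits
        elif n <= 0xFFFF:
            header = b"\xdc" + struct.pack(">H", n)  # array 16
        else:
            header = b"\xdd" + struct.pack(">I", n)  # array 32
        return header + b"".join(packb(item) for item in obj)
    if isinstance(obj, dict):
        n = len(obj)
        if n < 16:
            header = bytes([0x80 | n])               # fixmap: count in low 4 bits
        elif n <= 0xFFFF:
            header = b"\xde" + struct.pack(">H", n)  # map 16
        else:
            header = b"\xdf" + struct.pack(">I", n)  # map 32
        body = b"".join(packb(k) + packb(v) for k, v in obj.items())
        return header + body
    raise TypeError(f"unsupported type in this sketch: {type(obj).__name__}")


record = {"A": [i / 7 for i in range(5)], "name": "foo", "n": 3}
packed = packb(record)
as_json = json.dumps(record).encode("utf-8")
print(len(packed), "bytes as msgpack vs", len(as_json), "bytes as JSON")
```

For a record holding full-precision floats the packed form comes out smaller than the JSON text, because each float costs a fixed 9 bytes instead of its full decimal representation.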
Warning
This is a very new feature of pandas. We intend to provide certain optimizations in the I/O of msgpack data. Since it is marked as an EXPERIMENTAL LIBRARY, the storage format may not be stable until a future release.
Because of changes to the writing format and other issues, compatibility between versions is as follows:
Packed with | Can be unpacked with |
---|---|
pre-0.17 / Python 2 | any |
pre-0.17 / Python 3 | any |
0.17 / Python 2 | |
0.17 / Python 3 | >= 0.18 / any Python |
0.18 | >= 0.18 |
Reading files packed by older versions is backward-compatible, except for files packed with 0.17 in Python 2, which can only be unpacked in Python 2.
In [1]: df = pd.DataFrame(np.random.rand(5,2),columns=list('AB'))
In [2]: df.to_msgpack('foo.msg')
In [3]: pd.read_msgpack('foo.msg')
Out[3]:
        A       B
0  0.1270  0.9667
1  0.2605  0.8972
2  0.3767  0.3362
3  0.4514  0.8403
4  0.1231  0.5430
In [4]: s = pd.Series(np.random.rand(5),index=pd.date_range('20130101',periods=5))
You can pass several objects at once and you will receive them back, as a list, on deserialization.
In [5]: pd.to_msgpack('foo.msg', df, 'foo', np.array([1,2,3]), s)
In [6]: pd.read_msgpack('foo.msg')
Out[6]:
[        A       B
0  0.1270  0.9667
1  0.2605  0.8972
2  0.3767  0.3362
3  0.4514  0.8403
4  0.1231  0.5430, 'foo', array([1, 2, 3]), 2013-01-01    0.3730
2013-01-02    0.4480
2013-01-03    0.1294
2013-01-04    0.8599
2013-01-05    0.8204
Freq: D, dtype: float64]
You can pass iterator=True to iterate over the unpacked results:
In [7]: for o in pd.read_msgpack('foo.msg', iterator=True):
   ...:     print(o)
   ...:
        A       B
0  0.1270  0.9667
1  0.2605  0.8972
2  0.3767  0.3362
3  0.4514  0.8403
4  0.1231  0.5430
foo
[1 2 3]
2013-01-01    0.3730
2013-01-02    0.4480
2013-01-03    0.1294
2013-01-04    0.8599
2013-01-05    0.8204
Freq: D, dtype: float64
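Iterating works because each packed object is self-delimiting: its type byte and length fields say exactly where it ends, so a reader can hand objects back one at a time instead of first building the whole list. The sketch below illustrates that idea with a hypothetical generator, iter_unpack, that decodes only a handful of MessagePack types from the public specification; it is not the pandas reader and cannot decode DataFrame or Series payloads.

```python
import struct


def iter_unpack(buf):
    """Yield successive top-level MessagePack objects from the bytes ``buf``.

    Hypothetical sketch: handles nil, bool, fixint, uint64/int64, float64,
    fixstr/str8, fixarray and fixmap only.
    """
    pos = 0

    def read(n):
        nonlocal pos
        if pos + n > len(buf):
            raise ValueError("truncated MessagePack data")
        chunk = buf[pos:pos + n]
        pos += n
        return chunk

    def decode():
        tag = read(1)[0]
        if tag <= 0x7F:                      # positive fixint
            return tag
        if tag >= 0xE0:                      # negative fixint
            return tag - 0x100
        if 0xA0 <= tag <= 0xBF:              # fixstr: length in low 5 bits
            return read(tag & 0x1F).decode("utf-8")
        if 0x90 <= tag <= 0x9F:              # fixarray: count in low 4 bits
            return [decode() for _ in range(tag & 0x0F)]
        if 0x80 <= tag <= 0x8F:              # fixmap: count in low 4 bits
            items = {}
            for _ in range(tag & 0x0F):
                key = decode()               # key is encoded before its value
                items[key] = decode()
            return items
        if tag == 0xC0:
            return None
        if tag in (0xC2, 0xC3):
            return tag == 0xC3
        if tag == 0xCB:
            return struct.unpack(">d", read(8))[0]
        if tag == 0xCF:
            return struct.unpack(">Q", read(8))[0]
        if tag == 0xD3:
            return struct.unpack(">q", read(8))[0]
        if tag == 0xD9:                      # str 8: one length byte follows
            return read(read(1)[0]).decode("utf-8")
        raise ValueError(f"type byte 0x{tag:02x} not handled in this sketch")

    # Decode lazily: one complete object per iteration until the buffer ends.
    while pos < len(buf):
        yield decode()


stream = b"\x93\x01\x02\x03" + b"\xa3foo" + b"\xcb" + struct.pack(">d", 0.5)
for obj in iter_unpack(stream):
    print(obj)  # prints [1, 2, 3], then foo, then 0.5
```

The same picture presumably covers append=True as well: the appended object is written after the existing ones, and the reader picks it up as one more element.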
You can pass append=True to the writer to append to an existing pack:
In [8]: df.to_msgpack('foo.msg',append=True)
In [9]: pd.read_msgpack('foo.msg')
Out[9]:
[        A       B
0  0.1270  0.9667
1  0.2605  0.8972
2  0.3767  0.3362
3  0.4514  0.8403
4  0.1231  0.5430, 'foo', array([1, 2, 3]), 2013-01-01    0.3730
2013-01-02    0.4480
2013-01-03    0.1294
2013-01-04    0.8599
2013-01-05    0.8204
Freq: D, dtype: float64,         A       B
0  0.1270  0.9667
1  0.2605  0.8972
2  0.3767  0.3362
3  0.4514  0.8403
4  0.1231  0.5430]
Unlike other I/O methods, to_msgpack is available both on a per-object basis, df.to_msgpack(), and at the top level, pd.to_msgpack(...), where you can pack arbitrary collections of Python lists, dicts and scalars, intermixed with pandas objects.
In [10]: pd.to_msgpack('foo2.msg', {'dict': [{'df': df}, {'string': 'foo'}, {'scalar': 1.}, {'s': s}]})
In [11]: pd.read_msgpack('foo2.msg')
Out[11]:
{'dict': ({'df':         A       B
   0  0.1270  0.9667
   1  0.2605  0.8972
   2  0.3767  0.3362
   3  0.4514  0.8403
   4  0.1231  0.5430},
  {'string': 'foo'},
  {'scalar': 1.0},
  {'s': 2013-01-01    0.3730
   2013-01-02    0.4480
   2013-01-03    0.1294
   2013-01-04    0.8599
   2013-01-05    0.8204
   Freq: D, dtype: float64})}
10.8.1 Read/Write API
Msgpacks can also be read from and written to strings. Called without a path, to_msgpack returns the packed data as a string instead of writing a file:
In [12]: df.to_msgpack()
Out[12]: '\x84\xa6blocks\x91\x86\xa5dtype\xa7float64\xa8compress\xc0\xa4locs\x86\xa4ndim\x01\xa5dtype\xa5int64\xa8compress\xc0\xa4data\xd8\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\xa5shape\x91\x02\xa3typ\xa7ndarray\xa5shape\x92\x02\x05\xa6values\xc7P\x00\xac\x15=(\x8c@\xc0?\x04\x85\xa5\x8d\xa3\xab\xd0?\x80\xf6s\xd7\xaa\x1c\xd8?N\x0c\xb8"Z\xe3\xdc?@]\xc9C\x9f\x83\xbf?0\xae\x97?Z\xef\xee?\x88\x85\x1d_)\xb6\xec?\x14\xb7\x075\xa8\x84\xd5?\xf4\xf8\xdc\xa0^\xe3\xea?D8U|x`\xe1?\xa5klass\xaaFloatBlock\xa4axes\x92\x86\xa4name\xc0\xa5dtype\xa6object\xa8compress\xc0\xa4data\x92\xc4\x01A\xc4\x01B\xa5klass\xa5Index\xa3typ\xa5index\x86\xa4name\xc0\xa4stop\x05\xa5start\x00\xa4step\x01\xa5klass\xaaRangeIndex\xa3typ\xabrange_index\xa3typ\xadblock_manager\xa5klass\xa9DataFrame'
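The string above is ordinary MessagePack, so its first bytes can be read with nothing more than the type-byte table from the public specification. The hypothetical helper describe below names the family of a type byte; walking the start of Out[12] shows a 4-entry map (0x84) whose first key is the 6-byte string 'blocks' (0xa6), followed by a 1-element array holding a 6-entry map. The key names reflect pandas' internal layout, which this guide does not document and which may change between versions.

```python
def describe(tag):
    """Return (family, inline) for a MessagePack type byte.

    For the "fix" families the low bits of the byte carry the value, element
    count or string length ("inline"); for other families inline is None.
    Hypothetical helper; only a few non-fix families are named.
    """
    if tag <= 0x7F:
        return ("positive fixint", tag)
    if tag <= 0x8F:
        return ("fixmap", tag & 0x0F)
    if tag <= 0x9F:
        return ("fixarray", tag & 0x0F)
    if tag <= 0xBF:
        return ("fixstr", tag & 0x1F)
    if tag >= 0xE0:
        return ("negative fixint", tag - 0x100)
    named = {
        0xC0: "nil", 0xC2: "false", 0xC3: "true", 0xC4: "bin 8",
        0xC7: "ext 8", 0xCB: "float 64", 0xD8: "fixext 16", 0xD9: "str 8",
    }
    return (named.get(tag, "other"), None)


# First bytes of Out[12] above, copied verbatim. This simple walk only works
# because the prefix holds nothing but container headers and short strings.
prefix = b"\x84\xa6blocks\x91\x86\xa5dtype\xa7float64"
pos = 0
while pos < len(prefix):
    family, inline = describe(prefix[pos])
    if family == "fixstr":
        text = prefix[pos + 1:pos + 1 + inline].decode("ascii")
        print(f"{prefix[pos]:#04x} {family} {text!r}")
        pos += 1 + inline          # skip the type byte and the string payload
    else:
        print(f"{prefix[pos]:#04x} {family} (count {inline})")
        pos += 1                   # a container header is a single byte here
```

Further along, the array data appears to be stored in extension types (0xd8 fixext 16 and 0xc7 ext 8), whose payloads pandas interprets itself.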
Furthermore, you can concatenate these strings, and read_msgpack returns a list of the original objects.
In [13]: pd.read_msgpack(df.to_msgpack() + s.to_msgpack())
Out[13]:
[        A       B
0  0.1270  0.9667
1  0.2605  0.8972
2  0.3767  0.3362
3  0.4514  0.8403
4  0.1231  0.5430, 2013-01-01    0.3730
2013-01-02    0.4480
2013-01-03    0.1294
2013-01-04    0.8599
2013-01-05    0.8204
Freq: D, dtype: float64]