10.8 msgpack (experimental)

New in version 0.13.0.

Starting in 0.13.0, pandas supports the msgpack format for object serialization. msgpack is a lightweight, portable binary format, similar to binary JSON, that is highly space efficient and performs well for both writing (serialization) and reading (deserialization).

Warning

This is a very new feature of pandas. We intend to provide certain optimizations in the I/O of the msgpack data. Because this is marked as an EXPERIMENTAL LIBRARY, the storage format may not be stable until a future release.

As a result of writing format changes and other issues:

Packed with	Can be unpacked with
pre-0.17 / Python 2	any
pre-0.17 / Python 3	any
0.17 / Python 2	0.17 / Python 2, or >=0.18 / any Python
0.17 / Python 3	>=0.18 / any Python
0.18	>=0.18

Reading files packed by older versions is backward-compatible, except for files packed with 0.17 in Python 2, which can only be unpacked in Python 2.
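The compatibility table above can be read as a small decision rule. The sketch below encodes the table rows literally. The function name `can_unpack` and the use of `(major, minor)` tuples are assumptions made for illustration; nothing like this ships with pandas, and writer versions the table does not list are rejected rather than guessed.

```python
def can_unpack(writer, writer_py, reader, reader_py):
    """Return True if the table says `reader` can unpack a file from `writer`.

    `writer` and `reader` are (major, minor) pandas versions such as (0, 17);
    `writer_py` and `reader_py` are the major Python versions (2 or 3).
    Hypothetical helper: it only mirrors the rows of the table.
    """
    if writer < (0, 17):
        # pre-0.17 files, written under either Python: readable by any version
        return True
    if writer == (0, 17):
        if reader >= (0, 18):
            # 0.17 files from either Python: readable by >=0.18 on any Python
            return True
        # otherwise only the 0.17 / Python 2 to 0.17 / Python 2 case is listed
        return writer_py == 2 and reader == (0, 17) and reader_py == 2
    if writer == (0, 18):
        return reader >= (0, 18)
    raise ValueError("the table does not cover writer version %r" % (writer,))


print(can_unpack((0, 17), 2, (0, 17), 2))
print(can_unpack((0, 17), 3, (0, 17), 3))
```

The first call corresponds to the "0.17 / Python 2" row and the second to the "0.17 / Python 3" row, for which the table lists only readers at 0.18 or later.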

In [1]: df = pd.DataFrame(np.random.rand(5,2),columns=list('AB'))

In [2]: df.to_msgpack('foo.msg')

In [3]: pd.read_msgpack('foo.msg')
Out[3]: 
        A       B
0  0.1270  0.9667
1  0.2605  0.8972
2  0.3767  0.3362
3  0.4514  0.8403
4  0.1231  0.5430

In [4]: s = pd.Series(np.random.rand(5),index=pd.date_range('20130101',periods=5))

You can pass a list of objects and you will receive them back on deserialization.

In [5]: pd.to_msgpack('foo.msg', df, 'foo', np.array([1,2,3]), s)

In [6]: pd.read_msgpack('foo.msg')
Out[6]: 
[        A       B
 0  0.1270  0.9667
 1  0.2605  0.8972
 2  0.3767  0.3362
 3  0.4514  0.8403
 4  0.1231  0.5430, 'foo', array([1, 2, 3]), 2013-01-01    0.3730
 2013-01-02    0.4480
 2013-01-03    0.1294
 2013-01-04    0.8599
 2013-01-05    0.8204
 Freq: D, dtype: float64]

You can pass iterator=True to iterate over the unpacked results:

In [7]: for o in pd.read_msgpack('foo.msg', iterator=True):
   ...:     print(o)
   ...: 
        A       B
0  0.1270  0.9667
1  0.2605  0.8972
2  0.3767  0.3362
3  0.4514  0.8403
4  0.1231  0.5430
foo
[1 2 3]
2013-01-01    0.3730
2013-01-02    0.4480
2013-01-03    0.1294
2013-01-04    0.8599
2013-01-05    0.8204
Freq: D, dtype: float64

You can pass append=True to the writer to append to an existing pack:

In [8]: df.to_msgpack('foo.msg',append=True)

In [9]: pd.read_msgpack('foo.msg')
Out[9]: 
[        A       B
 0  0.1270  0.9667
 1  0.2605  0.8972
 2  0.3767  0.3362
 3  0.4514  0.8403
 4  0.1231  0.5430, 'foo', array([1, 2, 3]), 2013-01-01    0.3730
 2013-01-02    0.4480
 2013-01-03    0.1294
 2013-01-04    0.8599
 2013-01-05    0.8204
 Freq: D, dtype: float64,         A       B
 0  0.1270  0.9667
 1  0.2605  0.8972
 2  0.3767  0.3362
 3  0.4514  0.8403
 4  0.1231  0.5430]

Unlike other I/O methods, to_msgpack is available both per object, as df.to_msgpack(), and as the top-level pd.to_msgpack(...), which can pack arbitrary collections of Python lists, dicts, and scalars intermixed with pandas objects.

In [10]: pd.to_msgpack('foo2.msg', { 'dict' : [ { 'df' : df }, { 'string' : 'foo' }, { 'scalar' : 1. }, { 's' : s } ] })

In [11]: pd.read_msgpack('foo2.msg')
Out[11]: 
{'dict': ({'df':         A       B
   0  0.1270  0.9667
   1  0.2605  0.8972
   2  0.3767  0.3362
   3  0.4514  0.8403
   4  0.1231  0.5430},
  {'string': 'foo'},
  {'scalar': 1.0},
  {'s': 2013-01-01    0.3730
   2013-01-02    0.4480
   2013-01-03    0.1294
   2013-01-04    0.8599
   2013-01-05    0.8204
   Freq: D, dtype: float64})}

10.8.1 Read/Write API

Msgpacks can also be read from and written to strings.

In [12]: df.to_msgpack()
Out[12]: '\x84\xa6blocks\x91\x86\xa5dtype\xa7float64\xa8compress\xc0\xa4locs\x86\xa4ndim\x01\xa5dtype\xa5int64\xa8compress\xc0\xa4data\xd8\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\xa5shape\x91\x02\xa3typ\xa7ndarray\xa5shape\x92\x02\x05\xa6values\xc7P\x00\xac\x15=(\x8c@\xc0?\x04\x85\xa5\x8d\xa3\xab\xd0?\x80\xf6s\xd7\xaa\x1c\xd8?N\x0c\xb8"Z\xe3\xdc?@]\xc9C\x9f\x83\xbf?0\xae\x97?Z\xef\xee?\x88\x85\x1d_)\xb6\xec?\x14\xb7\x075\xa8\x84\xd5?\xf4\xf8\xdc\xa0^\xe3\xea?D8U|x`\xe1?\xa5klass\xaaFloatBlock\xa4axes\x92\x86\xa4name\xc0\xa5dtype\xa6object\xa8compress\xc0\xa4data\x92\xc4\x01A\xc4\x01B\xa5klass\xa5Index\xa3typ\xa5index\x86\xa4name\xc0\xa4stop\x05\xa5start\x00\xa4step\x01\xa5klass\xaaRangeIndex\xa3typ\xabrange_index\xa3typ\xadblock_manager\xa5klass\xa9DataFrame'
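To make the packed string above less opaque, the following sketch walks its first few bytes and names each token. The byte ranges (fixmap, fixarray, fixstr, nil, positive fixint) come from the msgpack specification rather than from pandas, the helper `describe_prefix` is hypothetical, and it deliberately handles only those five token types. It assumes Python 3, where indexing a bytes object yields an integer.

```python
def describe_prefix(data, limit=8):
    """Describe up to `limit` leading msgpack tokens of `data` (bytes).

    Hypothetical helper covering only a handful of msgpack type bytes;
    it stops at the first type byte it does not recognise.
    """
    tokens = []
    pos = 0
    while pos < len(data) and len(tokens) < limit:
        byte = data[pos]
        pos += 1
        if byte <= 0x7F:                       # positive fixint
            tokens.append("int %d" % byte)
        elif 0x80 <= byte <= 0x8F:             # fixmap: low 4 bits = entry count
            tokens.append("map with %d entries" % (byte & 0x0F))
        elif 0x90 <= byte <= 0x9F:             # fixarray: low 4 bits = item count
            tokens.append("array with %d items" % (byte & 0x0F))
        elif 0xA0 <= byte <= 0xBF:             # fixstr: low 5 bits = byte length
            length = byte & 0x1F
            tokens.append("str %r" % data[pos:pos + length].decode("utf-8"))
            pos += length
        elif byte == 0xC0:                     # nil
            tokens.append("nil")
        else:
            tokens.append("unhandled type byte 0x%02x" % byte)
            break
    return tokens


# The opening bytes of the Out[12] string, written as a bytes literal.
prefix = b"\x84\xa6blocks\x91\x86\xa5dtype\xa7float64\xa8compress\xc0"
for token in describe_prefix(prefix):
    print(token)
```

Read this way, the output begins with a four-entry map whose first key is 'blocks', followed by a one-item array holding a six-entry map with 'dtype' set to 'float64' and 'compress' set to nil.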

Furthermore, you can concatenate the packed strings; unpacking the result produces a list of the original objects.

In [13]: pd.read_msgpack(df.to_msgpack() + s.to_msgpack())
Out[13]: 
[        A       B
 0  0.1270  0.9667
 1  0.2605  0.8972
 2  0.3767  0.3362
 3  0.4514  0.8403
 4  0.1231  0.5430, 2013-01-01    0.3730
 2013-01-02    0.4480
 2013-01-03    0.1294
 2013-01-04    0.8599
 2013-01-05    0.8204
 Freq: D, dtype: float64]
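Concatenation works because every msgpack value carries its own type and length, so a reader can tell where one object ends and the next begins. The sketch below illustrates this with a toy decoder for three msgpack types (positive fixint, fixstr, fixarray), using byte layouts from the msgpack specification. The names `_decode_one` and `unpack_all` are hypothetical and this is not how pandas implements read_msgpack. Unlike read_msgpack, which returns a single object when only one was packed, `unpack_all` always returns a list.

```python
def _decode_one(data, pos):
    """Decode one value starting at `pos`; return (value, next_pos)."""
    byte = data[pos]
    pos += 1
    if byte <= 0x7F:                           # positive fixint
        return byte, pos
    if 0x90 <= byte <= 0x9F:                   # fixarray: low 4 bits = item count
        items = []
        for _ in range(byte & 0x0F):
            item, pos = _decode_one(data, pos)
            items.append(item)
        return items, pos
    if 0xA0 <= byte <= 0xBF:                   # fixstr: low 5 bits = byte length
        end = pos + (byte & 0x1F)
        return data[pos:end].decode("utf-8"), end
    raise ValueError("type byte 0x%02x is outside this sketch" % byte)


def unpack_all(data):
    """Decode every top-level value in `data`, in order, into a list."""
    objects = []
    pos = 0
    while pos < len(data):
        obj, pos = _decode_one(data, pos)
        objects.append(obj)
    return objects


packed_string = b"\xa3foo"            # msgpack encoding of the string 'foo'
packed_array = b"\x93\x01\x02\x03"    # msgpack encoding of the array [1, 2, 3]
print(unpack_all(packed_string + packed_array))   # ['foo', [1, 2, 3]]
```

The decoder never needs a separator between the two packed values: after consuming 'foo' it simply finds the next type byte and continues.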