3 Panel
Panel is a somewhat less-used, but still important container for 3-dimensional data. The term panel data is derived from econometrics and is partially responsible for the name pandas: pan(el)-da(ta)-s. The names for the 3 axes are intended to give some semantic meaning to describing operations involving panel data and, in particular, econometric analysis of panel data. However, for the strict purposes of slicing and dicing a collection of DataFrame objects, you may find the axis names slightly arbitrary:
- items: axis 0, each item corresponds to a DataFrame contained inside
- major_axis: axis 1, it is the index (rows) of each of the DataFrames
- minor_axis: axis 2, it is the columns of each of the DataFrames
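To make the three axis roles concrete, here is a minimal sketch that uses only NumPy and DataFrame rather than Panel itself, so it does not depend on Panel being available in the installed pandas. The names `values`, `items`, `major_axis`, `minor_axis` and `frames` are illustrative choices, not part of the examples in this section.

```python
import numpy as np
import pandas as pd

# Illustrative labels for the three axes (hypothetical names).
items = ['Item1', 'Item2']
major_axis = pd.date_range('2000-01-01', periods=5)
minor_axis = ['A', 'B', 'C', 'D']

# A 3D block laid out as (items, major_axis, minor_axis).
values = np.arange(2 * 5 * 4, dtype=float).reshape(2, 5, 4)

# Slicing along axis 0 yields one 2D frame per item:
# rows are labelled by major_axis, columns by minor_axis.
frames = {
    item: pd.DataFrame(values[i], index=major_axis, columns=minor_axis)
    for i, item in enumerate(items)
}

print(frames['Item2'].shape)
```

Each entry of `frames` plays the role of one item, which is why the axis names read naturally when you think of the container as a collection of DataFrame objects.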
Construction of Panels works much as you would expect:
3.1 From 3D ndarray with optional axis labels
In [1]: wp = pd.Panel(np.random.randn(2, 5, 4), items=['Item1', 'Item2'],
...: major_axis=pd.date_range('1/1/2000', periods=5),
...: minor_axis=['A', 'B', 'C', 'D'])
...:
In [2]: wp
Out[2]:
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 5 (major_axis) x 4 (minor_axis)
Items axis: Item1 to Item2
Major_axis axis: 2000-01-01 00:00:00 to 2000-01-05 00:00:00
Minor_axis axis: A to D
3.2 From dict of DataFrame objects
In [3]: data = {'Item1' : pd.DataFrame(np.random.randn(4, 3)),
...: 'Item2' : pd.DataFrame(np.random.randn(4, 2))}
...:
In [4]: pd.Panel(data)
Out[4]:
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 4 (major_axis) x 3 (minor_axis)
Items axis: Item1 to Item2
Major_axis axis: 0 to 3
Minor_axis axis: 0 to 2
Note that the values in the dict need only be convertible to DataFrame. Thus, they can be any of the other valid inputs to DataFrame as per above.
One helpful factory method is Panel.from_dict, which takes a dictionary of DataFrames as above, and the following named parameters:
Parameter | Default | Description |
---|---|---|
intersect | False | drops elements whose indices do not align |
orient | items | use minor to use DataFrames’ columns as panel items |
For example, compare to the construction above:
In [5]: pd.Panel.from_dict(data, orient='minor')
Out[5]:
<class 'pandas.core.panel.Panel'>
Dimensions: 3 (items) x 4 (major_axis) x 2 (minor_axis)
Items axis: 0 to 2
Major_axis axis: 0 to 3
Minor_axis axis: Item1 to Item2
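The regrouping that orient='minor' performs can be sketched without Panel by collecting a dict of DataFrames by column label instead of by key. This is only an illustration of the reorientation, not a description of how Panel.from_dict is implemented; `reorient_minor` is a hypothetical helper, and the constant-filled frames stand in for the random data above.

```python
import numpy as np
import pandas as pd

def reorient_minor(frames):
    """Regroup a dict of DataFrames so each column label becomes a key.

    Hypothetical helper, for illustration only.
    """
    # Collect the union of all column labels in first-seen order.
    columns = []
    for frame in frames.values():
        for col in frame.columns:
            if col not in columns:
                columns.append(col)
    # Align every frame to the full column set (missing columns become NaN),
    # then gather each column across all frames.
    aligned = {key: frame.reindex(columns=columns)
               for key, frame in frames.items()}
    return {
        col: pd.DataFrame({key: frame[col] for key, frame in aligned.items()})
        for col in columns
    }

data = {'Item1': pd.DataFrame(np.ones((4, 3))),
        'Item2': pd.DataFrame(np.zeros((4, 2)))}
by_column = reorient_minor(data)

print(list(by_column))
print(by_column[2])
```

In this sketch the third column exists only in 'Item1', so the regrouped frame for that label holds NaN under 'Item2'.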
The orient parameter is especially useful for mixed-type DataFrames. If you pass a dict of DataFrame objects with mixed-type columns, all of the data will be upcast to dtype=object unless you pass orient='minor':
In [6]: df = pd.DataFrame({'a': ['foo', 'bar', 'baz'],
...: 'b': np.random.randn(3)})
...:
In [7]: df
Out[7]:
a b
0 foo 0.0623
1 bar -0.1104
2 baz -1.1844
In [8]: data = {'item1': df, 'item2': df}
In [9]: panel = pd.Panel.from_dict(data, orient='minor')
In [10]: panel['a']
Out[10]:
item1 item2
0 foo foo
1 bar bar
2 baz baz
In [11]: panel['b']
Out[11]:
item1 item2
0 0.0623 0.0623
1 -0.1104 -0.1104
2 -1.1844 -1.1844
In [12]: panel['b'].dtypes
Out[12]:
item1 float64
item2 float64
dtype: object
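The reason grouping by column preserves dtypes can be seen with plain DataFrame and NumPy objects. The sketch below is an analogy rather than Panel's actual internals: it assumes that holding a whole mixed-type frame in one homogeneous block forces a common dtype, whereas a frame built from a single numeric column does not. The float values are made up for the illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': ['foo', 'bar', 'baz'],
                   'b': [0.1, 0.2, 0.3]})

# One homogeneous block must hold strings and floats together,
# so the only common dtype is object.
block = df.to_numpy()
print(block.dtype)

# Grouping by column instead keeps the numeric column on its own,
# so it stays float64.
b_frame = pd.DataFrame({'item1': df['b'], 'item2': df['b']})
print(b_frame.dtypes)
```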
Note
Unfortunately Panel, being less commonly used than Series and DataFrame, has been slightly neglected feature-wise. A number of methods and options available in DataFrame are not available in Panel. This will get worked on, of course, in future releases. And faster if you join me in working on the codebase.
3.3 From DataFrame using to_panel method
This method was introduced in v0.7 to replace LongPanel.to_long, and converts a DataFrame with a two-level index to a Panel.
In [13]: midx = pd.MultiIndex(levels=[['one', 'two'], ['x','y']], labels=[[1,1,0,0],[1,0,1,0]])
In [14]: df = pd.DataFrame({'A' : [1, 2, 3, 4], 'B': [5, 6, 7, 8]}, index=midx)
In [15]: df.to_panel()
Out[15]:
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 2 (major_axis) x 2 (minor_axis)
Items axis: A to B
Major_axis axis: one to two
Minor_axis axis: x to y
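The correspondence between the two index levels and the major and minor axes can be sketched with `Series.unstack`, which is available on any DataFrame column. This is an illustration of the reshaping, not the implementation of to_panel; `item_frames` is a hypothetical name, and the index is built with `MultiIndex.from_tuples` to give the same four labels as above.

```python
import pandas as pd

# Same labels as the example above, written out as explicit tuples.
midx = pd.MultiIndex.from_tuples(
    [('two', 'y'), ('two', 'x'), ('one', 'y'), ('one', 'x')])
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]}, index=midx)

# Each column becomes one 2D item: the outer index level supplies
# the rows (major axis), the inner level the columns (minor axis).
item_frames = {col: df[col].unstack() for col in df.columns}

print(item_frames['A'])
```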
3.4 Item selection / addition / deletion
Similar to DataFrame functioning as a dict of Series, Panel is like a dict of DataFrames:
In [16]: wp['Item1']
Out[16]:
A B C D
2000-01-01 -0.0579 -0.3682 -1.1441 0.8612
2000-01-02 0.8002 0.7821 -1.0691 -1.0992
2000-01-03 0.2553 0.0097 0.6611 0.3793
2000-01-04 -0.0084 1.9525 -1.0567 0.5339
2000-01-05 -1.2270 0.0404 -0.5075 -0.2301
In [17]: wp['Item3'] = wp['Item1'] / wp['Item2']
The API for insertion and deletion is the same as for DataFrame. As with DataFrame, if the item is a valid Python identifier, you can access it as an attribute and tab-complete it in IPython.
3.5 Transposing
A Panel can be rearranged using its transpose method (which does not make a copy by default unless the data are heterogeneous):
In [18]: wp.transpose(2, 0, 1)
Out[18]:
<class 'pandas.core.panel.Panel'>
Dimensions: 4 (items) x 3 (major_axis) x 5 (minor_axis)
Items axis: A to D
Major_axis axis: Item1 to Item3
Minor_axis axis: 2000-01-01 00:00:00 to 2000-01-05 00:00:00
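For homogeneous data, the axis permutation can be pictured with NumPy's `ndarray.transpose` on a bare array. The sketch below assumes the same (items, major_axis, minor_axis) layout and the 3 x 5 x 4 shape that wp has once Item3 has been added; `values` is an illustrative stand-in for the underlying data.

```python
import numpy as np

# (items, major_axis, minor_axis) = (3, 5, 4)
values = np.random.randn(3, 5, 4)

# New axis order: old axis 2 first, then old axis 0, then old axis 1.
transposed = values.transpose(2, 0, 1)

print(transposed.shape)                       # (4, 3, 5)
print(np.shares_memory(values, transposed))   # True: a view, not a copy
```

An element at position (i, j, k) in the original sits at (k, i, j) after the permutation, which matches the way the item, major and minor labels trade places in the output above.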
3.6 Indexing / Selection
Operation | Syntax | Result |
---|---|---|
Select item | wp[item] | DataFrame |
Get slice at major_axis label | wp.major_xs(val) | DataFrame |
Get slice at minor_axis label | wp.minor_xs(val) | DataFrame |
For example, using the earlier example data, we could do:
In [19]: wp['Item1']
Out[19]:
A B C D
2000-01-01 -0.0579 -0.3682 -1.1441 0.8612
2000-01-02 0.8002 0.7821 -1.0691 -1.0992
2000-01-03 0.2553 0.0097 0.6611 0.3793
2000-01-04 -0.0084 1.9525 -1.0567 0.5339
2000-01-05 -1.2270 0.0404 -0.5075 -0.2301
In [20]: wp.major_xs(wp.major_axis[2])
Out[20]:
Item1 Item2 Item3
A 0.2553 0.6046 0.4222
B 0.0097 2.1215 0.0046
C 0.6611 0.5977 1.1060
D 0.3793 0.5637 0.6729
In [21]: wp.minor_axis
Out[21]: Index([u'A', u'B', u'C', u'D'], dtype='object')
In [22]: wp.minor_xs('C')
Out[22]:
Item1 Item2 Item3
2000-01-01 -1.1441 -1.6525 0.6923
2000-01-02 -1.0691 1.1460 -0.9329
2000-01-03 0.6611 0.5977 1.1060
2000-01-04 -1.0567 1.3750 -0.7685
2000-01-05 -0.5075 0.3780 -1.3428
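The shape and labelling of these cross-sections can be reproduced with plain array slicing. The following sketch assumes a 3D array in (items, major_axis, minor_axis) order and rebuilds the labelled results by hand; it illustrates what the two cross-section methods return, not how they are implemented, and all variable names are illustrative.

```python
import numpy as np
import pandas as pd

items = ['Item1', 'Item2', 'Item3']
major_axis = pd.date_range('2000-01-01', periods=5)
minor_axis = ['A', 'B', 'C', 'D']
values = np.random.randn(3, 5, 4)

# Fix a position on axis 1 (major): rows become the minor labels,
# columns become the items.
major_slice = pd.DataFrame(values[:, 2, :].T,
                           index=minor_axis, columns=items)

# Fix a position on axis 2 (minor): rows become the major labels,
# columns become the items.
minor_pos = minor_axis.index('C')
minor_slice = pd.DataFrame(values[:, :, minor_pos].T,
                           index=major_axis, columns=items)

print(major_slice.shape, minor_slice.shape)
```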
3.7 Squeezing
Another way to change the dimensionality of an object is to squeeze an object that has length 1 along one or more axes. The first result below is similar to wp['Item1']:
In [23]: wp.reindex(items=['Item1']).squeeze()
Out[23]:
A B C D
2000-01-01 -0.0579 -0.3682 -1.1441 0.8612
2000-01-02 0.8002 0.7821 -1.0691 -1.0992
2000-01-03 0.2553 0.0097 0.6611 0.3793
2000-01-04 -0.0084 1.9525 -1.0567 0.5339
2000-01-05 -1.2270 0.0404 -0.5075 -0.2301
In [24]: wp.reindex(items=['Item1'], minor=['B']).squeeze()
Out[24]:
2000-01-01 -0.3682
2000-01-02 0.7821
2000-01-03 0.0097
2000-01-04 1.9525
2000-01-05 0.0404
Freq: D, Name: B, dtype: float64
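The same idea applies one dimension down, where `DataFrame.squeeze` drops any axis of length 1. A minimal sketch with a standalone frame (the random data and the names `frame`, `one_column`, `as_series` and `scalar` are illustrative):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.random.randn(5, 4),
                     index=pd.date_range('2000-01-01', periods=5),
                     columns=['A', 'B', 'C', 'D'])

# Selecting with a list keeps the column axis, so the result is still 2D.
one_column = frame[['B']]

# squeeze() drops the length-1 column axis and returns a Series.
as_series = one_column.squeeze()

# A 1x1 frame has two length-1 axes and squeezes down to a scalar.
scalar = frame.iloc[[0]][['B']].squeeze()

print(type(as_series).__name__, as_series.name)
```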
3.8 Conversion to DataFrame
A Panel can be represented in 2D form as a hierarchically indexed DataFrame. See the section hierarchical indexing for more on this. To convert a Panel to a DataFrame, use the to_frame method:
In [25]: panel = pd.Panel(np.random.randn(3, 5, 4), items=['one', 'two', 'three'],
....: major_axis=pd.date_range('1/1/2000', periods=5),
....: minor_axis=['a', 'b', 'c', 'd'])
....:
In [26]: panel.to_frame()
Out[26]:
one two three
major minor
2000-01-01 a -0.5581 -0.2238 -1.3776
b 0.0778 1.3974 0.4993
c 0.6295 1.5039 -1.4053
d -1.0353 -0.4789 0.1626
... ... ... ...
2000-01-05 a -1.2905 -0.3902 0.2525
b 0.7879 1.2071 1.5006
c 1.5157 0.1787 1.0532
d -0.2765 -1.0042 -2.3386
[20 rows x 3 columns]
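The layout of this long-format result can be rebuilt by hand from a 3D array, which may help when reading the output above. The sketch assumes data in (items, major_axis, minor_axis) order and uses `MultiIndex.from_product` for the row labels; it mirrors the shape of the to_frame result but is not its implementation, and `long_frame` is an illustrative name.

```python
import numpy as np
import pandas as pd

items = ['one', 'two', 'three']
major_axis = pd.date_range('2000-01-01', periods=5)
minor_axis = ['a', 'b', 'c', 'd']
values = np.random.randn(3, 5, 4)

# Row labels: every (major, minor) pair, with minor varying fastest.
index = pd.MultiIndex.from_product([major_axis, minor_axis],
                                   names=['major', 'minor'])

# Flatten each item's 5x4 block into 20 values, then put one item
# per column.
long_frame = pd.DataFrame(values.reshape(len(items), -1).T,
                          index=index, columns=items)

print(long_frame.shape)   # (20, 3)
```

Looking up a (major, minor) pair in `long_frame` returns the same value as indexing the array at the matching item, row and column position.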