2 Attributes and the raw ndarray(s)
pandas objects have a number of attributes that give you access to their metadata:
- shape: gives the axis dimensions of the object, consistent with ndarray
- Axis labels
  - Series: index (only axis)
  - DataFrame: index (rows) and columns
  - Panel: items, major_axis, and minor_axis
Note that the axis-label attributes can be safely assigned to!
In [1]: df
Out[1]:
A B C
2000-01-01 -0.484746 -1.131400 -0.364331
2000-01-02 -0.673583 1.442111 0.358102
2000-01-03 -0.223960 1.435608 -1.411866
2000-01-04 -0.116935 -0.629593 1.110761
2000-01-05 0.049069 0.417137 -1.424476
2000-01-06 0.521685 -1.818893 1.556518
2000-01-07 -0.688259 -0.493469 -1.097952
2000-01-08 -1.796398 0.683958 -2.283766
In [2]: df[:2]
Out[2]:
A B C
2000-01-01 -0.484746 -1.131400 -0.364331
2000-01-02 -0.673583 1.442111 0.358102
In [3]: df.columns = [x.lower() for x in df.columns]
In [4]: df
Out[4]:
a b c
2000-01-01 -0.484746 -1.131400 -0.364331
2000-01-02 -0.673583 1.442111 0.358102
2000-01-03 -0.223960 1.435608 -1.411866
2000-01-04 -0.116935 -0.629593 1.110761
2000-01-05 0.049069 0.417137 -1.424476
2000-01-06 0.521685 -1.818893 1.556518
2000-01-07 -0.688259 -0.493469 -1.097952
2000-01-08 -1.796398 0.683958 -2.283766
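The shape attribute and axis-label assignment can also be tried in a standalone script. The following is a minimal sketch, assuming a hypothetical frame named `frame` with made-up values; it is not the `df` from the session above.

```python
import pandas as pd

# Hypothetical frame for illustration; the values are made up.
frame = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]},
    index=pd.date_range("2000-01-01", periods=3),
)

# shape gives (number of rows, number of columns), as with an ndarray.
n_rows, n_cols = frame.shape

# Axis labels can be reassigned, provided the new labels
# have the same length as the axis they replace.
frame.columns = [name.lower() for name in frame.columns]
frame.index = ["r1", "r2", "r3"]

print(n_rows, n_cols)
print(list(frame.columns))
print(list(frame.index))
```

Assigning a list whose length differs from the axis length raises an error, so the replacement labels should be built from the existing ones where possible.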
To get the actual data inside a data structure, one need only access the values property:
In [5]: s
Out[5]:
a 0.590969
b -1.630428
c -0.715762
d 0.723097
e -0.030146
dtype: float64
In [6]: s.values
Out[6]: array([ 0.591 , -1.6304, -0.7158, 0.7231, -0.0301])
In [7]: df
Out[7]:
a b c
2000-01-01 -0.484746 -1.131400 -0.364331
2000-01-02 -0.673583 1.442111 0.358102
2000-01-03 -0.223960 1.435608 -1.411866
2000-01-04 -0.116935 -0.629593 1.110761
2000-01-05 0.049069 0.417137 -1.424476
2000-01-06 0.521685 -1.818893 1.556518
2000-01-07 -0.688259 -0.493469 -1.097952
2000-01-08 -1.796398 0.683958 -2.283766
In [8]: df.values
Out[8]:
array([[-0.4847, -1.1314, -0.3643],
[-0.6736, 1.4421, 0.3581],
[-0.224 , 1.4356, -1.4119],
[-0.1169, -0.6296, 1.1108],
[ 0.0491, 0.4171, -1.4245],
[ 0.5217, -1.8189, 1.5565],
[-0.6883, -0.4935, -1.098 ],
[-1.7964, 0.684 , -2.2838]])
In [9]: wp
Out[9]:
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 5 (major_axis) x 4 (minor_axis)
Items axis: Item1 to Item2
Major_axis axis: 2000-01-01 00:00:00 to 2000-01-05 00:00:00
Minor_axis axis: A to D
In [10]: wp.values
Out[10]:
array([[[ 0.5413, 1.4373, -0.7365, 0.3874],
[ 0.1827, 0.6782, 1.1025, 0.1911],
[ 0.2794, 1.3832, 0.8546, 0.7783],
[-0.8853, -0.3075, 1.4992, 0.7635],
[-0.6885, -0.7081, -0.9326, -1.0033]],
[[ 0.2084, 0.9023, -1.2843, -0.32 ],
[-1.4847, 0.373 , 0.5034, -1.7292],
[-0.4232, 0.7075, 2.0166, -0.1479],
[ 0.0737, 1.0667, 1.439 , -1.7467],
[ 0.3224, -1.6387, -0.2218, 0.3126]]])
If a DataFrame or Panel contains homogeneously typed data, the ndarray can be modified in place, and the changes will be reflected in the data structure. For heterogeneous data (e.g. the DataFrame's columns do not all share the same dtype), this will not be the case. The values attribute itself, unlike the axis labels, cannot be assigned to.
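The heterogeneous case can be illustrated with a short sketch. It assumes a hypothetical two-column frame named `mixed` with one integer and one float column. The homogeneous case is deliberately not shown, since whether writes through the array reach the frame may depend on the pandas version in use.

```python
import pandas as pd

# Hypothetical frame whose columns have different dtypes (int64 and float64).
mixed = pd.DataFrame({"x": [1, 2], "y": [1.5, 2.5]})

# The columns cannot be exposed as one shared buffer, so values
# builds a new array that accommodates both dtypes.
arr = mixed.values
arr[0, 0] = 99.0  # changes only the newly built array

print(arr[0, 0])          # value stored in the ndarray
print(mixed.loc[0, "x"])  # value still stored in the frame
```

Because the array is a fresh copy here, any change meant to persist should be written back through the DataFrame itself rather than through the ndarray.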
Note
When working with heterogeneous data, the dtype of the resulting ndarray will be chosen to accommodate all of the data involved. For example, if strings are involved, the result will be of object dtype. If there are only floats and integers, the resulting array will be of float dtype.
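The dtype rule in this note can be checked directly. The sketch below assumes two hypothetical frames, `numeric` and `with_text`, built only for illustration.

```python
import numpy as np
import pandas as pd

# Integers and floats only: the combined array is upcast to a float dtype.
numeric = pd.DataFrame({"i": [1, 2], "f": [0.5, 1.5]})

# Integers and strings: the combined array falls back to object dtype.
with_text = pd.DataFrame({"i": [1, 2], "s": ["a", "b"]})

print(numeric.values.dtype)
print(with_text.values.dtype)
```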