5 Operations
See the Basic section on Binary Ops.
5.1 Stats
Operations in general exclude missing data.
Performing a descriptive statistic:
In [1]: df.mean()
Out[1]:
A 0.073711
B -0.431125
C -0.687758
D -0.233103
dtype: float64
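To see the exclusion of missing data on its own, here is a minimal sketch on a small hypothetical frame (the name frame and its values are illustrative, not part of the tutorial's df). The skipna parameter of DataFrame.mean controls this behavior and defaults to True:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in column "A"
frame = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [4.0, 5.0, 6.0]})

# Default skipna=True: the NaN is dropped, so A's mean is (1 + 3) / 2
means = frame.mean()

# skipna=False: a NaN anywhere in a column makes that column's result NaN
strict_means = frame.mean(skipna=False)

print(means)
print(strict_means)
```

Column B has no missing values, so it gives the same result either way.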
The same operation on the other axis (one value per row instead of per column):
In [2]: df.mean(axis=1)
Out[2]:
2013-01-01 -0.614610
2013-01-02 0.028468
2013-01-03 -0.597386
2013-01-04 -0.188233
2013-01-05 -0.167280
2013-01-06 -0.378370
Freq: D, dtype: float64
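The axis argument decides which labels survive the reduction. A minimal sketch on a hypothetical 3x2 frame (the row labels r1 to r3 stand in for the dates above) compares both directions with a hand-computed row average:

```python
import pandas as pd

# Hypothetical frame; "r1".."r3" play the role of the date index
frame = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]},
    index=["r1", "r2", "r3"],
)

col_means = frame.mean(axis=0)  # default: collapse rows, one value per column
row_means = frame.mean(axis=1)  # collapse columns, one value per row

# Averaging each row by hand gives the same numbers as axis=1
manual_row_means = (frame["A"] + frame["B"]) / 2
```

The string aliases axis="index" and axis="columns" are accepted as synonyms for 0 and 1.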
Operating with objects that have different dimensionality and need alignment: pandas aligns the objects on their labels and automatically broadcasts along the specified dimension.
In [3]: s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2)
In [4]: s
Out[4]:
2013-01-01 NaN
2013-01-02 NaN
2013-01-03 1.0
2013-01-04 3.0
2013-01-05 5.0
2013-01-06 NaN
Freq: D, dtype: float64
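The shift(2) call above moves values along the index rather than changing it. This sketch rebuilds a six-day index locally (dates here is a stand-in, assumed to match the date range used earlier in the tutorial) and also shows a negative shift:

```python
import numpy as np
import pandas as pd

# Stand-in for the tutorial's six-day date index
dates = pd.date_range("2013-01-01", periods=6)
raw = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates)

# Positive periods push values later: the first two slots become NaN
# and the last two values fall off the end; the index is unchanged
shifted = raw.shift(2)

# Negative periods pull values earlier instead
shifted_back = raw.shift(-1)
```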
In [5]: df.sub(s, axis='index')
Out[5]:
A B C D
2013-01-01 NaN NaN NaN NaN
2013-01-02 NaN NaN NaN NaN
2013-01-03 -1.861849 -3.104569 -1.494929 0.071804
2013-01-04 -2.278445 -3.706771 -4.039575 -2.728140
2013-01-05 -5.424972 -4.432980 -4.723768 -6.087401
2013-01-06 NaN NaN NaN NaN
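Alignment and broadcasting can be seen on a smaller scale. In this sketch the frame and Series are hypothetical: the Series covers only two of the frame's three dates, and one of its values is NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical frame indexed by three consecutive dates
idx = pd.date_range("2013-01-01", periods=3)
frame = pd.DataFrame({"A": [10.0, 20.0, 30.0], "B": [1.0, 2.0, 3.0]}, index=idx)

# Series with labels for the last two dates only; the last value is missing
s = pd.Series([1.0, np.nan], index=idx[1:])

# axis="index" matches s against the row labels, then subtracts each
# value from every column of its row; unmatched or NaN labels give NaN rows
result = frame.sub(s, axis="index")
print(result)
```

Without axis="index", subtracting a Series from a DataFrame aligns the Series index with the column labels instead, which is why the argument matters here.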
5.2 Apply
Applying functions to the data. By default, apply() passes each column to the function in turn:
In [6]: df.apply(np.cumsum)
Out[6]:
A B C D
2013-01-01 0.469112 -0.282863 -1.509059 -1.135632
2013-01-02 1.681224 -0.456078 -1.389850 -2.179868
2013-01-03 0.819375 -2.560647 -1.884779 -1.108065
2013-01-04 1.540931 -3.267418 -2.924354 -0.836205
2013-01-05 1.115958 -2.700398 -2.648122 -1.923605
2013-01-06 0.442268 -2.586750 -4.126549 -1.398618
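Because apply() hands each column over as a Series, applying np.cumsum yields running totals down every column. This sketch on a hypothetical frame checks that against the built-in cumsum() method and shows axis=1, which passes rows instead:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two numeric columns
frame = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [-1.0, 0.5, 4.0]})

# Column-wise running totals, as in the example above
running = frame.apply(np.cumsum)

# axis=1 accumulates across each row (A, then A + B)
row_running = frame.apply(np.cumsum, axis=1)
```

For this particular operation the built-in frame.cumsum() is the more direct choice; apply() is most useful for functions pandas does not already provide as a method.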
In [7]: df.apply(lambda x: x.max() - x.min())
Out[7]:
A 2.073961
B 2.671590
C 1.785291
D 2.207436
dtype: float64
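When the function reduces its input to a scalar, as the max minus min lambda does, apply() returns a Series indexed by the column labels. A short sketch with a hypothetical frame compares that with the equivalent vectorized expression and with a row-wise call:

```python
import pandas as pd

# Hypothetical frame for illustration
frame = pd.DataFrame({"A": [3.0, -1.0, 2.0], "B": [0.5, 4.0, 1.5]})

# The lambda sees one column at a time and returns its range
col_range = frame.apply(lambda x: x.max() - x.min())

# The same result from vectorized reductions
vectorized = frame.max() - frame.min()

# axis=1 passes each row instead, giving one range per row label
row_range = frame.apply(lambda x: x.max() - x.min(), axis=1)
```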
5.3 Histogramming
See more at Histogramming and Discretization.
In [8]: s = pd.Series(np.random.randint(0, 7, size=100))
In [9]: s
Out[9]:
0 4
1 2
2 1
3 2
..
96 3
97 4
98 3
99 0
dtype: int64
In [10]: s.value_counts()
Out[10]:
4 29
2 16
5 12
3 12
1 12
6 11
0 8
dtype: int64
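Since the data above is random, the exact counts differ from run to run. This sketch uses a seeded NumPy generator (an assumption made for reproducibility; the tutorial uses the unseeded global state) and cross-checks value_counts() against a plain-Python tally:

```python
from collections import Counter

import numpy as np
import pandas as pd

# Seeded generator so the counts are reproducible
rng = np.random.default_rng(0)
s = pd.Series(rng.integers(0, 7, size=100))

# Frequency of each distinct value, most common first
counts = s.value_counts()

# The same counts expressed as fractions of the total
shares = s.value_counts(normalize=True)

# Independent tally for comparison
tally = Counter(s.tolist())
```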
5.4 String Methods
Series is equipped with a set of string processing methods in the str attribute that make it easy to operate on each element of the array, as in the code snippet below. Missing values are passed through unchanged. Note that pattern matching in str generally uses regular expressions by default (and in some cases always uses them). See more at Vectorized String Methods.
In [11]: s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
In [12]: s.str.lower()
Out[12]:
0 a
1 b
2 c
3 aaba
...
5 NaN
6 caba
7 dog
8 cat
dtype: object
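To illustrate the note about regular expressions, this sketch reuses the same values and calls str.contains, whose pattern is treated as a regex unless regex=False is passed. The variable names are illustrative, and na=False is used so the missing entry counts as a non-match rather than NaN:

```python
import numpy as np
import pandas as pd

s = pd.Series(["A", "B", "C", "Aaba", "Baca", np.nan, "CABA", "dog", "cat"])

# Regex by default: "^[Aa]" matches values starting with "A" or "a"
starts_with_a = s.str.contains("^[Aa]", na=False)

# "." is a regex wildcard, so every non-missing value matches...
any_char = s.str.contains(".", na=False)

# ...but with regex=False it is a literal dot, which no value contains
literal_dot = s.str.contains(".", regex=False, na=False)

# Element-wise methods leave the missing entry missing
lengths = s.str.len()
```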