5 Operations

5.1 Stats

Operations generally exclude missing data.

Computing a descriptive statistic, here the mean of each column

In [1]: df.mean()
Out[1]: 
A    0.073711
B   -0.431125
C   -0.687758
D   -0.233103
dtype: float64

The same operation on the other axis, giving one mean per row

In [2]: df.mean(axis=1)
Out[2]: 
2013-01-01   -0.614610
2013-01-02    0.028468
2013-01-03   -0.597386
2013-01-04   -0.188233
2013-01-05   -0.167280
2013-01-06   -0.378370
Freq: D, dtype: float64
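The claim that operations exclude missing data can be checked on a small frame. The sketch below is illustrative only: the frame demo and its values are made up for the example and are not the df used in this section. It compares the default behaviour with skipna=False, which propagates NaN instead of skipping it.

```python
import numpy as np
import pandas as pd

# Hypothetical frame (not the tutorial's df) with one missing value in "x".
demo = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [10.0, 20.0, 30.0]})

# Default: NaN is skipped, so "x" is averaged over 1.0 and 3.0 only.
col_means = demo.mean()

# skipna=False propagates the missing value, so the mean of "x" is NaN.
strict_means = demo.mean(skipna=False)

# axis=1 yields one mean per row; the NaN in the second row is skipped too.
row_means = demo.mean(axis=1)

print(col_means)
print(strict_means)
print(row_means)
```

With the default settings, the second row's mean is computed from "y" alone, which is why no NaN appears in row_means.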

Operations between objects of different dimensionality require alignment. pandas first aligns the labels, then automatically broadcasts along the specified dimension.

In [3]: s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2)

In [4]: s
Out[4]: 
2013-01-01    NaN
2013-01-02    NaN
2013-01-03    1.0
2013-01-04    3.0
2013-01-05    5.0
2013-01-06    NaN
Freq: D, dtype: float64

In [5]: df.sub(s, axis='index')
Out[5]: 
                   A         B         C         D
2013-01-01       NaN       NaN       NaN       NaN
2013-01-02       NaN       NaN       NaN       NaN
2013-01-03 -1.861849 -3.104569 -1.494929  0.071804
2013-01-04 -2.278445 -3.706771 -4.039575 -2.728140
2013-01-05 -5.424972 -4.432980 -4.723768 -6.087401
2013-01-06       NaN       NaN       NaN       NaN
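To make the alignment step easier to follow, here is a minimal sketch on a smaller, made-up example. The names frame, offsets, and idx are assumptions for illustration. The Series deliberately lacks a label for the first date, so that row has nothing to subtract and comes out as NaN.

```python
import pandas as pd

# Hypothetical data: three dated rows and two columns.
idx = pd.date_range("2013-01-01", periods=3)
frame = pd.DataFrame({"A": [10.0, 20.0, 30.0], "B": [1.0, 2.0, 3.0]}, index=idx)

# The Series covers only the last two dates; the first date has no match.
offsets = pd.Series([5.0, 7.0], index=idx[1:])

# axis="index" matches the Series labels against the row labels, then
# subtracts the matched value from every column of that row (broadcasting).
result = frame.sub(offsets, axis="index")
print(result)
```

Rows whose label is missing from the Series are filled with NaN in every column, which matches the pattern in the output above.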

5.2 Apply

Applying functions to the data. By default, apply passes each column to the function.

In [6]: df.apply(np.cumsum)
Out[6]: 
                   A         B         C         D
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
2013-01-02  1.681224 -0.456078 -1.389850 -2.179868
2013-01-03  0.819375 -2.560647 -1.884779 -1.108065
2013-01-04  1.540931 -3.267418 -2.924354 -0.836205
2013-01-05  1.115958 -2.700398 -2.648122 -1.923605
2013-01-06  0.442268 -2.586750 -4.126549 -1.398618

In [7]: df.apply(lambda x: x.max() - x.min())
Out[7]: 
A    2.073961
B    2.671590
C    1.785291
D    2.207436
dtype: float64
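The two calls above return different shapes: a function that transforms each element yields a DataFrame, while a function that reduces a column to one value yields a Series. The following sketch, on a hypothetical frame named demo, shows both cases and adds axis=1 to apply the function to each row instead.

```python
import numpy as np
import pandas as pd

# Hypothetical frame used only for this illustration.
demo = pd.DataFrame({"A": [1.0, 4.0, 2.0], "B": [10.0, 5.0, 7.0]})

# Transforming function: one output per input element, so a DataFrame results.
running_total = demo.apply(np.cumsum)

# Reducing function: one value per column, so a Series indexed by column results.
col_range = demo.apply(lambda col: col.max() - col.min())

# axis=1 passes each row to the function instead of each column.
row_range = demo.apply(lambda row: row.max() - row.min(), axis=1)

print(running_total)
print(col_range)
print(row_range)
```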

5.3 Histogramming

value_counts tallies how often each distinct value occurs, with the most frequent value first. See more at Histogramming and Discretization

In [8]: s = pd.Series(np.random.randint(0, 7, size=100))

In [9]: s
Out[9]: 
0     4
1     2
2     1
3     2
     ..
96    3
97    4
98    3
99    0
dtype: int64

In [10]: s.value_counts()
Out[10]: 
4    29
2    16
5    12
3    12
1    12
6    11
0     8
dtype: int64
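Because the example above draws random integers, its counts change from run to run. A deterministic sketch may be easier to verify by hand. The Series rolls and its values are invented for this illustration. It also shows normalize=True, which returns relative frequencies, and sort_index, which orders the result by value rather than by count.

```python
import pandas as pd

# Hypothetical fixed data, so the counts are reproducible.
rolls = pd.Series([3, 1, 3, 2, 3, 1])

# Counts per distinct value, most frequent first.
counts = rolls.value_counts()

# Relative frequencies instead of raw counts; they sum to 1.
shares = rolls.value_counts(normalize=True)

# Reorder by the values themselves, as in a conventional histogram.
by_value = counts.sort_index()

print(counts)
print(shares)
print(by_value)
```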

5.4 String Methods

Series provides a set of string processing methods through its str attribute that operate on each element of the array, as in the code snippet below. Missing values pass through as NaN. Note that pattern-matching in str generally uses regular expressions by default (and in some cases always uses them). See more at Vectorized String Methods.

In [11]: s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

In [12]: s.str.lower()
Out[12]: 
0       a
1       b
2       c
3    aaba
     ... 
5     NaN
6    caba
7     dog
8     cat
dtype: object
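The note about regular expressions matters in practice, because characters such as "." have a special meaning in a pattern. The sketch below uses a made-up Series named words to contrast the default regex matching of str.contains with regex=False, which treats the pattern literally. Passing na=False is a choice made here so that missing entries come back as False rather than as a missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical strings chosen so that "." behaves differently as a regex.
words = pd.Series(["a.b", "axb", np.nan, "A.B"])

# Element-wise lowercase; the missing entry stays missing.
lowered = words.str.lower()

# Default: the pattern is a regex, so "." matches any single character.
regex_hits = words.str.contains("a.b", na=False)

# regex=False: the pattern is a literal substring, so "." matches only ".".
literal_hits = words.str.contains("a.b", regex=False, na=False)

print(lowered)
print(regex_hits)
print(literal_hits)
```

Matching is case-sensitive in both calls, so "A.B" is not a hit in either result.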