3 Accelerated operations
pandas supports accelerating certain types of binary numerical and boolean operations using the numexpr library (starting in 0.11.0) and the bottleneck library.
These libraries are especially useful when dealing with large data sets, and provide large speedups. numexpr uses smart chunking, caching, and multiple cores. bottleneck is a set of specialized cython routines that are especially fast when dealing with arrays that have nans.
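As a minimal sketch of what this looks like in practice, the snippet below turns both accelerators on and runs the kinds of operations they target. It assumes a recent pandas release, where the switches are exposed as the options compute.use_numexpr and compute.use_bottleneck; the 0.11.0 release described here may have used a different mechanism. The frame sizes and the injected NaN values are purely illustrative, and pandas may only dispatch to numexpr above an internal size threshold, so small frames like these are likely to take the plain NumPy path.

```python
import numpy as np
import pandas as pd

# Ask pandas to use the accelerators when they are installed.
# Assumption: these option names come from recent pandas releases.
# If a library is missing, pandas falls back to its default code path.
pd.set_option("compute.use_numexpr", True)
pd.set_option("compute.use_bottleneck", True)

# Small illustrative frames; real speedups show up on much larger data.
df1 = pd.DataFrame(np.random.default_rng(0).standard_normal((1000, 10)))
df2 = pd.DataFrame(np.random.default_rng(1).standard_normal((1000, 10)))
df1.iloc[::7, 0] = np.nan  # inject some NaN values into the first column

mask = df1 > df2        # binary boolean operation (numexpr territory)
total = df1 + df2       # binary numerical operation (numexpr territory)
col_means = df1.mean()  # NaN-skipping reduction (bottleneck territory)
```

The results are the same whether or not the libraries are installed; only the execution path changes.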
Here is a sample (using 100 column x 100,000 row DataFrames):
Operation | 0.11.0 (ms) | Prior Version (ms) | Ratio to Prior |
---|---|---|---|
df1 > df2 | 13.32 | 125.35 | 0.1063 |
df1 * df2 | 21.71 | 36.63 | 0.5928 |
df1 + df2 | 22.04 | 36.50 | 0.6039 |
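
A comparison like the one in the table can be approximated with a short timing script. The sketch below is not the benchmark that produced those figures; the helper name time_binary_ops, the random data, and the use of timeit are assumptions made for illustration. It runs on a reduced shape so that it finishes quickly.

```python
import timeit

import numpy as np
import pandas as pd


def time_binary_ops(n_rows, n_cols, repeat=3):
    """Return the best-of-`repeat` wall time in ms for each operation.

    Hypothetical helper for illustration, not part of pandas.
    """
    rng = np.random.default_rng(0)
    df1 = pd.DataFrame(rng.standard_normal((n_rows, n_cols)))
    df2 = pd.DataFrame(rng.standard_normal((n_rows, n_cols)))

    ops = {
        "df1 > df2": lambda: df1 > df2,
        "df1 * df2": lambda: df1 * df2,
        "df1 + df2": lambda: df1 + df2,
    }

    results = {}
    for label, op in ops.items():
        # Take the minimum over several single runs to reduce noise.
        best_seconds = min(timeit.repeat(op, number=1, repeat=repeat))
        results[label] = best_seconds * 1000.0
    return results


# Reduced shape for a quick run; the table above used 100,000 rows x 100 columns.
timings = time_binary_ops(n_rows=10_000, n_cols=100)
for label, ms in timings.items():
    print(f"{label}: {ms:.2f} ms")
```

To mirror the table's setup, call time_binary_ops(n_rows=100_000, n_cols=100), once with the accelerators enabled and once without. Absolute timings depend on hardware and library versions, so only the ratio between the two runs is comparable.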
You are highly encouraged to install both libraries. See the section Recommended Dependencies for more installation info.