3 Accelerated operations
pandas can accelerate certain types of binary numerical and boolean operations using
the numexpr library (starting in 0.11.0) and the bottleneck library.
These libraries are especially useful when dealing with large data sets, where they can provide large
speedups. numexpr uses smart chunking, caching, and multiple cores. bottleneck is
a set of specialized Cython routines that are especially fast when dealing with arrays that contain
NaNs.
Here is a sample benchmark (using DataFrames with 100 columns and 100,000 rows):
| Operation | 0.11.0 (ms) | Prior Version (ms) | Ratio to Prior |
|---|---|---|---|
| df1 > df2 | 13.32 | 125.35 | 0.1063 |
| df1 * df2 | 21.71 | 36.63 | 0.5928 |
| df1 + df2 | 22.04 | 36.50 | 0.6039 |
You are highly encouraged to install both libraries. See the section Recommended Dependencies for more installation info.
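
As a rough illustration, the sketch below checks whether the two libraries are importable and runs the same binary operations with acceleration switched on and off. It assumes a recent pandas release in which the option names `compute.use_numexpr` and `compute.use_bottleneck` exist (they are not part of 0.11.0), and the frame sizes are arbitrary. If a library is not installed, pandas simply falls back to its plain NumPy code path.

```python
import importlib.util

import numpy as np
import pandas as pd

# Report which optional accelerators can be imported in this environment.
installed = {
    name: importlib.util.find_spec(name) is not None
    for name in ("numexpr", "bottleneck")
}
print(installed)

# Small illustrative frames (sizes are arbitrary, not the benchmark sizes).
rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.standard_normal((1000, 10)))
df2 = pd.DataFrame(rng.standard_normal((1000, 10)))

# Run the operations with acceleration allowed (assumed option names).
with pd.option_context("compute.use_numexpr", True):
    with pd.option_context("compute.use_bottleneck", True):
        fast_sum = df1 + df2
        fast_gt = df1 > df2

# Run the same operations with acceleration disabled.
with pd.option_context("compute.use_numexpr", False):
    with pd.option_context("compute.use_bottleneck", False):
        plain_sum = df1 + df2
        plain_gt = df1 > df2

# Both paths are expected to give the same results; only speed should differ.
print(fast_sum.equals(plain_sum), fast_gt.equals(plain_gt))
```

The options are restored to their previous values when each `with` block exits, so the sketch does not change global settings. To compare timings, the same operations could be wrapped in `timeit` on larger frames.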