4 Working with missing data

In this section, we will discuss missing (also referred to as NA) values in pandas.

Note

The choice of using NaN internally to denote missing data was largely for simplicity and performance reasons. It differs from the MaskedArray approach of, for example, scikits.timeseries. We are hopeful that NumPy will soon be able to provide a native NA type solution (similar to R) performant enough to be used in pandas.

See the cookbook for some advanced strategies.

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: pd.options.display.max_rows = 8

In [4]: import matplotlib

In [5]: matplotlib.style.use('ggplot')

In [6]: import matplotlib.pyplot as plt
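As a minimal sketch of the NaN convention described in the note above, the following shows how a missing entry in a float Series is stored and detected. The variable names (`s`, `mask`, `total`, `eq`) are illustrative only, and the example assumes a pandas version that provides `isna()` (older releases expose the same check as `isnull()`).

```python
import numpy as np
import pandas as pd

# A float Series with one missing entry; the missing value is stored as NaN.
s = pd.Series([1.0, np.nan, 3.0])

# isna() returns a boolean Series that is True where a value is missing.
mask = s.isna()

# Reductions such as sum() skip NaN by default (skipna=True).
total = s.sum()

# NaN never compares equal to anything, including itself,
# so an equality test cannot be used to find missing values.
eq = (s == np.nan)

print(mask.tolist())
print(total)
print(eq.any())
```

Because NaN is an ordinary floating-point value, comparing against `np.nan` with `==` always yields False, so `isna()` is the reliable way to locate missing entries.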