4 Missing Data
pandas primarily uses the value np.nan
to represent missing data. It is by
default not included in computations. See the Missing Data section
Reindexing allows you to change/add/delete the index on a specified axis. This returns a copy of the data.
In [1]: df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ['E'])
In [2]: df1.loc[dates[0]:dates[1],'E'] = 1
In [3]: df1
Out[3]:
A B C D F E
2013-01-01 0.000000 0.000000 -1.509059 5 NaN 1.0
2013-01-02 1.212112 -0.173215 0.119209 5 1.0 1.0
2013-01-03 -0.861849 -2.104569 -0.494929 5 2.0 NaN
2013-01-04 0.721555 -0.706771 -1.039575 5 3.0 NaN
To drop any rows that have missing data.
In [4]: df1.dropna(how='any')
Out[4]:
A B C D F E
2013-01-02 1.212112 -0.173215 0.119209 5 1.0 1.0
Filling missing data
In [5]: df1.fillna(value=5)
Out[5]:
A B C D F E
2013-01-01 0.000000 0.000000 -1.509059 5 5.0 1.0
2013-01-02 1.212112 -0.173215 0.119209 5 1.0 1.0
2013-01-03 -0.861849 -2.104569 -0.494929 5 2.0 5.0
2013-01-04 0.721555 -0.706771 -1.039575 5 3.0 5.0
To get the boolean mask where values are nan
In [6]: pd.isnull(df1)
Out[6]:
A B C D F E
2013-01-01 False False False False True False
2013-01-02 False False False False False False
2013-01-03 False False False False False True
2013-01-04 False False False False False True