5.9 Getting Data In/Out
New in version 0.15.2.
Writing data (Series, DataFrames) that contains a category dtype to an HDF store was implemented in 0.15.2. See here for an example and caveats.
Writing data to and reading data from Stata format files was implemented in 0.15.2. See here for an example and caveats.
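As a rough illustration of the Stata round trip, the sketch below writes a small frame with a categorical column to a temporary .dta file and reads it back. The frame contents, the file name ratings.dta, and the use of a temporary directory are assumptions made for this example; it relies on read_stata converting labeled values to categoricals by default.

```python
import os
import tempfile

import pandas as pd

# Hypothetical example data: one categorical column, one integer column.
df = pd.DataFrame(
    {
        "cats": pd.Categorical(
            ["good", "bad", "good"], categories=["bad", "medium", "good"]
        ),
        "vals": [1, 2, 3],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ratings.dta")  # file name is an assumption
    # Categoricals are exported as Stata value-labeled data.
    df.to_stata(path, write_index=False)
    # Labeled values are read back as a categorical column by default.
    back = pd.read_stata(path)

print(back.dtypes)
```

Unlike the CSV case described next, the column should come back as a category here, although details such as ordering may differ from the original.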
Writing to a CSV file converts the data to plain values, which discards all information about the categorical (its categories and their ordering). If you read the CSV file back, you have to convert the relevant columns back to category and reassign the correct categories and their order.
In [1]: s = pd.Series(pd.Categorical(['a', 'b', 'b', 'a', 'a', 'd']))
# rename the categories
In [2]: s.cat.categories = ["very good", "good", "bad"]
# reorder the categories and add missing categories
In [3]: s = s.cat.set_categories(["very bad", "bad", "medium", "good", "very good"])
In [4]: df = pd.DataFrame({"cats":s, "vals":[1,2,3,4,5,6]})
# StringIO is assumed to be imported already (on Python 3: from io import StringIO)
In [5]: csv = StringIO()
In [6]: df.to_csv(csv)
In [7]: df2 = pd.read_csv(StringIO(csv.getvalue()))
In [8]: df2.dtypes
Out[8]:
Unnamed: 0 int64
cats object
vals int64
dtype: object
In [9]: df2["cats"]
Out[9]:
0 very good
1 good
2 good
3 very good
4 very good
5 bad
Name: cats, dtype: object
# Convert the column back to category and restore the full set of categories
In [10]: df2["cats"] = df2["cats"].astype("category")
In [11]: df2["cats"].cat.set_categories(["very bad", "bad", "medium", "good", "very good"],
....: inplace=True)
....:
In [12]: df2.dtypes
Out[12]:
Unnamed: 0 int64
cats category
vals int64
dtype: object
In [13]: df2["cats"]
Out[13]:
0 very good
1 good
2 good
3 very good
4 very good
5 bad
Name: cats, dtype: category
Categories (5, object): [very bad, bad, medium, good, very good]
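The session above can also be sketched as a self-contained script. This version assumes a more recent pandas than 0.15.2, where CategoricalDtype is available from pandas.api.types; the name quality and the choice ordered=True are assumptions for illustration, not part of the original session.

```python
from io import StringIO

import pandas as pd
from pandas.api.types import CategoricalDtype

# Assumed dtype to restore after the round trip; ordered=True is an
# illustrative choice, the session above uses unordered categories.
quality = CategoricalDtype(
    ["very bad", "bad", "medium", "good", "very good"], ordered=True
)

df = pd.DataFrame(
    {
        "cats": pd.Series(
            ["very good", "good", "good", "very good", "very good", "bad"]
        ).astype(quality),
        "vals": [1, 2, 3, 4, 5, 6],
    }
)

buf = StringIO()
df.to_csv(buf, index=False)

# A plain read no longer knows about the categories or their order.
plain = pd.read_csv(StringIO(buf.getvalue()))

# Re-apply categories and ordering explicitly after reading.
restored = plain.copy()
restored["cats"] = restored["cats"].astype(quality)

print(plain.dtypes)
print(restored["cats"].cat.categories)
```

Keeping the dtype in one named object means the same categories and order are used when writing and when restoring, which avoids typing the list twice.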
The same holds for writing to a SQL database with to_sql.
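A minimal sketch of the SQL case, using an in-memory SQLite database from the standard library. The table name ratings and the example data are assumptions made for illustration; the point is only that the column read back is no longer a category and has to be converted again.

```python
import sqlite3

import pandas as pd

categories = ["bad", "medium", "good"]  # hypothetical categories

df = pd.DataFrame(
    {
        "cats": pd.Categorical(["good", "bad", "good"], categories=categories),
        "vals": [1, 2, 3],
    }
)

conn = sqlite3.connect(":memory:")
try:
    # The categorical column is stored as plain text values.
    df.to_sql("ratings", conn, index=False)
    raw = pd.read_sql("SELECT * FROM ratings", conn)
finally:
    conn.close()

# Restore the category dtype and the full set of categories by hand.
restored = raw.copy()
restored["cats"] = pd.Categorical(restored["cats"], categories=categories)

print(raw.dtypes)
print(restored.dtypes)
```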