5.9 Getting Data In/Out

New in version 0.15.2.

Writing data (Series, DataFrames) that contain a category dtype to an HDF store was implemented in 0.15.2. See here for an example and caveats.

Writing data to and reading data from Stata format files was implemented in 0.15.2. See here for an example and caveats.

Writing to a CSV file converts the data to plain text, which discards all information about the categorical (its categories and their ordering). After reading the CSV file back, you have to convert the relevant columns to category again and reassign the correct categories and category ordering.

In [1]: s = pd.Series(pd.Categorical(['a', 'b', 'b', 'a', 'a', 'd']))

# rename the categories
In [2]: s.cat.categories = ["very good", "good", "bad"]

# reorder the categories and add missing categories
In [3]: s = s.cat.set_categories(["very bad", "bad", "medium", "good", "very good"])

In [4]: df = pd.DataFrame({"cats": s, "vals": [1, 2, 3, 4, 5, 6]})

# StringIO is assumed to be imported already (from io import StringIO on Python 3)
In [5]: csv = StringIO()

In [6]: df.to_csv(csv)

In [7]: df2 = pd.read_csv(StringIO(csv.getvalue()))

In [8]: df2.dtypes
Out[8]: 
Unnamed: 0     int64
cats          object
vals           int64
dtype: object

In [9]: df2["cats"]
Out[9]: 
0    very good
1         good
2         good
3    very good
4    very good
5          bad
Name: cats, dtype: object

# Convert the column back to category dtype and restore the full set of categories
In [10]: df2["cats"] = df2["cats"].astype("category")

In [11]: df2["cats"].cat.set_categories(["very bad", "bad", "medium", "good", "very good"],
   ....:                                inplace=True)
   ....: 

In [12]: df2.dtypes
Out[12]: 
Unnamed: 0       int64
cats          category
vals             int64
dtype: object

In [13]: df2["cats"]
Out[13]: 
0    very good
1         good
2         good
3    very good
4    very good
5          bad
Name: cats, dtype: category
Categories (5, object): [very bad, bad, medium, good, very good]
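
The session above restores the categories but does not mark them as ordered. As a minimal sketch of restoring both in one step, the following wraps the conversion in a helper. The name restore_categorical, the sample data, and the choice of ordered=True are assumptions made for illustration and are not part of pandas.

```python
from io import StringIO

import pandas as pd


def restore_categorical(frame, column, categories, ordered=False):
    """Return a copy of frame with column rebuilt as a categorical.

    Hypothetical helper for illustration; it is not a pandas function.
    """
    result = frame.copy()
    result[column] = pd.Categorical(
        result[column], categories=categories, ordered=ordered
    )
    return result


# Assumed category order for this example
order = ["very bad", "bad", "medium", "good", "very good"]
df = pd.DataFrame({
    "cats": pd.Categorical(["very good", "good", "bad"],
                           categories=order, ordered=True),
    "vals": [1, 2, 3],
})

# Round-trip through CSV: the categorical information is lost on the way
buf = StringIO()
df.to_csv(buf, index=False)
df2 = pd.read_csv(StringIO(buf.getvalue()))

# Rebuild the column with the known categories and ordering
df3 = restore_categorical(df2, "cats", order, ordered=True)
```

The list of categories has to be kept outside the CSV file (here in the variable order), because the file itself stores only the values.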

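The same holds for writing to a SQL database with to_sql: the categorical information is not stored, so the relevant columns have to be converted back to category after the data is read again.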
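
A rough sketch of the SQL round trip, assuming an in-memory SQLite database from the standard library. The table name ratings and the sample data are made up for this example.

```python
import sqlite3

import pandas as pd

# Assumed category order and sample data, for illustration only
order = ["very bad", "bad", "medium", "good", "very good"]
df = pd.DataFrame({
    "cats": pd.Categorical(["very good", "good", "good", "bad"],
                           categories=order),
    "vals": [1, 2, 3, 4],
})

conn = sqlite3.connect(":memory:")
try:
    # "ratings" is a hypothetical table name
    df.to_sql("ratings", conn, index=False)
    df2 = pd.read_sql("SELECT * FROM ratings", conn)
finally:
    conn.close()

# The column comes back as plain values, not as a categorical
dtype_after_read = df2["cats"].dtype

# Rebuild the categorical with the full, known list of categories
df2["cats"] = pd.Categorical(df2["cats"], categories=order)
```

As with CSV, the list of categories must be kept somewhere outside the table, since only the values are written to the database.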