>>> import pandas as pd
>>> pd.options.display.max_rows = 8

2 rpy2 / R interface

Warning

In v0.16.0, the pandas.rpy interface has been deprecated and will be removed in a future version. Similar functionality is available through the rpy2 project. See the updating section below for a guide to porting your code from pandas.rpy to rpy2 functions.

2.1 Updating your code to use rpy2 functions

In v0.16.0, the pandas.rpy module was deprecated, and users are directed to similar functionality in rpy2 itself (rpy2 >= 2.4).

Instead of running import pandas.rpy.common as com, use the following imports to activate the pandas conversion support in rpy2:

from rpy2.robjects import pandas2ri
pandas2ri.activate()

Once conversion is activated, data frames are converted between pandas and R largely automatically: most rpy2 functions convert on the fly, so explicit conversion is rarely needed.

To convert explicitly, use pandas2ri.py2ri() and pandas2ri.ri2py(). They replace the existing pandas functions as follows:

  • com.convert_to_r_dataframe(df) should be replaced with pandas2ri.py2ri(df)
  • com.convert_robj(rdf) should be replaced with pandas2ri.ri2py(rdf)

Note: these function names apply to rpy2 2.5.x, the latest version at the time of writing; earlier versions called them pandas2ri.pandas2ri() and pandas2ri.ri2pandas().
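Because the conversion functions have been renamed between rpy2 releases, ported code can look them up by name instead of hard-coding one spelling. The sketch below is a hypothetical helper, not part of rpy2: the 2.5.x and pre-2.5 names come from the note above, while py2rpy/rpy2py is assumed to be the rpy2 3.x naming (newer rpy2 releases also favour explicit converter contexts over activate(), so check the rpy2 documentation for your version). A stand-in object replaces the real module so the sketch runs without R.

```python
from types import SimpleNamespace

# Conversion-function names used by successive rpy2 releases, newest first.
# py2rpy/rpy2py is an assumption to verify against your installed rpy2.
_CANDIDATE_NAMES = [
    ("py2rpy", "rpy2py"),        # assumed rpy2 3.x naming
    ("py2ri", "ri2py"),          # rpy2 2.5.x
    ("pandas2ri", "ri2pandas"),  # earlier rpy2 releases
]


def resolve_pandas2ri_converters(pandas2ri_module):
    """Hypothetical helper: return (to_r, to_pandas) from whichever names the module provides."""
    for to_r_name, to_pandas_name in _CANDIDATE_NAMES:
        to_r = getattr(pandas2ri_module, to_r_name, None)
        to_pandas = getattr(pandas2ri_module, to_pandas_name, None)
        if callable(to_r) and callable(to_pandas):
            return to_r, to_pandas
    raise ImportError("no known pandas <-> R conversion functions in this module")


# In practice you would pass the real module:
#   from rpy2.robjects import pandas2ri
#   pandas2ri.activate()
#   to_r, to_pandas = resolve_pandas2ri_converters(pandas2ri)
# Here a stand-in mimics the rpy2 2.5.x names so the sketch runs without R.
fake_pandas2ri = SimpleNamespace(py2ri=lambda obj: ("R", obj),
                                 ri2py=lambda obj: ("pandas", obj))
to_r, to_pandas = resolve_pandas2ri_converters(fake_pandas2ri)
print(to_r("df"))        # ('R', 'df')
print(to_pandas("rdf"))  # ('pandas', 'rdf')
```

Checking the newest names first means a module that happens to expose several spellings resolves to the most recent one.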

Some of the other functionality in pandas.rpy can be replaced easily as well. For example, to load an R data set as the load_data function does, the current call:

df_iris = com.load_data('iris')

can be replaced with:

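from rpy2.robjects import r, pandas2ri
r.data('iris')                        # load the data set into R's global environment
df_iris = pandas2ri.ri2py(r['iris'])  # fetch it by name and convert it to a pandas DataFrame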

The convert_to_r_matrix function can be replaced by converting the DataFrame with pandas2ri.py2ri and then calling R's as.matrix function on the result.

Warning

Not all conversion functions in rpy2 behave exactly like the current methods in pandas. If you run into problems or limitations compared with the pandas versions, please report them on the issue tracker.

See also the documentation of the rpy2 project.

2.2 R interface with rpy2

If R and rpy2 (2.2 or later) are installed on your computer (installation is left to the reader), you can use the functionality below. Installing them is currently quite an ordeal on Windows, but should be straightforward on Unix-like systems. rpy2 evolves over time and is approaching its 2.3 release, while this interface was designed for the 2.2.x series. We recommend 2.2.x over other series unless you are prepared to fix parts of the code. That said, rpy2-2.3.0 introduces improvements such as a better memory-management layer for the R-Python bridge, so it may be worth biting the bullet and submitting patches for the few minor differences that need fixing.

# if installing for the first time
hg clone http://bitbucket.org/lgautier/rpy2

cd rpy2
hg pull
hg update version_2.2.x
sudo python setup.py install

Note

To use R packages with this interface, you will need to install them inside R yourself. At the moment it cannot install them for you.

Once you have installed R and rpy2, you should be able to import pandas.rpy.common without a hitch.

2.3 Transferring R data sets into Python

The load_data function retrieves an R data set and converts it to the appropriate pandas object (most likely a DataFrame):

# Importing pandas.rpy initially failed because of an rpy2 issue
# (the same one reported by user Horta in the thread below):
# https://github.com/ContinuumIO/anaconda-issues/issues/152
# The workaround used here (from @mattexx) is to import readline first.
In [1]: import readline; import rpy2.robjects

In [2]: import pandas.rpy.common as com  # <- now I can load this

In [3]: infert = com.load_data('infert')

In [4]: infert.head()
Out[4]: 
  education   age  parity  induced  case  spontaneous  stratum  pooled.stratum
1    0-5yrs  26.0     6.0      1.0   1.0          2.0        1             3.0
2    0-5yrs  42.0     1.0      1.0   1.0          0.0        2             1.0
3    0-5yrs  39.0     6.0      2.0   1.0          0.0        3             4.0
4    0-5yrs  34.0     4.0      2.0   1.0          0.0        4             2.0
5   6-11yrs  35.0     3.0      1.0   1.0          1.0        5            32.0

2.4 Converting DataFrames into R objects

New in version 0.8.

Starting from pandas 0.8, there is experimental support for converting DataFrames into the equivalent R object (that is, a data.frame):

In [5]: import pandas.rpy.common as com

In [6]: df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C':[7,8,9]},
   ...:                   index=["one", "two", "three"])
   ...: 

In [7]: r_dataframe = com.convert_to_r_dataframe(df)

In [8]: print(type(r_dataframe))
<class 'rpy2.robjects.vectors.DataFrame'>

In [9]: print(r_dataframe)
      A B C
one   1 4 7
two   2 5 8
three 3 6 9

The DataFrame’s index is stored as the rownames attribute of the data.frame instance.
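Because the index becomes the data.frame's row names, it has to satisfy R's rules for row names, which must be unique. The sketch below is a hypothetical pure-pandas helper that previews the row names a converted frame would carry; it assumes each index label is rendered with str(), which is an assumption about the converter rather than documented behaviour.

```python
import pandas as pd


def r_rownames(df: pd.DataFrame) -> list:
    """Hypothetical sketch: preview the character row names of the converted data.frame.

    Assumes each index label is rendered with str(); R requires row names to be unique.
    """
    if not df.index.is_unique:
        duplicated = list(df.index[df.index.duplicated()])
        raise ValueError(f"R data.frame row names must be unique; duplicated: {duplicated}")
    return [str(label) for label in df.index]


df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
                  index=["one", "two", "three"])
print(r_rownames(df))                         # ['one', 'two', 'three']
print(r_rownames(df.reset_index(drop=True)))  # ['0', '1', '2']
```

Running such a check before conversion surfaces duplicate labels in pandas, where they are easy to fix, instead of inside the R bridge.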

You can also use convert_to_r_matrix to obtain a Matrix instance, but bear in mind that it only works with homogeneously typed DataFrames, because an R matrix holds a single data type for all of its elements (unlike a data.frame, whose columns can each have their own type):

In [10]: import pandas.rpy.common as com

In [11]: r_matrix = com.convert_to_r_matrix(df)

In [12]: print(type(r_matrix))
<class 'rpy2.robjects.vectors.Matrix'>

In [13]: print(r_matrix)
      A B C
one   1 4 7
two   2 5 8
three 3 6 9
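The homogeneity requirement can be checked in pandas before any conversion is attempted. The sketch below is hypothetical: is_homogeneous and to_matrix_checked are illustrative names, and the NumPy array returned by DataFrame.to_numpy() stands in for the matrix that would be handed to R. It treats a frame as homogeneous only when every column shares one dtype.

```python
import pandas as pd


def is_homogeneous(df: pd.DataFrame) -> bool:
    """Return True if every column of df shares a single dtype."""
    return len(set(df.dtypes)) <= 1


def to_matrix_checked(df: pd.DataFrame):
    """Hypothetical helper: refuse mixed-type frames instead of coercing them.

    The returned NumPy array stands in for the matrix that would be passed to R.
    """
    if not is_homogeneous(df):
        kinds = sorted(str(dtype) for dtype in set(df.dtypes))
        raise TypeError(f"conversion to a matrix needs one dtype, got {kinds}")
    return df.to_numpy()


df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
                  index=["one", "two", "three"])
mixed = df.assign(D=["x", "y", "z"])  # adds a string column

print(is_homogeneous(df))           # True: A, B and C share one integer dtype
print(is_homogeneous(mixed))        # False: D does not share that dtype
print(to_matrix_checked(df).shape)  # (3, 3)
```

The check is deliberately strict. R's own as.matrix would instead coerce a mixed data.frame to a common type, for example a character matrix when any column holds strings, which is rarely what you want for numeric work.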

2.5 Calling R functions with pandas objects

2.6 High-level interface to R estimators