.. currentmodule:: pandas
.. ipython:: python
:suppress:
import os
import csv
from pandas.compat import StringIO, BytesIO
import pandas as pd
ExcelWriter = pd.ExcelWriter
import sys
reload(sys) # Reload does the trick!
sys.setdefaultencoding('UTF8')
import numpy as np
np.random.seed(123456)
randn = np.random.randn
np.set_printoptions(precision=4, suppress=True)
import matplotlib.pyplot as plt
plt.close('all')
import pandas.util.testing as tm
pd.options.display.max_rows=15
clipdf = pd.DataFrame({'A':[1,2,3],'B':[4,5,6],'C':['p','q','r']},
index=['x','y','z'])
.. _io.sql:
SQL Queries
-----------
The :mod:`pandas.io.sql` module provides a collection of query wrappers to both
facilitate data retrieval and to reduce dependency on DB-specific API. Database abstraction
is provided by SQLAlchemy if installed. In addition you will need a driver library for
your database. Examples of such drivers are `psycopg2 `__
for PostgreSQL or `pymysql `__ for MySQL.
For `SQLite `__ this is
included in Python's standard library by default.
You can find an overview of supported drivers for each SQL dialect in the
`SQLAlchemy docs `__.
.. versionadded:: 0.14.0
If SQLAlchemy is not installed, a fallback is only provided for sqlite (and
for mysql for backwards compatibility, but this is deprecated and will be
removed in a future version).
This mode requires a Python database adapter which respect the `Python
DB-API `__.
See also some :ref:`cookbook examples ` for some advanced strategies.
The key functions are:
.. autosummary::
:toctree: generated/
read_sql_table
read_sql_query
read_sql
DataFrame.to_sql
.. note::
The function :func:`~pandas.read_sql` is a convenience wrapper around
:func:`~pandas.read_sql_table` and :func:`~pandas.read_sql_query` (and for
backward compatibility) and will delegate to specific function depending on
the provided input (database table name or sql query).
Table names do not need to be quoted if they have special characters.
In the following example, we use the `SQlite `__ SQL database
engine. You can use a temporary SQLite database where data are stored in
"memory".
To connect with SQLAlchemy you use the :func:`create_engine` function to create an engine
object from database URI. You only need to create the engine once per database you are
connecting to.
For more information on :func:`create_engine` and the URI formatting, see the examples
below and the SQLAlchemy `documentation `__
.. ipython:: python
from sqlalchemy import create_engine
# Create your engine.
engine = create_engine('sqlite:///:memory:')
If you want to manage your own connections you can pass one of those instead:
.. code-block:: python
with engine.connect() as conn, conn.begin():
data = pd.read_sql_table('data', conn)
Writing DataFrames
''''''''''''''''''
Assuming the following data is in a DataFrame ``data``, we can insert it into
the database using :func:`~pandas.DataFrame.to_sql`.
+-----+------------+-------+-------+-------+
| id | Date | Col_1 | Col_2 | Col_3 |
+=====+============+=======+=======+=======+
| 26 | 2012-10-18 | X | 25.7 | True |
+-----+------------+-------+-------+-------+
| 42 | 2012-10-19 | Y | -12.4 | False |
+-----+------------+-------+-------+-------+
| 63 | 2012-10-20 | Z | 5.73 | True |
+-----+------------+-------+-------+-------+
.. ipython:: python
:suppress:
import datetime
c = ['id', 'Date', 'Col_1', 'Col_2', 'Col_3']
d = [(26, datetime.datetime(2010,10,18), 'X', 27.5, True),
(42, datetime.datetime(2010,10,19), 'Y', -12.5, False),
(63, datetime.datetime(2010,10,20), 'Z', 5.73, True)]
data = pd.DataFrame(d, columns=c)
.. ipython:: python
data.to_sql('data', engine)
With some databases, writing large DataFrames can result in errors due to
packet size limitations being exceeded. This can be avoided by setting the
``chunksize`` parameter when calling ``to_sql``. For example, the following
writes ``data`` to the database in batches of 1000 rows at a time:
.. ipython:: python
data.to_sql('data_chunked', engine, chunksize=1000)
SQL data types
++++++++++++++
:func:`~pandas.DataFrame.to_sql` will try to map your data to an appropriate
SQL data type based on the dtype of the data. When you have columns of dtype
``object``, pandas will try to infer the data type.
You can always override the default type by specifying the desired SQL type of
any of the columns by using the ``dtype`` argument. This argument needs a
dictionary mapping column names to SQLAlchemy types (or strings for the sqlite3
fallback mode).
For example, specifying to use the sqlalchemy ``String`` type instead of the
default ``Text`` type for string columns:
.. ipython:: python
from sqlalchemy.types import String
data.to_sql('data_dtype', engine, dtype={'Col_1': String})
.. note::
Due to the limited support for timedelta's in the different database
flavors, columns with type ``timedelta64`` will be written as integer
values as nanoseconds to the database and a warning will be raised.
.. note::
Columns of ``category`` dtype will be converted to the dense representation
as you would get with ``np.asarray(categorical)`` (e.g. for string categories
this gives an array of strings).
Because of this, reading the database table back in does **not** generate
a categorical.
Reading Tables
''''''''''''''
:func:`~pandas.read_sql_table` will read a database table given the
table name and optionally a subset of columns to read.
.. note::
In order to use :func:`~pandas.read_sql_table`, you **must** have the
SQLAlchemy optional dependency installed.
.. ipython:: python
pd.read_sql_table('data', engine)
You can also specify the name of the column as the DataFrame index,
and specify a subset of columns to be read.
.. ipython:: python
pd.read_sql_table('data', engine, index_col='id')
pd.read_sql_table('data', engine, columns=['Col_1', 'Col_2'])
And you can explicitly force columns to be parsed as dates:
.. ipython:: python
pd.read_sql_table('data', engine, parse_dates=['Date'])
If needed you can explicitly specify a format string, or a dict of arguments
to pass to :func:`pandas.to_datetime`:
.. code-block:: python
pd.read_sql_table('data', engine, parse_dates={'Date': '%Y-%m-%d'})
pd.read_sql_table('data', engine, parse_dates={'Date': {'format': '%Y-%m-%d %H:%M:%S'}})
You can check if a table exists using :func:`~pandas.io.sql.has_table`
Schema support
''''''''''''''
.. versionadded:: 0.15.0
Reading from and writing to different schema's is supported through the ``schema``
keyword in the :func:`~pandas.read_sql_table` and :func:`~pandas.DataFrame.to_sql`
functions. Note however that this depends on the database flavor (sqlite does not
have schema's). For example:
.. code-block:: python
df.to_sql('table', engine, schema='other_schema')
pd.read_sql_table('table', engine, schema='other_schema')
Querying
''''''''
You can query using raw SQL in the :func:`~pandas.read_sql_query` function.
In this case you must use the SQL variant appropriate for your database.
When using SQLAlchemy, you can also pass SQLAlchemy Expression language constructs,
which are database-agnostic.
.. ipython:: python
pd.read_sql_query('SELECT * FROM data', engine)
Of course, you can specify a more "complex" query.
.. ipython:: python
pd.read_sql_query("SELECT id, Col_1, Col_2 FROM data WHERE id = 42;", engine)
The :func:`~pandas.read_sql_query` function supports a ``chunksize`` argument.
Specifying this will return an iterator through chunks of the query result:
.. ipython:: python
df = pd.DataFrame(np.random.randn(20, 3), columns=list('abc'))
df.to_sql('data_chunks', engine, index=False)
.. ipython:: python
for chunk in pd.read_sql_query("SELECT * FROM data_chunks", engine, chunksize=5):
print(chunk)
You can also run a plain query without creating a dataframe with
:func:`~pandas.io.sql.execute`. This is useful for queries that don't return values,
such as INSERT. This is functionally equivalent to calling ``execute`` on the
SQLAlchemy engine or db connection object. Again, you must use the SQL syntax
variant appropriate for your database.
.. code-block:: python
from pandas.io import sql
sql.execute('SELECT * FROM table_name', engine)
sql.execute('INSERT INTO table_name VALUES(?, ?, ?)', engine, params=[('id', 1, 12.2, True)])
Engine connection examples
''''''''''''''''''''''''''
To connect with SQLAlchemy you use the :func:`create_engine` function to create an engine
object from database URI. You only need to create the engine once per database you are
connecting to.
.. code-block:: python
from sqlalchemy import create_engine
engine = create_engine('postgresql://scott:tiger@localhost:5432/mydatabase')
engine = create_engine('mysql+mysqldb://scott:tiger@localhost/foo')
engine = create_engine('oracle://scott:tiger@127.0.0.1:1521/sidname')
engine = create_engine('mssql+pyodbc://mydsn')
# sqlite:///
# where is relative:
engine = create_engine('sqlite:///foo.db')
# or absolute, starting with a slash:
engine = create_engine('sqlite:////absolute/path/to/foo.db')
For more information see the examples the SQLAlchemy `documentation `__
Advanced SQLAlchemy queries
'''''''''''''''''''''''''''
You can use SQLAlchemy constructs to describe your query.
Use :func:`sqlalchemy.text` to specify query parameters in a backend-neutral way
.. ipython:: python
import sqlalchemy as sa
pd.read_sql(sa.text('SELECT * FROM data where Col_1=:col1'), engine, params={'col1': 'X'})
If you have an SQLAlchemy description of your database you can express where conditions using SQLAlchemy expressions
.. ipython:: python
metadata = sa.MetaData()
data_table = sa.Table('data', metadata,
sa.Column('index', sa.Integer),
sa.Column('Date', sa.DateTime),
sa.Column('Col_1', sa.String),
sa.Column('Col_2', sa.Float),
sa.Column('Col_3', sa.Boolean),
)
pd.read_sql(sa.select([data_table]).where(data_table.c.Col_3 == True), engine)
You can combine SQLAlchemy expressions with parameters passed to :func:`read_sql` using :func:`sqlalchemy.bindparam`
.. ipython:: python
import datetime as dt
expr = sa.select([data_table]).where(data_table.c.Date > sa.bindparam('date'))
pd.read_sql(expr, engine, params={'date': dt.datetime(2010, 10, 18)})
Sqlite fallback
'''''''''''''''
The use of sqlite is supported without using SQLAlchemy.
This mode requires a Python database adapter which respect the `Python
DB-API `__.
You can create connections like so:
.. code-block:: python
import sqlite3
con = sqlite3.connect(':memory:')
And then issue the following queries:
.. code-block:: python
data.to_sql('data', cnx)
pd.read_sql_query("SELECT * FROM data", con)