8 Iteration

The behavior of basic iteration over pandas objects depends on the type. When iterating over a Series, it is regarded as array-like, and basic iteration produces the values. Other data structures, like DataFrame and Panel, follow the dict-like convention of iterating over the “keys” of the objects.

In short, basic iteration (for i in object) produces:

  • Series: values
  • DataFrame: column labels
  • Panel: item labels

Thus, for example, iterating over a DataFrame gives you the column names:

In [1]: df = pd.DataFrame({'col1' : np.random.randn(3), 'col2' : np.random.randn(3)},
   ...:                   index=['a', 'b', 'c'])
   ...: 

In [2]: for col in df:
   ...:     print(col)
   ...: 
col1
col2
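The contrast with a Series can be sketched in a few lines. This is a minimal illustration only; the names s, df_demo, series_items and frame_items are made up for the example and do not appear elsewhere in this section.

```python
import pandas as pd

# A small Series with a non-default index, so labels and values are easy to tell apart.
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# Basic iteration over a Series yields the values, not the index labels.
series_items = [value for value in s]

# Basic iteration over a DataFrame yields the column labels.
df_demo = pd.DataFrame({'col1': [1.0, 2.0], 'col2': [3.0, 4.0]})
frame_items = [label for label in df_demo]

print(series_items)
print(frame_items)
```

To loop over the index labels of a Series instead of its values, iterate over s.index.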

Pandas objects also have the dict-like iteritems() method to iterate over the (key, value) pairs.

To iterate over the rows of a DataFrame, you can use the following methods:

  • iterrows(): Iterate over the rows of a DataFrame as (index, Series) pairs. Each row is converted to a Series object, which can change the dtypes and carries a performance cost.
  • itertuples(): Iterate over the rows of a DataFrame as namedtuples of the values. This is a lot faster than iterrows(), and in most cases it is the preferable way to iterate over the values of a DataFrame.

Warning

Iterating through pandas objects is generally slow. In many cases, iterating manually over the rows is not needed and can be avoided with one of the following approaches:

  • Look for a vectorized solution: many operations can be performed using built-in methods, NumPy functions, or (boolean) indexing.
  • When you have a function that cannot work on the full DataFrame/Series at once, it is better to use apply() instead of iterating over the values. See the docs on function application.
  • If you need to do iterative manipulations on the values but performance is important, consider writing the inner loop using e.g. cython or numba. See the enhancing performance section for some examples of this approach.
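The first two alternatives can be compared side by side in a small sketch. The frame df_perf and the helper product_of_row are hypothetical names chosen for illustration; the row-wise function stands in for any logic that only works on one row at a time.

```python
import pandas as pd

df_perf = pd.DataFrame({'x': [1.0, 2.0, 3.0], 'y': [4.0, 5.0, 6.0]})

# Manual loop: correct, but this is the slow path the warning refers to.
looped = [row.x * row.y for row in df_perf.itertuples()]

# Vectorized: operate on whole columns at once.
vectorized = df_perf['x'] * df_perf['y']


def product_of_row(row):
    """Hypothetical row-wise function; receives one row as a Series."""
    return row['x'] * row['y']


# apply() with axis=1 calls the function once per row.
applied = df_perf.apply(product_of_row, axis=1)

print(looped)
print(vectorized.tolist())
print(applied.tolist())
```

All three produce the same numbers; the difference lies in how much work is done in Python-level loops, which matters mostly on large frames.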

Warning

You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect!

For example, in the following case setting the value has no effect:

In [3]: df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})

In [4]: for index, row in df.iterrows():
   ...:     row['a'] = 10
   ...: 

In [5]: df
Out[5]: 
   a  b
0  1  a
1  2  b
2  3  c
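If the goal is to actually change the values, one possible approach is to write to the DataFrame itself rather than to the row object handed out by the iterator. The sketch below assumes two illustrative frames, df_fix and df_loop, that mirror the example above; it is not the only way to do this.

```python
import pandas as pd

# Option 1: assign to the whole column at once; no iteration is needed.
df_fix = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})
df_fix['a'] = 10

# Option 2: if a loop is unavoidable, collect the new values while iterating
# and assign them only after the loop has finished, so the object being
# iterated over is never modified mid-iteration.
df_loop = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})
new_values = []
for index, row in df_loop.iterrows():
    new_values.append(row['a'] * 10)
df_loop['a'] = new_values

print(df_fix)
print(df_loop)
```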

8.1 iteritems

Consistent with the dict-like interface, iteritems() iterates through key-value pairs:

  • Series: (index, scalar value) pairs
  • DataFrame: (column, Series) pairs
  • Panel: (item, DataFrame) pairs

For example, with wp being a Panel that contains the items Item1 and Item2:

In [6]: for item, frame in wp.iteritems():
   ...:     print(item)
   ...:     print(frame)
   ...: 
Item1
                   A         B         C         D
2000-01-01 -0.514968 -0.922744  1.719198  0.354214
2000-01-02 -0.964852  1.149227  0.085127 -0.666126
2000-01-03 -0.937352 -0.236178 -0.065276  0.966529
2000-01-04  0.275865  0.952374  0.453077  0.105015
2000-01-05 -1.080907  2.059111 -0.569357  0.227393
Item2
                   A         B         C         D
2000-01-01 -0.678641  0.754388 -0.863078  0.450325
2000-01-02  0.074156 -0.070482  0.065135 -0.353930
2000-01-03  1.450099 -0.388589 -0.291465 -0.273057
2000-01-04 -1.777603 -0.383081  0.868747  0.498215
2000-01-05 -0.703629  1.366700  0.140995 -2.331324
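The Series and DataFrame cases follow the same pattern. The sketch below uses items() rather than iteritems(), on the assumption that a recent pandas release is installed: iteritems() was deprecated and later removed there, and items() yields the same pairs. On the version documented here, iteritems() can be substituted. The names s, df_items, series_pairs and column_sums are illustrative only.

```python
import pandas as pd

s = pd.Series([1.5, 2.5], index=['x', 'y'])
df_items = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Series: (index label, scalar value) pairs.
series_pairs = list(s.items())

# DataFrame: (column label, Series) pairs; here each column is reduced to its sum.
column_sums = {label: column.sum() for label, column in df_items.items()}

print(series_pairs)
print(column_sums)
```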

8.2 iterrows

iterrows() allows you to iterate through the rows of a DataFrame as Series objects. It returns an iterator yielding each index value along with a Series containing the data in each row:

In [7]: for row_index, row in df.iterrows():
   ...:     print('%s\n%s' % (row_index, row))
   ...: 
0
a    1
b    a
Name: 0, dtype: object
1
a    2
b    b
Name: 1, dtype: object
2
a    3
b    c
Name: 2, dtype: object

Note

Because iterrows() returns a Series for each row, it does not preserve dtypes across the rows (dtypes are preserved across columns for DataFrames). For example,

In [8]: df_orig = pd.DataFrame([[1, 1.5]], columns=['int', 'float'])

In [9]: df_orig.dtypes
Out[9]: 
int        int64
float    float64
dtype: object

In [10]: row = next(df_orig.iterrows())[1]

In [11]: row
Out[11]: 
int      1.0
float    1.5
Name: 0, dtype: float64

All values in row, returned as a Series, are now upcast to floats, including the original integer value in column int:

In [12]: row['int'].dtype
Out[12]: dtype('float64')

In [13]: df_orig['int'].dtype
Out[13]: dtype('int64')

To preserve dtypes while iterating over the rows, it is better to use itertuples(), which returns namedtuples of the values and is generally much faster than iterrows().
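The difference can be checked directly on the same one-row frame. This is a small sketch; row_series and row_tuple are names introduced for the example.

```python
import pandas as pd

df_orig = pd.DataFrame([[1, 1.5]], columns=['int', 'float'])

# iterrows(): the row becomes a single float64 Series, so the integer is upcast.
_, row_series = next(df_orig.iterrows())

# itertuples(): each field keeps the type of its own column.
row_tuple = next(df_orig.itertuples())

print(type(row_series['int']))  # a floating-point type
print(type(row_tuple.int))      # an integer type
```

The exact scalar classes printed (Python built-ins or NumPy scalars) may vary between pandas versions; what matters is that the integer stays an integer with itertuples().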

As another illustration of iterrows(), a contrived way to transpose a DataFrame would be:

In [14]: df2 = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]})

In [15]: print(df2)
   x  y
0  1  4
1  2  5
2  3  6

In [16]: print(df2.T)
   0  1  2
x  1  2  3
y  4  5  6

In [17]: df2_t = pd.DataFrame(dict((idx,values) for idx, values in df2.iterrows()))

In [18]: print(df2_t)
   0  1  2
x  1  2  3
y  4  5  6

8.3 itertuples

The itertuples() method will return an iterator yielding a namedtuple for each row in the DataFrame. The first element of the tuple will be the row’s corresponding index value, while the remaining values are the row values.

For instance,

In [19]: for row in df.itertuples():
   ....:     print(row)
   ....: 
Pandas(Index=0, a=1, b='a')
Pandas(Index=1, a=2, b='b')
Pandas(Index=2, a=3, b='c')

This method does not convert the row to a Series object; it simply returns the values inside a namedtuple. Therefore, itertuples() preserves the data type of the values and is generally faster than iterrows().

Note

The column names will be renamed to positional names if they are invalid Python identifiers, repeated, or start with an underscore. With a large number of columns (>255), regular tuples are returned.
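The renaming can be observed with a column label that contains a space. The sketch below also uses the index and name parameters of itertuples() to drop the index field and to request plain tuples. The frame df_names and the other variable names are made up for illustration.

```python
import pandas as pd

df_names = pd.DataFrame({'a': [1, 2], 'my col': [3, 4]})

# 'my col' is not a valid Python identifier, so it is replaced by a
# positional name based on its position in the tuple.
fields_with_index = next(df_names.itertuples())._fields

# index=False drops the leading Index field, which shifts the positions.
fields_without_index = next(df_names.itertuples(index=False))._fields

# name=None returns regular tuples instead of namedtuples.
plain_rows = list(df_names.itertuples(index=False, name=None))

print(fields_with_index)
print(fields_without_index)
print(plain_rows)
```

Because the positional names depend on where a column sits, code that relies on them is fragile; renaming the columns to valid identifiers beforehand is usually the more readable choice.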