1.7 Unicode Formatting

Warning

Enabling display.unicode.east_asian_width slows down printing of DataFrame and Series objects (printing becomes about 2 times slower). Use it only when it is actually required.
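Because of this cost, you can enable the option only for the output that needs it. The following is a minimal sketch. It uses pd.option_context, which restores the previous value when the with block exits. The sample DataFrame is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'国籍': ['UK', '日本'], '名前': ['Alice', 'しのぶ']})

# Enable East Asian width handling only inside this block.
with pd.option_context('display.unicode.east_asian_width', True):
    print(df)

# Outside the block the option is back to its previous value (False by default).
print(pd.get_option('display.unicode.east_asian_width'))
```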

Some East Asian countries use Unicode characters whose display width corresponds to two Latin characters. If a DataFrame or Series contains these characters, the default output cannot be aligned properly.

Note

Screen captures are attached for each output to show the actual results.

import pandas as pd

df = pd.DataFrame({'国籍': ['UK', '日本'], '名前': ['Alice', 'しのぶ']})
print(df)
[Screen capture: option_unicode01.png]

Enabling display.unicode.east_asian_width makes pandas check each character's "East Asian Width" property. Checking this property lets these characters be aligned properly, but it takes longer than the standard len function.
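The difference between len and a width based on the "East Asian Width" property can be illustrated with the standard library's unicodedata.east_asian_width. The helper below is a hypothetical sketch for illustration, not pandas' internal implementation. It assumes that Wide ('W') and Fullwidth ('F') characters occupy two terminal columns and that every other character occupies one.

```python
import unicodedata

def east_asian_display_width(text):
    # Hypothetical helper: 'W' (Wide) and 'F' (Fullwidth) characters count as
    # two columns; every other character counts as one.
    return sum(2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1
               for ch in text)

# Compare the character count (len) with the estimated display width.
for value in ['UK', '日本', 'Alice', 'しのぶ']:
    print(value, len(value), east_asian_display_width(value))
```

Padding with len treats '日本' as two columns wide even though it occupies four. That mismatch is why the default output is misaligned.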

In [1]: pd.set_option('display.unicode.east_asian_width', True)

In [2]: df
[Screen capture: option_unicode02.png]

In addition, Unicode contains characters whose width is "Ambiguous". Such a character is displayed as either 1 or 2 columns wide, depending on the terminal setting or encoding. Because Python cannot determine which applies, the display.unicode.ambiguous_as_wide option is provided to handle this.

By default, the width of an "Ambiguous" character, such as "¡" (inverted exclamation mark) in the example below, is treated as 1.

df = pd.DataFrame({'a': ['xxx', '¡¡'], 'b': ['yyy', '¡¡']})
print(df)
[Screen capture: option_unicode03.png]
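The standard library reports "¡" as belonging to the Ambiguous ('A') category. The hypothetical helper below is an illustration only and not pandas' code. It shows how the estimated width of '¡¡' changes depending on whether Ambiguous characters are counted as narrow or wide. Wide ('W') and Fullwidth ('F') characters are assumed to always take two columns.

```python
import unicodedata

def cell_width(text, ambiguous_as_wide=False):
    # Hypothetical helper: 'W'/'F' characters take 2 columns; 'A' (Ambiguous)
    # takes 2 only when ambiguous_as_wide is True; everything else takes 1.
    wide = {'W', 'F', 'A'} if ambiguous_as_wide else {'W', 'F'}
    return sum(2 if unicodedata.east_asian_width(ch) in wide else 1 for ch in text)

print(unicodedata.east_asian_width('¡'))         # A (Ambiguous)
print(cell_width('¡¡'))                          # 2: ambiguous treated as narrow
print(cell_width('¡¡', ambiguous_as_wide=True))  # 4: ambiguous treated as wide
```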

Enabling display.unicode.ambiguous_as_wide makes pandas treat the width of these characters as 2. Note that this option takes effect only when display.unicode.east_asian_width is also enabled. In the resulting output, the starting positions have changed, but the columns are still not aligned properly because the setting does not match the environment in which the output was captured.

In [3]: pd.set_option('display.unicode.ambiguous_as_wide', True)

In [4]: df
[Screen capture: option_unicode04.png]
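Because display.unicode.ambiguous_as_wide only takes effect together with display.unicode.east_asian_width, you may want to enable both at once. The following sketch does this temporarily with pd.option_context, which accepts several name/value pairs. It then shows pd.reset_option restoring a value that was set globally. The DataFrame repeats the example above.

```python
import pandas as pd

df = pd.DataFrame({'a': ['xxx', '¡¡'], 'b': ['yyy', '¡¡']})

# Both options are enabled together, since ambiguous_as_wide has no effect
# unless east_asian_width is also on. Previous values return after the block.
with pd.option_context('display.unicode.east_asian_width', True,
                       'display.unicode.ambiguous_as_wide', True):
    print(df)

# An option set globally with set_option can be restored to its default.
pd.set_option('display.unicode.ambiguous_as_wide', True)
pd.reset_option('display.unicode.ambiguous_as_wide')
print(pd.get_option('display.unicode.ambiguous_as_wide'))
```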