2.3. Numbers and strings¶
Contents
- How do I specify hexadecimal and octal integers?
- Why does -22 // 10 return -3?
- How do I convert a string to a number?
- How do I convert a number to a string?
- How do I modify a string in place?
- How do I use strings to call functions/methods?
- Is there an equivalent to Perl’s chomp() for removing trailing newlines from strings?
- Is there a scanf() or sscanf() equivalent?
- What does ‘UnicodeError: ASCII [decoding,encoding] error: ordinal not in range(128)’ mean?
2.3.1. How do I specify hexadecimal and octal integers?¶
To specify an octal digit, precede the octal value with a zero, and then a lower or uppercase “o”. For example, to set the variable “a” to the octal value “10” (8 in decimal), type:
>>> a = 0o10
>>> a
8
Hexadecimal is just as easy. Simply precede the hexadecimal number with a zero, and then a lower or uppercase “x”. Hexadecimal digits can be specified in lower or uppercase. For example, in the Python interpreter:
>>> a = 0xa5
>>> a
165
>>> b = 0XB2
>>> b
178
2.3.2. Why does -22 // 10 return -3?¶
It’s primarily driven by the desire that i % j have the same sign as j.
If you want that, and also want:
i == (i // j) * j + (i % j)
then integer division has to return the floor. C also requires that identity to
hold, and then compilers that truncate i // j need to make i % j have
the same sign as i.
There are few real use cases for i % j when j is negative. When j
is positive, there are many, and in virtually all of them it’s more useful for
i % j to be >= 0. If the clock says 10 now, what did it say 200 hours
ago? -190 % 12 == 2 is useful; -190 % 12 == -10 is a bug waiting to
bite.
Note
On Python 2, a / b returns the same as a // b if
__future__.division is not in effect. This is also known as “classic”
division.
2.3.3. How do I convert a string to a number?¶
For integers, use the built-in int() type constructor, e.g. int('144')
== 144. Similarly, float() converts to floating-point,
e.g. float('144') == 144.0.
By default, these interpret the number as decimal, so that int('0144') ==
144 and int('0x144') raises ValueError. int(string, base) takes
the base to convert from as a second optional argument, so int('0x144', 16) ==
324. If the base is specified as 0, the number is interpreted using Python’s
rules: a leading ‘0’ indicates octal, and ‘0x’ indicates a hex number.
Do not use the built-in function eval() if all you need is to convert
strings to numbers. eval() will be significantly slower and it presents a
security risk: someone could pass you a Python expression that might have
unwanted side effects. For example, someone could pass
__import__('os').system("rm -rf $HOME") which would erase your home
directory.
eval() also has the effect of interpreting numbers as Python expressions,
so that e.g. eval('09') gives a syntax error because Python regards numbers
starting with ‘0’ as octal (base 8).
2.3.4. How do I convert a number to a string?¶
To convert, e.g., the number 144 to the string ‘144’, use the built-in type
constructor str(). If you want a hexadecimal or octal representation, use
the built-in functions hex() or oct(). For fancy formatting, see
the formatstrings section, e.g. "{:04d}".format(144) yields
'0144' and "{:.3f}".format(1/3) yields '0.333'. You may also use
the % operator on strings. See the library reference
manual for details.
2.3.5. How do I modify a string in place?¶
You can’t, because strings are immutable. If you need an object with this ability, try converting the string to a list or use the array module:
>>> import io
>>> s = "Hello, world"
>>> a = list(s)
>>> print a
['H', 'e', 'l', 'l', 'o', ',', ' ', 'w', 'o', 'r', 'l', 'd']
>>> a[7:] = list("there!")
>>> ''.join(a)
'Hello, there!'
>>> import array
>>> a = array.array('c', s)
>>> print a
array('c', 'Hello, world')
>>> a[0] = 'y'; print a
array('c', 'yello, world')
>>> a.tostring()
'yello, world'
2.3.6. How do I use strings to call functions/methods?¶
There are various techniques.
The best is to use a dictionary that maps strings to functions. The primary advantage of this technique is that the strings do not need to match the names of the functions. This is also the primary technique used to emulate a case construct:
def a(): pass def b(): pass dispatch = {'go': a, 'stop': b} # Note lack of parens for funcs dispatch[get_input()]() # Note trailing parens to call function
Use the built-in function
getattr():import foo getattr(foo, 'bar')()
Note that
getattr()works on any object, including classes, class instances, modules, and so on.This is used in several places in the standard library, like this:
class Foo: def do_foo(self): ... def do_bar(self): ... f = getattr(foo_instance, 'do_' + opname) f()
Use
locals()oreval()to resolve the function name:def myFunc(): print "hello" fname = "myFunc" f = locals()[fname] f() f = eval(fname) f()
Note: Using
eval()is slow and dangerous. If you don’t have absolute control over the contents of the string, someone could pass a string that resulted in an arbitrary function being executed.
2.3.7. Is there an equivalent to Perl’s chomp() for removing trailing newlines from strings?¶
Starting with Python 2.2, you can use S.rstrip("\r\n") to remove all
occurrences of any line terminator from the end of the string S without
removing other trailing whitespace. If the string S represents more than
one line, with several empty lines at the end, the line terminators for all the
blank lines will be removed:
>>> lines = ("line 1 \r\n"
... "\r\n"
... "\r\n")
>>> lines.rstrip("\n\r")
'line 1 '
Since this is typically only desired when reading text one line at a time, using
S.rstrip() this way works well.
For older versions of Python, there are two partial substitutes:
- If you want to remove all trailing whitespace, use the
rstrip()method of string objects. This removes all trailing whitespace, not just a single newline. - Otherwise, if there is only one line in the string
S, useS.splitlines()[0].
2.3.8. Is there a scanf() or sscanf() equivalent?¶
Not as such.
For simple input parsing, the easiest approach is usually to split the line into
whitespace-delimited words using the split() method of string objects
and then convert decimal strings to numeric values using int() or
float(). split() supports an optional “sep” parameter which is useful
if the line uses something other than whitespace as a separator.
For more complicated input parsing, regular expressions are more powerful
than C’s sscanf() and better suited for the task.
2.3.9. What does ‘UnicodeError: ASCII [decoding,encoding] error: ordinal not in range(128)’ mean?¶
This error indicates that your Python installation can handle only 7-bit ASCII strings. There are a couple ways to fix or work around the problem.
If your programs must handle data in arbitrary character set encodings, the
environment the application runs in will generally identify the encoding of the
data it is handing you. You need to convert the input to Unicode data using
that encoding. For example, a program that handles email or web input will
typically find character set encoding information in Content-Type headers. This
can then be used to properly convert input data to Unicode. Assuming the string
referred to by value is encoded as UTF-8:
value = unicode(value, "utf-8")
will return a Unicode object. If the data is not correctly encoded as UTF-8,
the above call will raise a UnicodeError exception.
If you only want strings converted to Unicode which have non-ASCII data, you can try converting them first assuming an ASCII encoding, and then generate Unicode objects if that fails:
try:
x = unicode(value, "ascii")
except UnicodeError:
value = unicode(value, "utf-8")
else:
# value was valid ASCII data
pass
It’s possible to set a default encoding in a file called sitecustomize.py
that’s part of the Python library. However, this isn’t recommended because
changing the Python-wide default encoding may cause third-party extension
modules to fail.
Note that on Windows, there is an encoding known as “mbcs”, which uses an encoding specific to your current locale. In many cases, and particularly when working with COM, this may be an appropriate default encoding to use.