Python string formatting: is `%` more efficient than the `format` function?
I wanted to compare different ways to build a string in Python from different variables:
- using `+` to concatenate (referred to as 'plus')
- using `%`
- using `"".join(list)`
- using the `format` function
- using `"{0.<attribute>}".format(object)`
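For reference, here is roughly what each of the five approaches looks like when building the same string. The `Values` class is a hypothetical two-field stand-in for the `values` class in the benchmark code further down.

```python
a, b = "123", "456"


class Values(object):
    # Hypothetical stand-in for the question's `values` class (two fields only).
    def __init__(self, a, b):
        self.a = a
        self.b = b


plus = a + "-" + b                              # concatenation ("plus")
percent = "%s-%s" % (a, b)                      # printf-style % formatting
join = "".join([a, "-", b])                     # str.join on a list
fmt = "{}-{}".format(a, b)                      # str.format, positional fields
fmt_attr = "{0.a}-{0.b}".format(Values(a, b))   # str.format, attribute fields

results = [plus, percent, join, fmt, fmt_attr]
print(results)
```

All five build the same text; only the mechanism differs, which is what the timings compare.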
I compared them in 3 scenarios:
- string with 2 variables
- string with 4 variables
- string with 4 variables, each used twice
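In the third scenario the styles express the repetition differently. This sketch reuses the expressions from the `_long2` benchmark functions: `%` consumes one tuple item per placeholder, so the values have to be passed again, while `format` can point several numbered fields at the same argument.

```python
a, b, c, d = "123", "456", "789", "0ab"

# %: every placeholder takes the next tuple item, so reused values are repeated.
percent = "%s-%s-%s-%s-%s-%s-%s-%s" % (a, b, c, d, a, b, c, d)

# str.format: numbered fields can refer to the same argument more than once.
fmt = "{0}-{1}-{2}-{3}-{0}-{1}-{2}-{3}".format(a, b, c, d)

# Concatenation simply repeats the variables in the expression.
plus = a + "-" + b + "-" + c + "-" + d + "-" + a + "-" + b + "-" + c + "-" + d

print(percent == fmt == plus, percent)
```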
I measured 1 million operations of each type and averaged the results over 5 runs. I came up with the following timings (in seconds per 1 million operations):
Function \ scenario | 2 variables | 4 variables (`_long`) | 4 variables, each used twice (`_long2`) |
---|---|---|---|
test_plus | 0.29480 | 0.50520 | 0.95220 |
test_percent | 0.47540 | 0.58660 | 0.81580 |
test_join | 0.56240 | 0.64540 | 0.88400 |
test_format | 0.72760 | 1.03400 | 1.51500 |
test_formatC | 0.90000 | 1.28020 | 1.97160 |
From these results, I came to the following conclusions:
- Concatenation seems to be one of the fastest methods
- Formatting with `%` is much faster than formatting with the `format` function
I believed that `format` was much better than `%` (e.g. in this question) and that `%` was almost deprecated.
I therefore have several questions:
- Is `%` really faster than `format`?
- If so, why is that?
- Why is `"{} {}".format(var1, var2)` more efficient than `"{0.attribute1} {0.attribute2}".format(object)`?
For reference, I used the following code to measure the different timings.
```python
import time

def timing(f, n, show, *args):
    if show: print f.__name__ + ":\t",
    r = range(n/10)
    t1 = time.clock()
    for i in r:
        f(*args); f(*args); f(*args); f(*args); f(*args); f(*args); f(*args); f(*args); f(*args); f(*args)
    t2 = time.clock()
    timing = round(t2-t1, 3)
    if show: print timing
    return timing

class values(object):
    def __init__(self, a, b, c="", d=""):
        self.a = a
        self.b = b
        self.c = c
        self.d = d

def test_plus(a, b):
    return a + "-" + b

def test_percent(a, b):
    return "%s-%s" % (a, b)

def test_join(a, b):
    return ''.join([a, '-', b])

def test_format(a, b):
    return "{}-{}".format(a, b)

def test_formatC(val):
    return "{0.a}-{0.b}".format(val)

def test_plus_long(a, b, c, d):
    return a + "-" + b + "-" + c + "-" + d

def test_percent_long(a, b, c, d):
    return "%s-%s-%s-%s" % (a, b, c, d)

def test_join_long(a, b, c, d):
    return ''.join([a, '-', b, '-', c, '-', d])

def test_format_long(a, b, c, d):
    return "{0}-{1}-{2}-{3}".format(a, b, c, d)

def test_formatC_long(val):
    return "{0.a}-{0.b}-{0.c}-{0.d}".format(val)

def test_plus_long2(a, b, c, d):
    return a + "-" + b + "-" + c + "-" + d + "-" + a + "-" + b + "-" + c + "-" + d

def test_percent_long2(a, b, c, d):
    return "%s-%s-%s-%s-%s-%s-%s-%s" % (a, b, c, d, a, b, c, d)

def test_join_long2(a, b, c, d):
    return ''.join([a, '-', b, '-', c, '-', d, '-', a, '-', b, '-', c, '-', d])

def test_format_long2(a, b, c, d):
    return "{0}-{1}-{2}-{3}-{0}-{1}-{2}-{3}".format(a, b, c, d)

def test_formatC_long2(val):
    return "{0.a}-{0.b}-{0.c}-{0.d}-{0.a}-{0.b}-{0.c}-{0.d}".format(val)

def test_plus_superlong(lst):
    string = ""
    for i in lst:
        string += str(i)
    return string

def test_join_superlong(lst):
    return "".join([str(i) for i in lst])

def mean(numbers):
    return float(sum(numbers)) / max(len(numbers), 1)

nb_times = int(1e6)
n = xrange(5)
lst_numbers = xrange(1000)
from collections import defaultdict
metrics = defaultdict(list)
list_functions = [
    test_plus, test_percent, test_join, test_format, test_formatC,
    test_plus_long, test_percent_long, test_join_long, test_format_long, test_formatC_long,
    test_plus_long2, test_percent_long2, test_join_long2, test_format_long2, test_formatC_long2,
    # test_plus_superlong, test_join_superlong,
]
val = values("123", "456", "789", "0ab")
for i in n:
    for f in list_functions:
        print ".",
        name = f.__name__
        if "formatC" in name:
            t = timing(f, nb_times, False, val)
        elif '_long' in name:
            t = timing(f, nb_times, False, "123", "456", "789", "0ab")
        elif '_superlong' in name:
            t = timing(f, nb_times, False, lst_numbers)
        else:
            t = timing(f, nb_times, False, "123", "456")
        metrics[name].append(t)

# Get Average
print "\n===AVERAGE OF TIMINGS==="
for f in list_functions:
    name = f.__name__
    timings = metrics[name]
    print "{:>20}:\t{:0.5f}".format(name, mean(timings))
```
Solution 1:[1]
- Yes, `%` string formatting is faster than the `.format` method, most likely (there may well be a better explanation) because `%` is an operator that the interpreter dispatches directly, whereas `.format` involves at least one extra step: looking up the `format` method and calling it.
- `"{0.attribute}"` fields are slower because each attribute access adds another lookup on the object, i.e. the equivalent of `getattr(object, 'attribute')`.
I ran a slightly more thorough analysis (on Python 3.8.2) of various formatting methods using `timeit`. The results, in seconds per 1,000,000 executions (`timeit`'s default `number`), are as follows (pretty-printed with BeautifulTable):
Type \ num_vars | 1 | 2 | 5 | 10 | 50 | 250 |
---|---|---|---|---|---|---|
f_str_str | 0.056 | 0.063 | 0.115 | 0.173 | 0.754 | 3.717 |
f_str_int | 0.055 | 0.148 | 0.354 | 0.656 | 3.186 | 15.747 |
concat_str | 0.012 | 0.044 | 0.169 | 0.333 | 1.888 | 10.231 |
pct_s_str | 0.091 | 0.114 | 0.182 | 0.313 | 1.213 | 6.019 |
pct_s_int | 0.090 | 0.141 | 0.248 | 0.479 | 2.179 | 10.768 |
dot_format_str | 0.143 | 0.157 | 0.251 | 0.461 | 1.745 | 8.259 |
dot_format_int | 0.141 | 0.192 | 0.333 | 0.620 | 2.735 | 13.298 |
dot_format2_str | 0.159 | 0.195 | 0.330 | 0.634 | 3.494 | 18.975 |
dot_format2_int | 0.158 | 0.227 | 0.422 | 0.762 | 4.337 | 25.498 |
The suffixes `_str` and `_int` indicate whether the variables held strings or integers.
Note that `concat_str` with a single variable just evaluates that variable (no concatenation takes place), so its first column shouldn't really be considered.
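As a hedged illustration of the `_int` rows: formatting an integer first requires converting it to text, which plausibly accounts for part of the gap, and `+` cannot be used on integers as concatenation at all, which is why the setup code below comments out `concat_int`.

```python
x0, x1 = 0, 1

# %s converts each value with str(); "{}" fields and f-strings call format(value, "").
# For ints both give the same decimal text, but it is extra work that
# values which are already strings don't need.
as_pct = "%s%s" % (x0, x1)
as_format = "{}{}".format(x0, x1)
as_fstring = f"{x0}{x1}"

# `+` performs no conversion: on ints it is arithmetic, not concatenation.
as_plus = x0 + x1

# Concatenating ints as text needs an explicit conversion first.
as_concat = str(x0) + str(x1)

print(as_pct, as_format, as_fstring, as_plus, as_concat)
```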
My setup for producing these results:
```python
from timeit import timeit
from beautifultable import BeautifulTable  # pip install beautifultable

times = {}
for num_vars in (250, 50, 10, 5, 2, 1):
    f_str = "f'{" + '}{'.join([f'x{i}' for i in range(num_vars)]) + "}'"
    # "f'{x0}{x1}'"
    concat = '+'.join([f'x{i}' for i in range(num_vars)])
    # 'x0+x1'
    pct_s = '"' + '%s'*num_vars + '" % (' + ','.join([f'x{i}' for i in range(num_vars)]) + ')'
    # '"%s%s" % (x0,x1)'
    dot_format = '"' + '{}'*num_vars + '".format(' + ','.join([f'x{i}' for i in range(num_vars)]) + ')'
    # '"{}{}".format(x0,x1)'
    dot_format2 = '"{' + '}{'.join([f'{i}' for i in range(num_vars)]) + '}".format(' + ','.join([f'x{i}' for i in range(num_vars)]) + ')'
    # '"{0}{1}".format(x0,x1)'

    vars = ','.join([f'x{i}' for i in range(num_vars)])
    # "'0'" (with quotes) keeps a single variable a string; a bare '0' would give x0 = 0, an int
    vals_str = tuple(map(str, range(num_vars))) if num_vars > 1 else "'0'"
    setup_str = f'{vars} = {vals_str}'
    # "x0,x1 = ('0', '1')"
    vals_int = tuple(range(num_vars)) if num_vars > 1 else 0
    setup_int = f'{vars} = {vals_int}'
    # 'x0,x1 = (0, 1)'

    times[num_vars] = {
        'f_str_str': timeit(f_str, setup_str),
        'f_str_int': timeit(f_str, setup_int),
        'concat_str': timeit(concat, setup_str),
        # 'concat_int': timeit(concat, setup_int), # this will be summation, not concat
        'pct_s_str': timeit(pct_s, setup_str),
        'pct_s_int': timeit(pct_s, setup_int),
        'dot_format_str': timeit(dot_format, setup_str),
        'dot_format_int': timeit(dot_format, setup_int),
        'dot_format2_str': timeit(dot_format2, setup_str),
        'dot_format2_int': timeit(dot_format2, setup_int),
    }

table = BeautifulTable()
# times was filled from 250 down to 1, so sort the keys to get the same
# ascending column order for the header and for every row
num_vars_order = sorted(times)
table.column_headers = ['Type \\ num_vars'] + list(map(str, num_vars_order))
for key in ('f_str_str', 'f_str_int', 'concat_str', 'pct_s_str', 'pct_s_int', 'dot_format_str', 'dot_format_int', 'dot_format2_str', 'dot_format2_int'):
    table.append_row([key] + [times[num_vars][key] for num_vars in num_vars_order])
print(table)
```
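If you'd rather not depend on BeautifulTable, a stdlib-only sketch of the same measurement could look like this. It is deliberately reduced: no `dot_format2` variant, fewer variable counts and a smaller `number` so it finishes quickly (these choices are assumptions of the sketch, not the setup used for the table above). Header and values are both driven by the same `var_counts` tuple, so the columns cannot get out of step.

```python
from timeit import timeit


def build_statements(num_vars):
    """Build the benchmarked expressions and setups for variables x0..x{num_vars-1}."""
    names = [f"x{i}" for i in range(num_vars)]
    args = ",".join(names)
    stmts = {
        "f_str": "f'{" + "}{".join(names) + "}'",                        # f'{x0}{x1}'
        "concat": "+".join(names),                                       # x0+x1
        "pct_s": '"' + "%s" * num_vars + '" % (' + args + ")",           # "%s%s" % (x0,x1)
        "dot_format": '"' + "{}" * num_vars + '".format(' + args + ")",  # "{}{}".format(x0,x1)
    }
    # The trailing comma makes the tuple unpacking work for a single variable too.
    setup_str = f"{args}, = {tuple(str(i) for i in range(num_vars))!r}"
    setup_int = f"{args}, = {tuple(range(num_vars))!r}"
    return stmts, setup_str, setup_int


def run(var_counts, number=10_000):
    """Return {row name: [seconds per `number` runs, one entry per var_counts item]}."""
    rows = {}
    for n in var_counts:
        stmts, setup_str, setup_int = build_statements(n)
        for kind, stmt in stmts.items():
            rows.setdefault(kind + "_str", []).append(timeit(stmt, setup_str, number=number))
            if kind != "concat":  # `+` on ints would be addition, not concatenation
                rows.setdefault(kind + "_int", []).append(timeit(stmt, setup_int, number=number))
    return rows


def print_table(var_counts, rows):
    """Print one column per variable count, in the same order as var_counts."""
    label = "Type \\ num_vars"
    print(label.ljust(16) + "".join(str(n).rjust(9) for n in var_counts))
    for key, seconds in rows.items():
        print(key.ljust(16) + "".join(f"{t:9.3f}" for t in seconds))


if __name__ == "__main__":
    counts = (1, 2, 5, 10)
    print_table(counts, run(counts))
```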
I couldn't go beyond `num_vars=250` because of the maximum-arguments (255) limit with `timeit`.
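tl;dr: in terms of Python string formatting performance, f-strings are generally the fastest and the most elegant option (most clearly for string values; with integer values, `%s` formatting was faster from 2 variables upward in the table above), but at times (due to some implementation restrictions, and because they require Python 3.6+) you might have to use other formatting options.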
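One common case where an f-string is not an option is a template that only exists at run time (for example, read from a configuration file or a translation catalog): f-strings are literals in the source code, while `str.format` and `%` work on any string. A minimal sketch with hypothetical template strings:

```python
# Hypothetical templates that, in a real program, might be loaded at run time.
template_braces = "{name} scored {score:.1f}"
template_percent = "%(name)s scored %(score).1f"
values = {"name": "Ada", "score": 9.5}

# Both methods accept the same mapping of named values.
print(template_braces.format(**values))
print(template_percent % values)
```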
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Cristian Ciupitu |