Fast concatenation of bytes() in Python 3
I have a list of byte strings in Python 3 (they are audio chunks) and I want to join them into one big byte string. The simple implementation is rather slow. How can I do it better?
chunks = []
while not audio.ends():
    chunks.append(bytes(audio.next_buffer()))
    do_some_chunk_processing()

all_audio = b''
for ch in chunks:
    all_audio += ch
How can I do it faster?
Solution 1:[1]
One approach you could try and measure is bytes.join:

all_audio = b''.join(chunks)

The reason this is usually much faster is that join does a pre-pass over the chunks to find out how big all_audio needs to be, allocates exactly the right size once, then copies each chunk in one go. Repeated +=, by contrast, reallocates and copies the entire accumulated result on every iteration, which makes the loop quadratic in the total length.
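As a rough check of that claim, here is a minimal timeit sketch (the chunk size and count are arbitrary assumptions, not values from the question):

import timeit

# Hypothetical workload: 1,000 chunks of 2 KB each.
chunks = [b'\x00' * 2048 for _ in range(1000)]

def concat_loop():
    all_audio = b''
    for ch in chunks:
        all_audio += ch  # copies the growing result on every pass
    return all_audio

def concat_join():
    return b''.join(chunks)  # sizes, allocates, and copies once

print('loop:', timeit.timeit(concat_loop, number=100))
print('join:', timeit.timeit(concat_join, number=100))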
Solution 2:[2]
Use bytearray()
from time import time

bytes_arr = bytearray()
bytes_string = b''
c = b'\x02\x03\x05\x07' * 500

st = time()
for _ in range(10**4):
    bytes_string += c
print("string concat -> took {} sec".format(time() - st))

st = time()
for _ in range(10**4):
    bytes_arr.extend(c)
# convert bytes_arr to a bytes string via
bytes_string = bytes(bytes_arr)
print("bytearray extend/concat -> took {} sec".format(time() - st))
A benchmark on my Win10 / Core i7 (7th gen) machine shows:

string concat -> took 67.27699875831604 sec
bytearray extend/concat -> took 0.08975911140441895 sec
The code is pretty self-explanatory: instead of string += next_block, use bytearray.extend(next_block). After building the bytearray, call bytes(bytearray) to get the bytes string. A bytearray is mutable and over-allocates as it grows, so extend runs in amortized constant time per block instead of copying the whole accumulated result on every concatenation.
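Applied to the original loop, a minimal sketch reusing the question's hypothetical audio API (audio.ends(), audio.next_buffer(), do_some_chunk_processing()) would be:

buf = bytearray()
while not audio.ends():
    chunk = bytes(audio.next_buffer())
    do_some_chunk_processing()
    buf.extend(chunk)  # amortized O(len(chunk)) per block

all_audio = bytes(buf)  # one final copy into an immutable bytes object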
Solution 3:[3]
One approach is to use an f-string:

all_audio = ''
for ch in chunks:
    all_audio = f'{all_audio}{ch}'

This seems to be faster for small strings, according to this comparison. Note, however, that it only works for str chunks: f-strings always produce str, and interpolating a bytes object embeds its repr (for example b'\x02\x03') rather than the raw data, so the audio byte chunks from the question would have to be decoded first.
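A quick interpreter session illustrating the bytes pitfall:

>>> ch = b'\x02\x03'
>>> f'{ch}'  # embeds the repr of the bytes object, not its contents
"b'\\x02\\x03'"
>>> f'{ch.decode("latin-1")}'  # decode first to get the raw characters
'\x02\x03'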
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source
---|---
Solution 1 |
Solution 2 | Amin Pial
Solution 3 | A. Bohyn