Fast concatenation of bytes() in python3

I have a list of byte strings in Python 3 (they are audio chunks). I want to make one big byte string from them. The simple implementation is kind of slow. How can I do it better?

chunks = []
while not audio.ends():
    chunks.append(bytes(audio.next_buffer()))
    do_some_chunk_processing()

all_audio = b''
for ch in chunks:
    all_audio += ch

How can I do it faster?



Solution 1:[1]

One approach you could try and measure would be to use bytes.join:

all_audio = b''.join(chunks)

The reason this might be faster is that join does a pre-pass over the chunks to find out how big all_audio needs to be, allocates a buffer of exactly that size once, and then copies each chunk into it in a single pass. The += loop, by contrast, may reallocate and recopy the accumulated result on every iteration, which can make the total work quadratic in the number of bytes.
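A quick way to check the difference on your own data is a small timeit benchmark. This is only a sketch: it uses dummy 4 KiB chunks in place of real audio buffers.

import timeit

# dummy data standing in for the real audio chunks
chunks = [b'\x00' * 4096 for _ in range(1000)]

def concat_loop():
    all_audio = b''
    for ch in chunks:
        all_audio += ch
    return all_audio

def concat_join():
    return b''.join(chunks)

print('loop:', timeit.timeit(concat_loop, number=10))
print('join:', timeit.timeit(concat_join, number=10))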


Solution 2:[2]

Use bytearray()

from time import time

bytes_arr = bytearray()
bytes_string = b''

c = b'\x02\x03\x05\x07' * 500

st = time()
for _ in range(10**4):
    bytes_string += c

print("string concat -> took {} sec".format(time()-st))

st = time()
for _ in range(10**4):
    bytes_arr.extend(c)
# convert the bytearray back to an immutable bytes string
bytes_string = bytes(bytes_arr)

print("bytearray extend/concat -> took {} sec".format(time() - st))

The benchmark on my machine (Windows 10, 7th-gen Core i7) shows:

string concat -> took 67.27699875831604 sec
bytearray extend/concat -> took 0.08975911140441895 sec

The code is pretty self-explanatory: instead of string += next_block, use bytearray.extend(next_block). After building the bytearray, call bytes(bytearray) to get an immutable bytes string.
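Applied to the question's loop, the same pattern avoids building the intermediate list of chunks entirely. A minimal sketch, assuming the same hypothetical audio.ends() / audio.next_buffer() API from the question:

all_audio = bytearray()
while not audio.ends():
    # extend() copies the buffer into the growing bytearray in place;
    # amortized over-allocation keeps the total cost roughly linear
    all_audio.extend(audio.next_buffer())
    do_some_chunk_processing()

# convert to an immutable bytes object once, at the end
all_audio = bytes(all_audio)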

Solution 3:[3]

One approach, for plain text strings (str) rather than bytes, is to use an f-string:

all_audio = ''
for ch in chunks:
    all_audio = f'{all_audio}{ch}'

This seems to be faster for small strings, according to this comparison. Note, however, that it only works for str: f-strings always produce str, and interpolating a bytes object embeds its repr (b'...') rather than its raw contents, so this approach cannot be used for the byte chunks in the question.
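A short snippet demonstrates why the f-string approach fails for bytes:

ch = b'\x02\x03'
s = f'{ch}'
print(s)         # b'\x02\x03'  -- the repr rendered as text, not the raw bytes
print(type(s))   # <class 'str'>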

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: (no attribution listed)
Solution 2: Amin Pial
Solution 3: A. Bohyn