How to convert list of bytes (unicode) to Python string?

I have a list of bytes (8-bit values; in C/C++ they would make up a wchar_t string) that together form a Unicode string, byte by byte. How can I convert those values into a Python string? I tried a few things, but none of them joined each pair of bytes into one character and built the whole string from them. Thank you.



Solution 1:[1]

Converting a sequence of bytes to a Unicode string is done by calling the decode() method on that str (in Python 2.x) or bytes (Python 3.x) object.

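If you actually have a list of bytes, then, to get this object, you can use ''.join(bytelist) (Python 2.x, list of one-character str) or b''.join(bytelist) (Python 3.x, list of bytes objects). If the list holds integers in the range 0-255 instead, bytes(bytelist) builds the bytes object in Python 3.x.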

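You need to specify the encoding that was used to encode the original Unicode string. For data that came from a wchar_t string, this is typically UTF-16 on Windows (2 bytes per wchar_t) and UTF-32 on most Unix-like platforms (4 bytes per wchar_t), not UTF-8.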
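For the case described in the question (two bytes per character), a minimal Python 3 sketch might look like the following. The input list and the choice of little-endian UTF-16 are assumptions made for illustration; the actual byte order depends on where the data came from.

```python
# Hypothetical input: the UTF-16-LE bytes of a four-letter word,
# stored as one integer (0-255) per byte.
byte_values = [0x42, 0x04, 0x35, 0x04, 0x41, 0x04, 0x42, 0x04]

# bytes() packs a list of small integers into a single bytes object.
raw = bytes(byte_values)

# Each character occupies two bytes here, so decode as UTF-16.
# 'utf-16-le' assumes little-endian order and no byte order mark.
text = raw.decode('utf-16-le')

print(text)       # тест
print(len(text))  # 4 characters from 8 bytes
```

If the data starts with a byte order mark, the plain 'utf-16' codec uses it to pick the byte order and strips it from the result.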

However, the term "Python string" is a bit ambiguous and also version-dependent. The Python str type stands for a byte string in Python 2.x and a Unicode string in Python 3.x. So, in Python 2, just doing ''.join(bytelist) will give you a str object.

Demo for Python 2:

In [1]: 'тест'
Out[1]: '\xd1\x82\xd0\xb5\xd1\x81\xd1\x82'

In [2]: bytelist = ['\xd1', '\x82', '\xd0', '\xb5', '\xd1', '\x81', '\xd1', '\x82']

In [3]: ''.join(bytelist).decode('utf-8')
Out[3]: u'\u0442\u0435\u0441\u0442'

In [4]: print ''.join(bytelist).decode('utf-8') # encodes to the terminal encoding
тест

In [5]: ''.join(bytelist) == 'тест'
Out[5]: True
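
A rough Python 3 equivalent of the demo above, assuming the list holds one-byte bytes objects rather than one-character str objects:

```python
# Same UTF-8 bytes as in the Python 2 demo, one bytes object per byte.
bytelist = [b'\xd1', b'\x82', b'\xd0', b'\xb5', b'\xd1', b'\x81', b'\xd1', b'\x82']

# Join into a single bytes object first, then decode it as a whole.
joined = b''.join(bytelist)
text = joined.decode('utf-8')

print(type(joined))  # <class 'bytes'>
print(type(text))    # <class 'str'>
print(text)          # тест
```

In Python 3 the joined object is bytes, not str, so comparing it with a text literal such as 'тест' gives False until it has been decoded.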

Solution 2:[2]

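You can also convert a list of byte strings into a list of strings by calling decode() on each element. Note that this only works when every element is a complete encoded string on its own; a single byte taken from a multi-byte character cannot be decoded separately.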

stringlist=[x.decode('utf-8') for x in bytelist]
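
The difference between the two situations can be sketched as follows (Python 3; the sample values are made up for illustration):

```python
# Case 1: every element is a complete UTF-8 encoded string.
chunks = [b'caf\xc3\xa9', b'\xd1\x82\xd0\xb5\xd1\x81\xd1\x82']
strings = [c.decode('utf-8') for c in chunks]
print(strings)  # ['café', 'тест']

# Case 2: the elements are the individual bytes of ONE character.
single_bytes = [b'\xd1', b'\x82']
try:
    [b.decode('utf-8') for b in single_bytes]
    per_byte_ok = True
except UnicodeDecodeError:
    # b'\xd1' alone is only the first half of a two-byte sequence.
    per_byte_ok = False
print(per_byte_ok)  # False

# Joining the bytes before decoding handles case 2.
whole = b''.join(single_bytes).decode('utf-8')
print(whole)  # т
```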

Solution 3:[3]

Here's what worked best for me:

import codecs

byteData = b'\xd1\x82\xd0\xb5\xd1\x81\xd1\x82'  # example input: UTF-8 encoded bytes
print(type(byteData)) # <class 'bytes'>
strData = codecs.decode(byteData, 'UTF-8')
print(strData)
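
As a small sketch (the sample bytes are an assumption for illustration), codecs.decode() gives the same result as the bytes.decode() method for a bytes input, and both accept an errors argument for input that is not valid in the chosen encoding:

```python
import codecs

# Example input: valid UTF-8 bytes.
byte_data = b'\xd1\x82\xd0\xb5\xd1\x81\xd1\x82'

# The function and the method give the same result.
via_codecs = codecs.decode(byte_data, 'utf-8')
via_method = byte_data.decode('utf-8')
print(via_codecs == via_method)  # True

# Cutting off the last byte leaves an incomplete character at the end.
damaged = byte_data[:-1]
try:
    codecs.decode(damaged, 'utf-8')  # default is errors='strict'
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
print(strict_failed)  # True

# errors='replace' substitutes U+FFFD for the undecodable part.
lenient = codecs.decode(damaged, 'utf-8', errors='replace')
print(lenient.endswith('\ufffd'))  # True
```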

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Umer
Solution 3 Hrvoje