How to convert a list of bytes (Unicode) to a Python string?
I have a list of bytes (8-bit bytes; in C/C++ they would make up a wchar_t string) that together form a Unicode string, byte by byte. How do I convert those values into a Python string? I tried a few things, but none of them could join each pair of bytes into one character and build the entire string from it. Thank you.
Solution 1:[1]
Converting a sequence of bytes to a Unicode string is done by calling the decode() method on that str (in Python 2.x) or bytes (Python 3.x) object.
If you actually have a list of bytes, then, to get this object, you can use ''.join(bytelist) or b''.join(bytelist).
You need to specify the encoding that was used to encode the original Unicode string.
However, the term "Python string" is a bit ambiguous and also version-dependent. The Python str type stands for a byte string in Python 2.x and a Unicode string in Python 3.x. So, in Python 2, just doing ''.join(bytelist) will give you a str object.
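To illustrate why the encoding has to be named explicitly, here is a minimal Python 3 sketch. The input data is made up, and the choice of 'utf-16-le' is an assumption: it matches a little-endian wchar_t buffer with two bytes per character (as on Windows), which may or may not be what the question's data uses.

```python
# Hypothetical input: the bytes of "Hi" as a two-bytes-per-character,
# little-endian buffer. This sample data is made up for illustration.
raw = bytes([0x48, 0x00, 0x69, 0x00])

# Decoding with the encoding that produced the bytes recovers the text.
text = raw.decode('utf-16-le')
print(text)  # Hi

# Decoding with a different encoding does not fail here,
# but it silently produces a different string.
wrong = raw.decode('latin-1')
print(repr(wrong))  # 'H\x00i\x00'
```

The second call shows the typical symptom of a wrong guess: no exception, just stray characters in the result.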
Demo for Python 2:
In [1]: 'тест'
Out[1]: '\xd1\x82\xd0\xb5\xd1\x81\xd1\x82'
In [2]: bytelist = ['\xd1', '\x82', '\xd0', '\xb5', '\xd1', '\x81', '\xd1', '\x82']
In [3]: ''.join(bytelist).decode('utf-8')
Out[3]: u'\u0442\u0435\u0441\u0442'
In [4]: print ''.join(bytelist).decode('utf-8') # encodes to the terminal encoding
тест
In [5]: ''.join(bytelist) == 'тест'
Out[5]: True
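For comparison, a rough Python 3 counterpart of the demo might look like the sketch below. It assumes the list holds one-byte bytes objects; the intlist variant is an added assumption covering the case where the list holds integers instead.

```python
# Python 3: each list element is a one-byte bytes object
# (the same byte values as in the Python 2 demo).
bytelist = [b'\xd1', b'\x82', b'\xd0', b'\xb5', b'\xd1', b'\x81', b'\xd1', b'\x82']

# b''.join() concatenates the pieces into a single bytes object,
# and decode() turns that object into a str (Unicode string).
text = b''.join(bytelist).decode('utf-8')
print(text)  # тест

# If the list holds integers in range(256) instead,
# bytes() builds the bytes object directly.
intlist = [0xd1, 0x82, 0xd0, 0xb5, 0xd1, 0x81, 0xd1, 0x82]
same_text = bytes(intlist).decode('utf-8')
print(same_text == text)  # True
```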
Solution 2:[2]
You can also convert the byte list into a list of strings by calling decode() on each element:
stringlist = [x.decode('utf-8') for x in bytelist]
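One caveat with element-wise decoding: it only works when every element is a complete encoded character. The Python 3 sketch below uses made-up sample data to show both cases. It also shows codecs.getincrementaldecoder from the standard library as one possible way to handle characters split across elements.

```python
import codecs

# Elements that are each a complete UTF-8 sequence decode fine one by one.
chunks = [b'\xd1\x82', b'\xd0\xb5', b'\xd1\x81', b'\xd1\x82']
stringlist = [x.decode('utf-8') for x in chunks]
print(stringlist)  # ['т', 'е', 'с', 'т']

# Single bytes of a multi-byte character cannot be decoded in isolation.
bytelist = [b'\xd1', b'\x82']
try:
    [x.decode('utf-8') for x in bytelist]
    failed = False
except UnicodeDecodeError:
    failed = True
print(failed)  # True

# An incremental decoder buffers incomplete sequences between calls.
decoder = codecs.getincrementaldecoder('utf-8')()
pieces = [decoder.decode(x) for x in bytelist]
print(pieces)  # ['', 'т']
```

When the goal is a single string rather than a list, joining the bytes first and decoding once is usually simpler.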
Solution 3:[3]
Here's what worked best for me:
import codecs
print(type(byteData)) # <class 'bytes'>
strData = codecs.decode(byteData, 'UTF-8')
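A self-contained version of the snippet above, as a sketch: the value assigned to byteData is hypothetical sample data, since the answer does not show where it comes from. For a bytes input, codecs.decode() gives the same result as the bytes.decode() method.

```python
import codecs

# Hypothetical sample data standing in for byteData from the answer.
byteData = 'тест'.encode('utf-8')
print(type(byteData))  # <class 'bytes'>

strData = codecs.decode(byteData, 'UTF-8')
print(strData)  # тест

# The bytes method produces the same string.
print(strData == byteData.decode('utf-8'))  # True
```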
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Umer |
Solution 3 | Hrvoje |