How can I parse a bytestring in Python 3?
Basically, I have two bytestrings in a single line like this:
b'\xe0\xa6\xb8\xe0\xa6\x96 - \xe0\xa6\xb6\xe0\xa6\x96\n'
This is a UTF-8 encoded Unicode string that I'm importing from an online file using urllib, and I want to compare the individual bytestrings so that I can replace the wrong ones. However, I can't find a way to parse the string so that I get \xe0\xa6\xb8\xe0\xa6\x96 and \xe0\xa6\xb6\xe0\xa6\x96 in two different variables.
I tried converting it into a string with str(b'\xe0\xa6\xb8\xe0\xa6\x96), and indexing actually works, but then I can't get back to the original bytestring.
Is it possible?
Solution 1:[1]
I would recommend trying something like this...
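arr = b'\xe0\xa6\xb8\xe0\xa6\x96 - \xe0\xa6\xb6\xe0\xa6\x96\n'
# decode() defaults to UTF-8; strip() drops the trailing newline,
# which would otherwise end up in the second piece.
splt = arr.decode().strip().split(' - ')
# Encode each piece back to bytes.
b_arr1 = splt[0].encode()
b_arr2 = splt[1].encode()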
I tried it out in the Python 3 terminal and it works fine.
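If only the raw bytes are needed, the decode/encode round trip may not be necessary, since bytes objects have their own strip, split and partition methods. A minimal sketch, assuming the separator is always the ASCII sequence ' - ' (the names raw, first, sep and second are just for illustration):

```python
raw = b'\xe0\xa6\xb8\xe0\xa6\x96 - \xe0\xa6\xb6\xe0\xa6\x96\n'

# strip() removes the trailing newline; partition() splits on the
# first occurrence of the separator and returns a 3-tuple.
first, sep, second = raw.strip().partition(b' - ')

print(first)   # b'\xe0\xa6\xb8\xe0\xa6\x96'
print(second)  # b'\xe0\xa6\xb6\xe0\xa6\x96'
```

Note that the separator has to be a bytes literal (b' - '); passing a str to a bytes method raises a TypeError.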
Solution 2:[2]
I would do something like this:
a = b'\xe0\xa6\xb8\xe0\xa6\x96 - \xe0\xa6\xb6\xe0\xa6\x96\n'
parts = [part.strip() for part in a.decode().split('-')]
first_part = parts[0].encode()
second_part = parts[1].encode()
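As for why the str(...) attempt in the question could not be reverted: in Python 3, str() on a bytes object (without an encoding argument) returns the text of its repr, including the leading b' and the backslash escapes, rather than the decoded content. decode() gives real text that encodes back to the same bytes. The sketch below shows that, then compares the two parts and replaces one of them; the corrections table is purely hypothetical and only stands in for whatever "wrong" entries need fixing.

```python
raw = b'\xe0\xa6\xb8\xe0\xa6\x96 - \xe0\xa6\xb6\xe0\xa6\x96\n'

as_repr = str(raw)             # repr text starting with "b'", not the content
as_text = raw.decode('utf-8')  # actual text, reversible with encode()

assert as_repr.startswith("b'")
assert as_text.encode('utf-8') == raw

# Split into the two words and compare them as text.
left, right = (part.strip() for part in as_text.split(' - '))
same = (left == right)  # the two words differ in their first letter

# Hypothetical correction table: map a wrong spelling to the right one.
corrections = {right: left}
fixed = [corrections.get(word, word) for word in (left, right)]

# Rebuild the line in its original bytes form.
rebuilt = ' - '.join(fixed).encode('utf-8') + b'\n'
```

Comparing the decoded strings and comparing the encoded bytes give the same answer here, since both sides use the same encoding.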
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Peter Mortensen |
Solution 2 | Jahongir Rahmonov |