Why does comparison of bytes with str fail in Python 3?
In Python 3 this expression evaluates as False:
b"" == ""
while in Python 2 this comparison is True:
u"" == ""
Checking for identity with is obviously fails in both cases.
But why would they implement such a behaviour in Python 3?
Solution 1:[1]
In Python 2.x, the design goal for Unicode was to enable transparent operations between Unicode and byte strings by implicitly converting between the two types.
When you do the comparison u"" == "", the str RHS is automatically decoded into Unicode first (using the default ASCII codec), and then compared to the Unicode LHS. That's why it returned True.
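The following Python 3 sketch emulates that Python 2 coercion rule, for illustration only. It assumes the Python 2 default codec (ASCII): the byte string is decoded implicitly, and if decoding fails the operands are treated as unequal (Python 2 also emitted a UnicodeWarning in that case). The function name py2_style_eq is hypothetical and not part of any library.

```python
def py2_style_eq(text: str, data: bytes) -> bool:
    """Hypothetical emulation of Python 2's unicode == str comparison.

    Assumes the Python 2 default codec, ASCII, is used for the implicit
    decode of the byte string before comparing.
    """
    try:
        decoded = data.decode("ascii")  # the implicit coercion step
    except UnicodeDecodeError:
        # Python 2 warned (UnicodeWarning) here and compared unequal
        return False
    return text == decoded


print(py2_style_eq("", b""))                    # True, like u"" == "" in Python 2
print(py2_style_eq("abc", b"abc"))              # True
print(py2_style_eq("é", "é".encode("utf-8")))   # False: non-ASCII bytes fail to decode
```

The last call shows the weakness of implicit coercion: whether two values compare equal silently depends on the data happening to be ASCII.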
In contrast, Python 3.x, having learned from the Unicode mess in Python 2, decided to make everything about Unicode vs. byte strings explicit. Thus, b"" == "" is False because the byte string is no longer automatically converted to Unicode for comparison.
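A short sketch of what happens in Python 3, under CPython's standard comparison rules: neither type's __eq__ accepts the other, so both return NotImplemented and == falls back to an identity check. Ordering comparisons have no such fallback. (Running the interpreter with the -b flag additionally emits a BytesWarning for bytes/str comparisons.)

```python
# Neither bytes nor str knows how to compare itself with the other type.
print(b"".__eq__(""))   # NotImplemented
print("".__eq__(b""))   # NotImplemented

# With both sides returning NotImplemented, == falls back to identity,
# and two distinct objects of different types are not identical.
print(b"" == "")        # False

# Ordering comparisons do not fall back; they raise TypeError.
try:
    b"" < ""
except TypeError as exc:
    print("TypeError:", exc)

# An explicit conversion makes the comparison meaningful again.
print(b"".decode() == "")   # True
```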
Solution 2:[2]
In Python 3, strings are Unicode. The type used to hold text is str and the type used to hold data is bytes.
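A minimal illustration of the text/data split: indexing a str yields a one-character str, while indexing bytes yields an integer in range(256).

```python
text = "abc"    # str: a sequence of Unicode code points
data = b"abc"   # bytes: a sequence of integers in range(256)

print(type(text).__name__, type(data).__name__)  # str bytes
print(text[0], data[0])                          # a 97
print(list(data))                                # [97, 98, 99]
```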
The str and bytes types cannot be mixed; you must always explicitly convert between them. Use str.encode() to go from str to bytes, and bytes.decode() to go from bytes to str.
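A sketch of the explicit round trip, assuming UTF-8 as the chosen encoding (it is also the default for str.encode() and bytes.decode() in Python 3):

```python
text = "naïve café"

# str -> bytes: encode with an explicitly named codec
data = text.encode("utf-8")
print(data)                          # b'na\xc3\xafve caf\xc3\xa9'

# bytes -> str: decode with the same codec to recover the original text
print(data.decode("utf-8") == text)  # True

# Comparing without converting: different types, so unequal
print(data == text)                  # False
```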
Therefore, if you do b"".decode() == "", you'll get True:
>>> b"".decode() == ""
True
For more information, read "Text Vs. Data Instead Of Unicode Vs. 8-bit" in What's New In Python 3.0.
Solution 3:[3]
The designers decided not to assume an encoding for coercion when comparing bytes to strings, so the comparison falls under the default behavior of Python 3.x for operands of differing types: == returns False, and ordering comparisons such as < raise TypeError.
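A small illustration (not from the original answer) of why any default codec would be arbitrary: the same bytes decode to text differently, or not at all, depending on the encoding, and the same text encodes to different bytes.

```python
data = b"caf\xe9"

# The same bytes mean different things (or nothing) under different codecs.
print(data.decode("latin-1"))   # café
print(data.decode("cp1252"))    # café

try:
    data.decode("utf-8")
except UnicodeDecodeError:
    print("not valid UTF-8")

# Conversely, one string encodes to different bytes under different codecs.
print("café".encode("utf-8"))    # b'caf\xc3\xa9'
print("café".encode("latin-1"))  # b'caf\xe9'
```

So a question like b"caf\xe9" == "café" has no single correct answer unless the caller names the encoding.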
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Peter Mortensen |
Solution 2 | |
Solution 3 | Ignacio Vazquez-Abrams |