codeUnits property vs utf8.encode function in Dart
I have this short piece of code:
import 'dart:convert';

void main(List<String> args) {
  const data = 'amigo+/=:chesu';
  // Base64 of the string's UTF-8 bytes.
  var encoded = base64Encode(utf8.encode(data));
  // Base64 of the string's UTF-16 code units.
  var encoded2 = base64Encode(data.codeUnits);
  var decoded = utf8.decode(base64Decode(encoded));
  var decoded2 = utf8.decode(base64Decode(encoded2));
  print(encoded);
  print(encoded2);
  print(decoded);
  print(decoded2);
}
The output is:
YW1pZ28rLz06Y2hlc3U=
YW1pZ28rLz06Y2hlc3U=
amigo+/=:chesu
amigo+/=:chesu
The codeUnits property gives an unmodifiable list of the UTF-16 code units. Is it OK to use the utf8.decode function on it, or what function should be used to decode encoded2?
Solution 1:[1]
It's simply not a good idea to do base64Encode(data.codeUnits), because base64Encode encodes bytes, and data.codeUnits isn't necessarily bytes.
Here they happen to be bytes, because all the characters of the string have code points below 256 (they are even ASCII).
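To make the "isn't necessarily bytes" point concrete, here is a small Dart sketch. The helper name fitsInBytes and the sample strings are illustrative assumptions, not part of the original answer. It relies on base64Encode throwing when it sees a value outside 0..255, which is how dart:convert behaves.

```dart
import 'dart:convert';

/// Hypothetical helper: true if every UTF-16 code unit of [s] fits in a byte.
bool fitsInBytes(String s) => s.codeUnits.every((u) => u >= 0 && u <= 255);

void main() {
  // All code units of the question's string are ASCII, so they are valid bytes.
  print(fitsInBytes('amigo+/=:chesu')); // true

  // '€' is U+20AC: a single UTF-16 code unit with value 8364, not a byte.
  const price = 'price: 5€';
  print(fitsInBytes(price)); // false
  print(price.codeUnits.last); // 8364

  try {
    base64Encode(price.codeUnits);
  } catch (e) {
    // base64Encode expects bytes (0..255) and throws on anything else.
    print('base64Encode rejected the code units: $e');
  }

  // UTF-8 encoding always produces bytes, so this succeeds.
  print(base64Encode(utf8.encode(price)));
}
```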
Using utf8.encode before base64Encode is good. It works for all strings.
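A quick way to see why utf8.encode works for every string: it turns each character into one to four bytes, each in 0..255, even when the UTF-16 code unit itself is larger. The sample characters are arbitrary choices for illustration.

```dart
import 'dart:convert';

void main() {
  // For each character: its UTF-16 code units, then its UTF-8 bytes.
  for (final s in ['a', 'é', '€', '😀']) {
    print('$s  codeUnits=${s.codeUnits}  utf8=${utf8.encode(s)}');
  }
  // Expected UTF-8 bytes:
  //   a -> [97], é -> [195, 169], € -> [226, 130, 172], 😀 -> [240, 159, 152, 128]
}
```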
The best way to convert from UTF-16 code units to a String is String.fromCharCodes.
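A minimal Dart check that String.fromCharCodes rebuilds a string from its UTF-16 code units, including values above 255 and the surrogate pair of an emoji. No Base64 is involved here, and the sample text is arbitrary.

```dart
void main() {
  const data = 'amigo € 😀';
  // codeUnits may contain values above 255 and surrogate pairs for emoji.
  final List<int> units = data.codeUnits;
  // String.fromCharCodes accepts UTF-16 code units and rebuilds the string.
  final restored = String.fromCharCodes(units);
  print(restored == data); // true
}
```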
Here you are using base64Encode(data.codeUnits), which only works if the data string contains only code units up to 255. If you assume that, then the decoding can be done using either latin1.decode or String.fromCharCodes.
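A sketch of that case, assuming the string contains only code units up to 255. The sample 'café +/=:' is chosen because 'é' (U+00E9, 233) is not ASCII but still fits in a byte, so base64Encode accepts the code units.

```dart
import 'dart:convert';

void main() {
  // 'é' is U+00E9 (233): above ASCII, but still <= 255.
  const data = 'café +/=:';
  final encoded2 = base64Encode(data.codeUnits);
  final bytes = base64Decode(encoded2);

  // Each byte is the value of one original code unit, so a
  // one-byte-per-character decoding restores the string.
  print(latin1.decode(bytes) == data); // true
  print(String.fromCharCodes(bytes) == data); // true
}
```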
Using ascii.decode or utf8.decode also works if the string contains only ASCII characters (which it does here, but which isn't guaranteed just because base64Encode succeeded).
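To illustrate the gap between "base64Encode succeeded" and "the bytes are ASCII", this sketch reuses 'café': its code units all fit in bytes, yet the byte 233 is neither ASCII nor a complete UTF-8 sequence. The accepts helper is hypothetical, written only for this example.

```dart
import 'dart:convert';

/// Hypothetical helper: returns true if [decode] accepts [bytes].
bool accepts(String Function(List<int>) decode, List<int> bytes) {
  try {
    decode(bytes);
    return true;
  } on FormatException {
    return false;
  }
}

void main() {
  // base64Encode succeeds, but 'é' (233) is not ASCII.
  final bytes = base64Decode(base64Encode('café'.codeUnits)); // [99, 97, 102, 233]

  print(accepts(utf8.decode, bytes)); // false: 233 starts an unfinished sequence
  print(accepts(ascii.decode, bytes)); // false: 233 is above 127
  print(accepts(latin1.decode, bytes)); // true
}
```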
In short, don't do base64Encode(data.codeUnits). Convert the string to bytes before calling base64Encode, then use the reverse conversion to turn the bytes back into a string.
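One way to package that "string to bytes, then Base64, and back" advice is to fuse the two dart:convert codecs into a single String-to-String codec. The name stringToBase64 is just an illustrative choice.

```dart
import 'dart:convert';

// A String <-> Base64 codec: text -> UTF-8 bytes -> Base64, and back again.
final Codec<String, String> stringToBase64 = utf8.fuse(base64);

void main() {
  const data = 'amigo+/=:chesu';
  final encoded = stringToBase64.encode(data);
  print(encoded); // YW1pZ28rLz06Y2hlc3U=
  print(stringToBase64.decode(encoded)); // amigo+/=:chesu

  // The same round trip works for non-ASCII text.
  const other = 'café € 😀';
  print(stringToBase64.decode(stringToBase64.encode(other)) == other); // true
}
```

The fused codec produces exactly the same result as base64Encode(utf8.encode(data)); it just keeps both directions in one object.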
Solution 2:[2]
I tried this:
print(utf8.decode('use âsmartâ symbols like â thisâ'.codeUnits));
and got this:
use “smart” symbols like ‘ this’
The “, ”, ‘ and ’ characters are "smart" quotes typed on the iOS keyboard.
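A plausible explanation of why this works, sketched in Dart under one assumption: the garbled text came from UTF-8 bytes being read back as Latin-1, one character per byte. In that case every code unit of the garbled string is one of the original bytes, so utf8.decode(codeUnits) reverses the damage. If the text had been misread with a different encoding (for example Windows-1252, where some of those bytes map to characters above 255), this shortcut would not apply directly.

```dart
import 'dart:convert';

void main() {
  const original = 'use “smart” symbols like ‘ this’';

  // Assumed cause: UTF-8 bytes read back as Latin-1, one character per byte.
  final garbled = latin1.decode(utf8.encode(original));
  print(garbled.length > original.length); // true: each quote became 3 chars

  // The garbled string's code units are exactly the original UTF-8 bytes,
  // so decoding them as UTF-8 repairs the text.
  final repaired = utf8.decode(garbled.codeUnits);
  print(repaired == original); // true
}
```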
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | lrn |
Solution 2 | |