codeUnits property vs utf8.encode function in Dart

I have this short program:

import 'dart:convert';

void main(List<String> args) {
  const data = 'amigo+/=:chesu';
  var encoded = base64Encode(utf8.encode(data));
  var encoded2 = base64Encode(data.codeUnits);
  var decoded = utf8.decode(base64Decode(encoded));
  var decoded2 = utf8.decode(base64Decode(encoded2));

  print(encoded);
  print(encoded2);
  print(decoded);
  print(decoded2);
}

The output is:

YW1pZ28rLz06Y2hlc3U=
YW1pZ28rLz06Y2hlc3U=
amigo+/=:chesu
amigo+/=:chesu

The codeUnits property gives an unmodifiable list of the UTF-16 code units. Is it OK to decode encoded2 with utf8.decode, or which function should be used for it instead?



Solution 1:[1]

It's simply not a good idea to do base64Encode(data.codeUnits), because base64Encode encodes bytes, and data.codeUnits isn't necessarily a list of bytes. Here it happens to be one: all the characters of the string have code points below 256 (in fact, they are all ASCII).
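One way to see the difference is to check whether every code unit fits in a byte. This is a minimal sketch; `fitsInBytes` is a hypothetical helper name, not part of dart:convert, and 'price: 5€' is an arbitrary sample string.

```dart
// Hypothetical helper: true if every value is in the byte range 0..255,
// i.e. the list is a plausible input for base64Encode.
bool fitsInBytes(List<int> values) => values.every((v) => v >= 0 && v <= 255);

void main() {
  const simple = 'amigo+/=:chesu';
  const euro = 'price: 5€';

  print(fitsInBytes(simple.codeUnits)); // true: every code unit is ASCII
  print(euro.codeUnits.last); // 8364 (0x20AC), the UTF-16 code unit for '€'
  print(fitsInBytes(euro.codeUnits)); // false
}
```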

Using utf8.encode before base64Encode is the right approach. It works for all strings.
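To illustrate why: for any character above U+007F, the UTF-8 bytes differ from the UTF-16 code units. A small sketch using 'é' (U+00E9) and '€' (U+20AC) as sample characters:

```dart
import 'dart:convert';

void main() {
  // 'é' is a single UTF-16 code unit (233) but two UTF-8 bytes.
  print('é'.codeUnits); // [233]
  print(utf8.encode('é')); // [195, 169]

  // '€' does not fit in a byte as a code unit, but utf8.encode still
  // produces a valid byte list (three bytes).
  print('€'.codeUnits); // [8364]
  print(utf8.encode('€')); // [226, 130, 172]
}
```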

The best way to convert from UTF-16 code units to a String is String.fromCharCodes.
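String.fromCharCodes is the inverse of codeUnits, including surrogate pairs for characters outside the Basic Multilingual Plane. A minimal check, with an emoji chosen as an arbitrary example:

```dart
void main() {
  const text = 'amigo 😀';
  // The emoji is stored as two UTF-16 code units (a surrogate pair).
  final units = text.codeUnits;
  print(units.length); // 8
  // Rebuilding the string from the code units gives back the original.
  final rebuilt = String.fromCharCodes(units);
  print(rebuilt == text); // true
}
```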

Here you are using base64Encode(data.codeUnits), which only works if the data string contains only code units up to 255. If you assume that, then decoding can be done using either latin1.decode or String.fromCharCodes. Using ascii.decode or utf8.decode also works if the string contains only ASCII characters (which it does here, but which isn't guaranteed just because base64Encode succeeded).
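The sketch below shows those decoding choices for encoded2 when the string contains a Latin-1 character that is not ASCII ('café' is an arbitrary example): latin1.decode and String.fromCharCodes recover the text, while utf8.decode rejects the bytes.

```dart
import 'dart:convert';

void main() {
  const data = 'café'; // 'é' is U+00E9: below 256, but not ASCII
  final encoded2 = base64Encode(data.codeUnits); // accepted: all values <= 255
  final bytes = base64Decode(encoded2);

  print(latin1.decode(bytes)); // café
  print(String.fromCharCodes(bytes)); // café

  try {
    // The single byte 233 (0xE9) at the end is not valid UTF-8 on its own.
    print(utf8.decode(bytes));
  } on FormatException catch (e) {
    print('utf8.decode failed: ${e.message}');
  }
}
```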

In short, don't do base64Encode(data.codeUnits). Convert the string to bytes before doing base64Encode, then use the reverse conversion to convert bytes back to strings.
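A minimal round-trip sketch of that advice; encodeText and decodeText are hypothetical helper names:

```dart
import 'dart:convert';

// String -> UTF-8 bytes -> Base64 text.
String encodeText(String text) => base64Encode(utf8.encode(text));

// Base64 text -> bytes -> String, using the reverse of utf8.encode.
String decodeText(String encoded) => utf8.decode(base64Decode(encoded));

void main() {
  for (final s in ['amigo+/=:chesu', 'café', 'price: 5€', 'amigo 😀']) {
    final encoded = encodeText(s);
    print('$encoded -> ${decodeText(encoded)}');
  }
}
```

For the question's input this produces the same YW1pZ28rLz06Y2hlc3U= as before, but it also round-trips strings whose code units do not fit in a byte.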

Solution 2:[2]

I tried this

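  // The garbled text contains invisible control characters (U+0080 to U+009F), written here as \u escapes.
  print(utf8.decode('use â\u0080\u009csmartâ\u0080\u009d symbols like â\u0080\u0098 thisâ\u0080\u0099'.codeUnits));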

and got this

use “smart” symbols like ‘ this’

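The ” and ‘ characters are smart quotes typed on the iOS keyboard. This works because every code unit of the garbled string is below 256, so codeUnits is exactly the original UTF-8 byte sequence, which utf8.decode then decodes correctly.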
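This trick assumes the garbled string is UTF-8 data that was decoded as Latin-1 somewhere along the way. A sketch of the same repair using latin1.encode instead of codeUnits makes that assumption explicit: latin1.encode throws if any character is above U+00FF, rather than passing a non-byte value on to the UTF-8 decoder. fixMojibake is a hypothetical helper name.

```dart
import 'dart:convert';

// Assumes `garbled` came from decoding UTF-8 bytes as Latin-1 (mojibake).
// latin1.encode recovers the original bytes (and throws on characters
// above U+00FF); utf8.decode then reads them as UTF-8.
String fixMojibake(String garbled) => utf8.decode(latin1.encode(garbled));

void main() {
  const original = 'use “smart” symbols like ‘ this’';
  // Simulate the damage: UTF-8 bytes wrongly decoded as Latin-1.
  final garbled = latin1.decode(utf8.encode(original));
  print(garbled == original); // false
  print(fixMojibake(garbled)); // use “smart” symbols like ‘ this’
}
```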

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 lrn
Solution 2