How to convert Unicode with hex to String in Dart / Flutter

%u0BB5%u0BA3%u0B95%u0BCD%u0B95%u0BAE%u0BCD

Above is a Unicode string with hex-encoded characters. I need to convert it to readable text. When decoded, the above text should return வணக்கம், meaning welcome.



Solution 1:[1]

If you want a hard-coded string, as noted in Special characters in Flutter and in the Dart Language Tour, you can use \u to specify Unicode code points:

var welcome = '\u0BB5\u0BA3\u0B95\u0BCD\u0B95\u0BAE\u0BCD';
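
As a small aside, Dart string literals also accept a braced form, \u{...}, which takes one to six hex digits and so can express code points above U+FFFF. A minimal sketch (the names sameWelcome and emoji are only illustrative):

```dart
// Four-digit form: exactly four hex digits per escape.
const welcome = '\u0BB5\u0BA3\u0B95\u0BCD\u0B95\u0BAE\u0BCD';

// Braced form: one to six hex digits, so it also covers code points
// above U+FFFF, which the four-digit form cannot express in one escape.
const sameWelcome = '\u{BB5}\u{BA3}\u{B95}\u{BCD}\u{B95}\u{BAE}\u{BCD}';
const emoji = '\u{1F600}';

void main() {
  print(welcome == sameWelcome); // Prints: true
  print(emoji.runes.length); // Prints: 1 (one code point)
  print(emoji.length); // Prints: 2 (two UTF-16 code units)
}
```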

If you are given a string '%u0BB5%u0BA3%u0B95%u0BCD%u0B95%u0BAE%u0BCD' and need to convert it dynamically at runtime, then you will need to:

  1. Split the string into %uXXXX components.
  2. Parse the XXXX portion as a hexadecimal integer to get the code point.
  3. Construct a String from the code points.

void main() {
  var s = '%u0BB5%u0BA3%u0B95%u0BCD%u0B95%u0BAE%u0BCD';
  var re = RegExp(r'%u(?<codePoint>[0-9A-Fa-f]{4})');
  var matches = re.allMatches(s);
  var codePoints = [
    for (var match in matches)
      int.parse(match.namedGroup('codePoint')!, radix: 16),
  ];
  var decoded = String.fromCharCodes(codePoints);
  print(decoded); // Prints: வணக்கம்
}

Edit 1

An adjusted version that can handle strings with a mixture of encoded code points and unencoded characters:

void main() {
  var s = '%u0BB5%u0BA3%u0B95%u0BCD%u0B95%u0BAE%u0BCD'
      ' hello world! '
      '%u0BB5%u0BA3%u0B95%u0BCD%u0B95%u0BAE%u0BCD';
  // dotAll lets '.' match raw line terminators, which would otherwise be dropped.
  var re = RegExp(r'(%u(?<codePoint>[0-9A-Fa-f]{4}))|.', dotAll: true);
  var matches = re.allMatches(s);
  var codePoints = <int>[];
  for (var match in matches) {
    var codePoint = match.namedGroup('codePoint');
    if (codePoint != null) {
      codePoints.add(int.parse(codePoint, radix: 16));
    } else {
      codePoints += match.group(0)!.runes.toList();
    }
  }
  var decoded = String.fromCharCodes(codePoints);
  print(decoded); // Prints: வணக்கம் hello world! வணக்கம்
}
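
One detail worth checking: each %uHHHH escape carries only four hex digits, so a character above U+FFFF can only arrive as two escapes holding a UTF-16 surrogate pair. This assumes the producer encodes UTF-16 code units, which the question does not state. String.fromCharCodes uses 16-bit values as code units, so an adjacent pair should recombine into a single character. A minimal sketch, with decodePercentU as a hypothetical helper name:

```dart
/// Hypothetical helper: decodes %uHHHH escapes and passes every other
/// character through unchanged.
String decodePercentU(String input) {
  // dotAll lets '.' also match line terminators, so raw newlines survive.
  final re = RegExp(r'%u(?<codeUnit>[0-9A-Fa-f]{4})|.', dotAll: true);
  final codeUnits = <int>[];
  for (final match in re.allMatches(input)) {
    final hex = match.namedGroup('codeUnit');
    if (hex != null) {
      codeUnits.add(int.parse(hex, radix: 16));
    } else {
      codeUnits.addAll(match.group(0)!.codeUnits);
    }
  }
  return String.fromCharCodes(codeUnits);
}

void main() {
  // %uD83D%uDE00 is assumed to be the surrogate pair for U+1F600.
  final decoded = decodePercentU('hi %uD83D%uDE00');
  print(decoded.runes.length); // Prints: 4 ('h', 'i', ' ', and one emoji)
  print(decoded == 'hi \u{1F600}'); // Prints: true
}
```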

Edit 2

The versions above assumed that your input would consist only of Unicode code points encoded as %uHHHH (where H is a hexadecimal digit) and of raw ASCII characters. However, your new version of this question indicates that you actually need to handle a mixture of:

  • Unicode code points encoded as %uHHHH.
  • Raw (unencoded) ASCII characters.
  • ASCII characters encoded as %HH.

To handle that third case:

void main() {
  var s = '%3Cp%3E%3Cb%3E%u0B87%u0BA8%u0BCD%u0BA4%u0BBF%u0BAF%u0BBE%u0BB5%u0BBF%u0BA9%u0BCD%20%u0BAA%u0BC6%u0BB0%u0BC1%u0BAE%u0BCD%u0BAA%u0BBE%u0BA9%u0BCD%u0BAE%u0BC8%u0BAF%u0BBE%u0BA9%20%u0BAE%u0B95%u0BCD%u0B95%u0BB3%u0BCD%20%u0BAA%u0BB4%u0B99%u0BCD%u0B95%u0BBE%u0BB2%u0BA4%u0BCD%u0BA4%u0BBF%u0BB2%u0BBF%u0BB0%u0BC1%u0BA8%u0BCD%u0BA4%u0BC7%20.........%20%u0BAA%u0BCB%u0BA9%u0BCD%u0BB1%u0BC1%20%u0BA4%u0BBE%u0BA9%u0BBF%u0BAF%u0B99%u0BCD%u0B95%u0BB3%u0BC8%20%u0BAE%u0BC1%u0B95%u0BCD%u0B95%u0BBF%u0BAF%20%u0B89%u0BA3%u0BB5%u0BBE%u0B95%u0BAA%u0BCD%20%u0BAA%u0BAF%u0BA9%u0BCD%u0BAA%u0B9F%u0BC1%u0BA4%u0BCD%u0BA4%u0BBF%u0BA9%u0BB0%u0BCD.%3C/b%3E%0A%3Col%20type%3D%22I%22%20style%3D%22font-weight%3Abold%3B%22%3E%0A%3Cli%3E%3Cspan%20style%3D%22font-weight%3Anormal%3B%22%3E%20%u0B85%u0BB0%u0BBF%u0B9A%u0BBF%3C/span%3E%3C/li%3E%0A%3Cli%3E%3Cspan%20style%3D%22font-weight%3Anormal%3B%22%3E%20%u0B95%u0BC7%u0BB4%u0BCD%u0BB5%u0BB0%u0B95%u0BC1%20%3C/span%3E%3C/li%3E%0A%3Cli%3E%3Cspan%20style%3D%22font-weight%3Anormal%3B%22%3E%20%u0B93%u0B9F%u0BCD%u0BB8%u0BCD%3C/span%3E%3C/li%3E%0A%3Cli%3E%3Cspan%20style%3D%22font-weight%3Anormal%3B%22%3E%20%u0BAA%u0BB0%u0BC1%u0BAA%u0BCD%u0BAA%u0BC1%3C/span%3E%3C/li%3E%3C/ol%3E%3C/p%3E';
  var re = RegExp(
    r'(%(?<asciiValue>[0-9A-Fa-f]{2}))'
    r'|(%u(?<codePoint>[0-9A-Fa-f]{4}))'
    r'|.',
    dotAll: true, // Let '.' match raw line terminators too.
  );
  var matches = re.allMatches(s);
  var codePoints = <int>[];
  for (var match in matches) {
    var codePoint = match.namedGroup('asciiValue') ?? match.namedGroup('codePoint');
    if (codePoint != null) {
      codePoints.add(int.parse(codePoint, radix: 16));
    } else {
      codePoints += match.group(0)!.runes.toList();
    }
  }
  var decoded = String.fromCharCodes(codePoints);
  print(decoded);
}

which prints:

<p><b>இந்தியாவின் பெரும்பான்மையான மக்கள் பழங்காலத்திலிருந்தே ......... போன்று தானியங்களை முக்கிய உணவாகப் பயன்படுத்தினர்.</b>
<ol type="I" style="font-weight:bold;">
<li><span style="font-weight:normal;"> அரிசி</span></li>
<li><span style="font-weight:normal;"> கேழ்வரகு </span></li>
<li><span style="font-weight:normal;"> ஓட்ஸ்</span></li>
<li><span style="font-weight:normal;"> பருப்பு</span></li></ol></p>

There are packages that can render HTML (e.g. package:flutter_html and probably various others). Otherwise I'm going to consider dealing with the HTML to be outside the scope of this answer, and that would deserve its own question anyway.
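
If the decoding is needed in more than one place, the three cases above could also be folded into a reusable function built on String.replaceAllMapped, which rewrites only the escapes and leaves everything else (including raw newlines) untouched. This is a sketch rather than a drop-in replacement: unescapeLegacy is a hypothetical name, and like the code above it treats each %HH as a single character code, not as a UTF-8 byte.

```dart
/// Hypothetical helper covering %uHHHH, %HH, and raw characters.
String unescapeLegacy(String input) {
  final re = RegExp(r'%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})');
  return input.replaceAllMapped(re, (match) {
    // Exactly one of the two groups participates in any given match.
    final hex = match.group(1) ?? match.group(2)!;
    return String.fromCharCode(int.parse(hex, radix: 16));
  });
}

void main() {
  final s = '%3Cb%3E%u0BB5%u0BA3%u0B95%u0BCD%u0B95%u0BAE%u0BCD%3C/b%3E%0A';
  // Prints <b>வணக்கம்</b> followed by the decoded newline.
  print(unescapeLegacy(s));
}
```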

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1