How to reverse a string that contains complicated emojis?

Input:

Hello world👩‍🦰👩‍👩‍👦‍👦

Desired Output:

👩‍👩‍👦‍👦👩‍🦰dlrow olleH

I tried several approaches, but none gave me the correct answer.

This failed miserably:

const text = 'Hello world👩‍🦰👩‍👩‍👦‍👦';

const reversed = text.split('').reverse().join('');

console.log(reversed);

This kinda works but it breaks 👩‍👩‍👦‍👦 into 4 different emojis:

const text = 'Hello world👩‍🦰👩‍👩‍👦‍👦';

const reversed = [...text].reverse().join('');

console.log(reversed);

I also tried every answer in this question but none of them work.

Is there a way to get the desired output?



Solution 1:[1]

If you're able to, use the _.split() function provided by lodash. From version 4.0 onwards, _.split() is capable of splitting strings that contain Unicode emojis.

Using the native .reverse().join('') on the resulting array of 'characters' should then work just fine with emojis containing zero-width joiners.

function reverse(txt) { return _.split(txt, '').reverse().join(''); }

const text = 'Hello world👩‍🦰👩‍👩‍👦‍👦';
console.log(reverse(text));
<script src="https://cdnjs.cloudflare.com/ajax/libs/lodash.js/4.17.20/lodash.min.js" integrity="sha512-90vH1Z83AJY9DmlWa8WkjkV79yfS2n2Oxhsi2dZbIv0nC4E6m5AbH8Nh156kkM7JePmqD6tcZsfad1ueoaovww==" crossorigin="anonymous"></script>

Solution 2:[2]

I took TKoL's idea of using the \u200d character and used it to attempt to create a smaller script.

Note: Not all compositions use a zero width joiner so it will be buggy with other composition characters.

It uses a traditional for loop because we skip some iterations when we find combined emojis. Within the for loop there is a while loop that checks whether a \u200d character follows. As long as there is one, we add the next 2 characters as well and advance the for loop by 2 iterations, so combined emojis are not reversed.

To easily use it on any string I made it as a new prototype function on the string object.

String.prototype.reverse = function() {
  let textArray = [...this];
  let reverseString = "";

  for (let i = 0; i < textArray.length; i++) {
    let char = textArray[i];
    while (textArray[i + 1] === '\u200d' && i + 2 < textArray.length) {
      char += textArray[i + 1] + textArray[i + 2];
      i = i + 2;
    }
    reverseString = char + reverseString;
  }
  return reverseString;
}

const text = "Hello world👩‍🦰👩‍👩‍👦‍👦";

console.log(text.reverse());

//Fun fact, you can chain them to double reverse :)
//console.log(text.reverse().reverse());
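To illustrate the note about other composition characters, here is a minimal sketch of the same loop as a standalone function. The name reverseKeepingZwj is made up for this example, and it adds a bounds check for a trailing joiner. It keeps a ZWJ sequence intact but still separates a skin-tone modifier from its base emoji:

```javascript
// Hypothetical standalone variant of the loop above, for illustration only.
function reverseKeepingZwj(text) {
  const codePoints = [...text]; // iterate by code point, not by UTF-16 code unit
  let result = '';
  for (let i = 0; i < codePoints.length; i++) {
    let cluster = codePoints[i];
    // Glue "ZWJ + next code point" onto the current cluster while a ZWJ follows.
    while (codePoints[i + 1] === '\u200d' && i + 2 < codePoints.length) {
      cluster += codePoints[i + 1] + codePoints[i + 2];
      i += 2;
    }
    result = cluster + result;
  }
  return result;
}

const redHair = '\u{1F469}\u200D\u{1F9B0}'; // woman + ZWJ + red hair
const thumbsUp = '\u{1F44D}\u{1F3FD}';      // thumbs up + skin-tone modifier, no ZWJ

console.log(reverseKeepingZwj('ab' + redHair) === redHair + 'ba');   // true: ZWJ sequence kept together
console.log(reverseKeepingZwj('ab' + thumbsUp) === thumbsUp + 'ba'); // false: modifier now precedes its base
```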

Solution 3:[3]

Reversing Unicode text is tricky for a lot of reasons.

First, depending on the programming language, strings are represented in different ways: as a list of bytes, as a list of UTF-16 code units (16 bits wide, often called "characters" in the API), or as UCS-4 code points (4 bytes wide).

Second, different APIs reflect that inner representation to different degrees. Some work on the abstraction of bytes, some on UTF-16 characters, some on code points. When the representation uses bytes or UTF-16 characters, there are usually parts of the API that give you access to the elements of this representation, as well as parts that perform the necessary logic to get from bytes (via UTF-8) or from UTF-16 characters to the actual code points.
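In JavaScript specifically, these layers can be seen side by side. A small sketch, assuming an environment with the standard TextEncoder global (modern browsers and Node.js); the emoji is written with escapes so the individual code points are visible:

```javascript
const redHair = '\u{1F469}\u200D\u{1F9B0}'; // woman + zero-width joiner + red hair

const utf8Bytes = new TextEncoder().encode(redHair); // bytes (UTF-8)
const codeUnits = redHair.split('');                 // UTF-16 code units
const codePoints = [...redHair];                     // code points (string iterator)

console.log(utf8Bytes.length);  // 11
console.log(codeUnits.length);  // 5 (same as redHair.length)
console.log(codePoints.length); // 3

// Print each code point in U+XXXX notation.
console.log(codePoints.map((cp) => 'U+' + cp.codePointAt(0).toString(16).toUpperCase()));
// [ 'U+1F469', 'U+200D', 'U+1F9B0' ]
```

This is why split('') and the spread operator give different results in the question: the first works on code units, the second on code points, and neither knows about sequences of code points.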

Often, the parts of the API performing that logic and thus giving you access to the code points have been added later, as first there was 7-bit ASCII, then a bit later everybody thought 8 bits were enough, using different code pages, and even later that 16 bits were enough for Unicode. The notion of code points as integers decoupled from any fixed storage width (Unicode currently caps them at U+10FFFF) was historically added as the fourth common way of logically encoding text.

Using an API that gives you access to the actual code points seems like that's it. But...

Third, there are a lot of modifier code points affecting a neighbouring code point. E.g. there's a combining diaeresis (U+0308) that turns the a before it into an ä, an e into an ë, etc. Turn the code points around and the mark ends up on a different letter: "aëb" stored as a, e, U+0308, b becomes b, U+0308, e, a, which puts the diaeresis on the b. There is a direct representation of e.g. ä as its own code point, but using the modifier is just as valid.
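A quick sketch of that effect in JavaScript. In Unicode a combining mark such as U+0308 (combining diaeresis) is stored after the letter it modifies, so reversing by code point moves it onto a different letter. Normalizing to NFC first helps only where a precomposed code point exists, so it is a partial workaround at best:

```javascript
const word = 'ae\u0308b'; // a, e, combining diaeresis, b: renders as "aëb"

const naive = [...word].reverse().join('');
// Code points are now b, U+0308, e, a: the diaeresis sits on the b.
console.log(naive === 'b\u0308ea'); // true

// NFC composes e + U+0308 into the single code point U+00EB (ë) before reversing.
const composed = [...word.normalize('NFC')].reverse().join('');
console.log(composed === 'b\u00EBa'); // true
```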

Fourth, everything is in constant flux. There are also a lot of modifiers among the emoji, as used in the example, and more are added every year. Therefore, if an API gives you access to the information whether a code point is a modifier, the version of the API will determine whether it already knows a specific new modifier.

Unicode provides a hacky trick, though, for when it's only about the visual appearance:

There are writing direction modifiers. In the case of the example, left-to-right writing direction is used. Just add a right-to-left writing direction modifier at the beginning of the text and, depending on the version of the API / browser, it will look correctly reversed.

'\u202e' is called right-to-left override; it is the strongest version of the right-to-left marker.

See this explanation by w3.org

const text = 'Hello world👩‍🦰👩‍👩‍👦‍👦'
console.log('\u202e' + text)
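Note that the override only changes how the text is displayed: the underlying string is not reversed, and the override keeps affecting whatever text follows it. A small sketch that limits the effect by closing the override with U+202C (pop directional formatting); the helper name displayReversed is made up for this example:

```javascript
const RLO = '\u202e'; // right-to-left override
const PDF = '\u202c'; // pop directional formatting: ends the override

// Hypothetical helper: wraps text so that it only *renders* reversed.
function displayReversed(text) {
  return RLO + text + PDF;
}

const wrapped = displayReversed('Hello world');
console.log(wrapped);                                // typically drawn right-to-left by bidi-aware output
console.log(wrapped.slice(1, -1) === 'Hello world'); // true: the stored characters keep their order
```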

const text = 'Hello world👩‍🦰👩‍👩‍👦‍👦'
let original = document.getElementById('original')
original.appendChild(document.createTextNode(text))
let result = document.getElementById('result')
result.appendChild(document.createTextNode('\u202e' + text))
body {
  font-family: sans-serif
}
<p id="original"></p>
<p id="result"></p>

Solution 4:[4]

I know! I'll use RegExp. What could go wrong? (Answer left as an exercise for the reader.)

const text = 'Hello world👩‍🦰👩‍👩‍👦‍👦';

const reversed = text.match(/.(\u200d.)*/gu).reverse().join('');

console.log(reversed);
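A few of the things that can go wrong, as a sketch (the function name reverseWithRegex is only for this example). The pattern keeps ZWJ sequences together, but . does not match line terminators, so they are silently dropped, and sequences that are not built with ZWJ, such as flags made of two regional indicators, still get reordered:

```javascript
// Same pattern as above, wrapped in a function for easier experimenting.
const reverseWithRegex = (s) => s.match(/.(\u200d.)*/gu).reverse().join('');

const redHair = '\u{1F469}\u200D\u{1F9B0}'; // woman + ZWJ + red hair
const flag = '\u{1F1E9}\u{1F1EA}';          // regional indicators D + E, no ZWJ involved

console.log(reverseWithRegex('ab' + redHair) === redHair + 'ba'); // true: ZWJ sequence survives
console.log(reverseWithRegex(flag) === flag);                     // false: the two indicators are swapped
console.log(reverseWithRegex('a\nb') === 'ba');                   // true: the newline has disappeared

// reverseWithRegex('') throws a TypeError, because match() returns null when nothing matches.
```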

Solution 5:[5]

An alternative would be to use the runes library, a small but effective solution:

https://github.com/dotcypress/runes

const runes = require('runes')

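// String.substring cuts the emoji in half: the result starts with a lone surrogate
'👩‍👩‍👦‍👦a'.substring(1)

// Runes
runes.substr('👩‍👩‍👦‍👦a', 1) // => 'a'

// join('') is needed here; join() without an argument would insert commas
runes('12👩‍👩‍👦‍👦3??').reverse().join('');
// results in: "??3👩‍👩‍👦‍👦21"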

Solution 6:[6]

You don't just have trouble with emoji, but also with other combining characters. These things that feel like individual letters but are actually made of one or more Unicode code points are called "extended grapheme clusters".

Breaking a string into these clusters is tricky (for example see these unicode docs). I would not rely on implementing it myself but use an existing library. Google pointed me at the grapheme-splitter library. The docs for this library contain some nice examples that will trip up most implementations.

Using this you should be able to write:

var splitter = new GraphemeSplitter();
var graphemes = splitter.splitGraphemes(string);
var reversed = graphemes.reverse().join('');

ASIDE: For visitors from the future, or those willing to live on the bleeding edge:

There is a proposal to add a grapheme segmenter to the JavaScript standard. (It actually provides other segmenting options too.) It is in stage 3 review for acceptance at the moment and is currently implemented in JSC and V8 (see https://github.com/tc39/proposal-intl-segmenter/issues/114).

Using this the code would look like:

var segmenter = new Intl.Segmenter("en", {granularity: "grapheme"})
var segment_iterator = segmenter.segment(string)
var graphemes = []
for (let {segment} of segment_iterator) {
    graphemes.push(segment)
}
var reversed = graphemes.reverse().join('');

You can probably make this neater if you know more modern javascript than me...
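A somewhat more compact version of the same idea, as a sketch. It assumes a runtime that ships Intl.Segmenter; the function name reverseGraphemes and its default locale are choices made for this example:

```javascript
// Hypothetical helper: reverse a string by extended grapheme cluster.
function reverseGraphemes(text, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' });
  // segment() returns an iterable of objects that each carry a `segment` string.
  return Array.from(segmenter.segment(text), ({ segment }) => segment)
    .reverse()
    .join('');
}

// The string from the question, written with escapes so the joiners are visible.
const redHair = '\u{1F469}\u200D\u{1F9B0}';
const family = '\u{1F469}\u200D\u{1F469}\u200D\u{1F466}\u200D\u{1F466}';

console.log(reverseGraphemes('Hello world' + redHair + family));
console.log(reverseGraphemes('Hello world' + redHair + family) === family + redHair + 'dlrow olleH');
```

Array.from accepts a mapping function as its second argument, which replaces the explicit loop and push.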

There is an implementation here - but I don't know what it requires.

Note: This points out a fun issue that other answers haven't addressed yet. Segmentation can depend upon the locale that you are using - not just the characters in the string.

Solution 7:[7]

I just decided to do it for fun, was a good challenge. Not sure it's correct in all cases, so use at your own risk, but here it is:

function run() {
    const text = 'Hello world👩‍🦰👩‍👩‍👦‍👦';
    const newText = reverseText(text);
    console.log(newText);
}

function reverseText(text) {
    // first, create an array of characters
    let textArray = [...text];
    let lastCharConnector = false;
    textArray = textArray.reduce((acc, char, index) => {
        if (char.charCodeAt(0) === 8205) {
            const lastChar = acc[acc.length-1];
            if (Array.isArray(lastChar)) {
                lastChar.push(char);
            } else {
                acc[acc.length-1] = [lastChar, char];
            }
            lastCharConnector = true;
        } else if (lastCharConnector) {
            acc[acc.length-1].push(char);
            lastCharConnector = false;
        } else {
            acc.push(char);
            lastCharConnector = false;
        }
        return acc;
    }, []);
    
    console.log('initial text array', textArray);
    textArray = textArray.reverse();
    console.log('reversed text array', textArray);

    textArray = textArray.map((item) => {
        if (Array.isArray(item)) {
            return item.join('');
        } else {
            return item;
        }
    });

    return textArray.join('');
}

run();

Solution 8:[8]

You can use:

yourstring.split('').reverse().join('')

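It turns your string into an array of UTF-16 code units, reverses it, then joins it back into a string. Because this splits surrogate pairs and joined sequences, it only gives correct results for text without emoji or combining characters.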

Solution 9:[9]

const text = 'Hello world👩‍🦰👩‍👩‍👦‍👦';

const reversed = text.split('').reverse().join('');

console.log(reversed);

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 idmean
Solution 2 theEpsilon
Solution 3
Solution 4 Neil
Solution 5
Solution 6 Inkling
Solution 7 TKoL
Solution 8 omdha0
Solution 9 asfaqe hussain