Single-byte, double-byte, triple-byte characters: how to find which one the user entered

While working on a multilingual site (with Japanese and Chinese languages) where users could enter text in their regional language, I needed to validate user input based on the memory taken by each character, since a character can be single-byte, double-byte, or triple-byte.

I used the solution described in the answer below.



Solution 1:[1]

Characters can be single-byte, double-byte, triple-byte, and so on. In UTF-8, the number of bytes a character needs depends on the range its code point falls in: single-byte characters occupy one range, double-byte characters the next, and so on. Based on this, I wrote the following function, which calculates the size of a string in bytes:

function getByteLength(normal_val) {
    // Force string type
    normal_val = String(normal_val);

    var byteLen = 0;
    // for...of walks the string by code point, so a surrogate pair
    // (e.g. an emoji) is counted once instead of as two UTF-16 units
    for (var ch of normal_val) {
        var c = ch.codePointAt(0);
        // UTF-8 thresholds: up to 7 bits -> 1 byte, 11 bits -> 2 bytes,
        // 16 bits -> 3 bytes, anything larger -> 4 bytes
        byteLen += c < (1 <<  7) ? 1 :
                   c < (1 << 11) ? 2 :
                   c < (1 << 16) ? 3 : 4;
    }
    return byteLen;
}
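
As a cross-check on a hand-written counter like the one above, the built-in TextEncoder API (available in modern browsers and in Node.js) always encodes to UTF-8, so the length of its output is the UTF-8 byte count. This is a minimal sketch, assuming UTF-8 is the encoding you care about; the name utf8ByteLength is made up for illustration.

```javascript
// Hypothetical helper (not part of the original answer):
// returns the number of bytes the string occupies when encoded as UTF-8.
function utf8ByteLength(text) {
    // TextEncoder.encode() returns a Uint8Array of UTF-8 bytes
    return new TextEncoder().encode(String(text)).length;
}

console.log(utf8ByteLength("abc"));    // 3 (three single-byte characters)
console.log(utf8ByteLength("日本語"));  // 9 (three triple-byte characters)
console.log(utf8ByteLength("😀"));     // 4 (one four-byte character)
```

If both counters agree on your test strings, the manual thresholds are probably doing what you expect.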

The function above can be modified to find out whether a character is single-byte or multi-byte.

The following JSFiddle reports the size of the entered text in bytes.

http://jsfiddle.net/paraselixir/d83oaa3v/5/

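So if a string has x characters and its size in bytes is y, then x === y means all characters are single-byte. If y is larger than x, at least one character is multi-byte. 2*x === y is what you get when all characters are double-byte, but a mix of single-byte and triple-byte characters can add up to the same total, so check each character individually if you need to be certain.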
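
The comparison above can be made exact by looking at each character's width instead of the totals. This is a rough sketch, assuming UTF-8 and a runtime that provides TextEncoder; the helper names utf8Widths and classifyByteWidths are hypothetical and not from the original answer.

```javascript
// Hypothetical helper: UTF-8 width (1 to 4 bytes) of each character.
function utf8Widths(text) {
    const encoder = new TextEncoder();
    // Array.from splits by code point, so a surrogate pair stays one character
    return Array.from(String(text), (ch) => encoder.encode(ch).length);
}

// Hypothetical helper: summarize which byte widths occur in the input.
function classifyByteWidths(text) {
    const widths = new Set(utf8Widths(text));
    if (widths.size === 0) return "empty";
    if (widths.size > 1) return "mixed";
    const [w] = widths; // the only width present
    return ["single-byte", "double-byte", "triple-byte", "four-byte"][w - 1];
}

console.log(classifyByteWidths("hello"));  // single-byte
console.log(classifyByteWidths("日本語"));  // triple-byte
console.log(classifyByteWidths("aあ"));    // mixed
```

Note that "aあ" has 2 characters and 4 bytes, so the 2*x === y test alone would wrongly suggest two double-byte characters; the per-character check reports it as mixed.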

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 paraS elixiR