Single-byte, double-byte, triple-byte characters: how to find which one is entered by the user
While working on a multilingual site (Japanese and Chinese), users were allowed to enter text in their regional language. I had a requirement to validate user input based on the memory taken by each character, since a character can be single-byte, double-byte or triple-byte.
I used the following solution, as given in the answer.
Solution 1:[1]
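A character can take one, two, three or four bytes. The number of bytes depends only on the range its code point falls in. In UTF-8:

- U+0000 to U+007F take 1 byte.
- U+0080 to U+07FF take 2 bytes.
- U+0800 to U+FFFF take 3 bytes. Most Chinese and Japanese characters fall in this range.
- U+10000 to U+10FFFF take 4 bytes.

Based on this, I created the following function, which calculates the size of a string in bytes: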
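// Returns the number of bytes the string occupies when encoded as UTF-8.
function getByteLength(normal_val) {
    // Force string type
    normal_val = String(normal_val);
    var byteLen = 0;
    // for...of walks the string by code point, so a character outside the BMP
    // (stored as a UTF-16 surrogate pair) is counted once, as 4 bytes,
    // rather than as two separate 3-byte code units
    for (var ch of normal_val) {
        var c = ch.codePointAt(0);
        byteLen += c < (1 << 7) ? 1 :   // U+0000 to U+007F
                   c < (1 << 11) ? 2 :  // U+0080 to U+07FF
                   c < (1 << 16) ? 3 :  // U+0800 to U+FFFF
                   4;                   // U+10000 to U+10FFFF (UTF-8 maximum)
    }
    return byteLen;
}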
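Modern browsers and Node.js 11+ expose the standard `TextEncoder`, which always encodes to UTF-8. It can serve as a cross-check for a hand-written counter such as `getByteLength`. The sketch below uses it to measure a string. The `fitsByteLimit` helper is hypothetical. It illustrates the validation use case from the question, assuming the limit is expressed in bytes.

```javascript
// UTF-8 byte length computed by the platform's own encoder.
function utf8ByteLength(str) {
  return new TextEncoder().encode(String(str)).length;
}

// Hypothetical validation helper: true when the input fits a byte budget,
// e.g. a storage field whose size is measured in bytes, not characters.
function fitsByteLimit(str, maxBytes) {
  return utf8ByteLength(str) <= maxBytes;
}

// fitsByteLimit("日本語", 9) is true: three 3-byte characters.
```

In Node.js, `Buffer.byteLength(str, "utf8")` returns the same count without allocating an encoded copy.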
The same logic can tell whether an individual character is single-byte or multi-byte. Measure the character on its own and check whether the result is 1.
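Here is a minimal sketch of that per-character check. It assumes UTF-8 byte counts and uses the same range boundaries as above. `charByteLength`, `isSingleByte` and `charBreakdown` are hypothetical helper names, not part of the original answer.

```javascript
// UTF-8 byte count of one character, passed as a one-code-point string.
function charByteLength(ch) {
  var c = ch.codePointAt(0);
  return c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;
}

// True when the character is encoded in a single byte (plain ASCII).
function isSingleByte(ch) {
  return charByteLength(ch) === 1;
}

// Pairs every character of the input with its byte count.
// Array.from iterates by code point, so surrogate pairs stay together.
function charBreakdown(str) {
  return Array.from(String(str), function (ch) {
    return [ch, charByteLength(ch)];
  });
}

// charBreakdown("aé日😀") pairs the four characters with 1, 2, 3 and 4 bytes.
```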
The following JSFiddle shows the byte size of the text the user enters:
http://jsfiddle.net/paraselixir/d83oaa3v/5/
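Suppose a string has x characters and a byte size of y. If x === y, every character is single-byte. If y > x, at least one character is multi-byte.

The ratio alone cannot tell you more than that. For example, 2*x === y does not mean every character is double-byte: "aあ" has 2 characters and 1 + 3 = 4 bytes. To find out which characters are multi-byte, measure each character separately.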
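This sketch classifies a string by the set of byte widths it actually contains, rather than by the x/y ratio. It assumes UTF-8 byte counts. The names `widthOf`, `byteWidths` and `describe` are hypothetical.

```javascript
// UTF-8 width (1 to 4 bytes) of one code point.
function widthOf(c) {
  return c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;
}

// Distinct byte widths present in str, in ascending order.
function byteWidths(str) {
  var widths = new Set();
  for (var ch of String(str)) {
    widths.add(widthOf(ch.codePointAt(0)));
  }
  return Array.from(widths).sort(function (a, b) { return a - b; });
}

// Summarizes the string in the terms used in the question.
function describe(str) {
  var w = byteWidths(str);
  if (w.length === 0) return "empty";
  if (w.length === 1) return "all " + w[0] + "-byte";
  return "mixed: " + w.join(", ") + " bytes";
}

// describe("aあ") returns "mixed: 1, 3 bytes", even though y === 2 * x.
```

Collecting widths in a `Set` keeps the result independent of string length. A long Japanese sentence and a single kanji are both reported as "all 3-byte".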
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | paraS elixiR |