Convert @ sign from byte in GSM 7-bit encoding to Java text

I have been given a byte array [97, 98, 0, 99, 100] that is GSM 7-bit encoded. It should decode to ab@cd. When I append the bytes of this array to a StringBuilder, the @ sign is not converted correctly.

Here is my code:

byte[] byteFinal ={97, 98, 0, 99, 100};
char ch;
StringBuilder str = new StringBuilder();
for(byte b : byteFinal){
    ch  = (char)b;
    System.out.println("ch:"+ch);
    str.append(ch);

}
System.out.println(str.toString());


Solution 1:[1]

Based on your comments in other answers, the problem is caused by missing handling of GSM 7-bit encoding.

Treat GSM 7-bit as a separate character encoding: you should not take a byte array in that encoding as-is and cast each byte to char. Casting a byte to a char only works if the bytes are in ASCII or an ASCII-compatible encoding such as UTF-8, and every character is below code point 128.
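To make the failure concrete, here is a minimal sketch of what the cast does (the class and method names are only for illustration). The cast keeps the numeric value of the byte, so the GSM septet 0 becomes U+0000 (NUL) rather than '@' (U+0040).

```java
class CastDemo {
    // Casting reinterprets each byte value as a UTF-16 code unit;
    // no GSM 7-bit lookup takes place.
    static String castEach(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append((char) b);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = castEach(new byte[] {97, 98, 0, 99, 100});
        System.out.println((int) s.charAt(2));   // prints 0, the NUL character
        System.out.println(s.equals("ab@cd"));   // prints false
    }
}
```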

It seems Java does not provide a built-in Charset for GSM 7-bit (else, you could have done something like String result = new String(byteFinal, GSM_7_BIT_CHARSET);).

You need to handcraft the lookup yourself, along the lines of https://mnujali.wordpress.com/2011/12/01/gsm-7-bit-encodingdecoding-used-for-sms-and-ussd-strings-java-code/:

static final char[] GSM7CHARS = {
        0x0040, 0x00A3, 0x0024, 0x00A5, 0x00E8, 0x00E9, 0x00F9, 0x00EC,
        0x00F2, 0x00E7, 0x000A, 0x00D8, 0x00F8, 0x000D, 0x00C5, 0x00E5,
        0x0394, 0x005F, 0x03A6, 0x0393, 0x039B, 0x03A9, 0x03A0, 0x03A8,
        0x03A3, 0x0398, 0x039E, 0x00A0, 0x00C6, 0x00E6, 0x00DF, 0x00C9,
        0x0020, 0x0021, 0x0022, 0x0023, 0x00A4, 0x0025, 0x0026, 0x0027,
        0x0028, 0x0029, 0x002A, 0x002B, 0x002C, 0x002D, 0x002E, 0x002F,
        0x0030, 0x0031, 0x0032, 0x0033, 0x0034, 0x0035, 0x0036, 0x0037,
        0x0038, 0x0039, 0x003A, 0x003B, 0x003C, 0x003D, 0x003E, 0x003F,
        0x00A1, 0x0041, 0x0042, 0x0043, 0x0044, 0x0045, 0x0046, 0x0047,
        0x0048, 0x0049, 0x004A, 0x004B, 0x004C, 0x004D, 0x004E, 0x004F,
        0x0050, 0x0051, 0x0052, 0x0053, 0x0054, 0x0055, 0x0056, 0x0057,
        0x0058, 0x0059, 0x005A, 0x00C4, 0x00D6, 0x00D1, 0x00DC, 0x00A7,
        0x00BF, 0x0061, 0x0062, 0x0063, 0x0064, 0x0065, 0x0066, 0x0067,
        0x0068, 0x0069, 0x006A, 0x006B, 0x006C, 0x006D, 0x006E, 0x006F,
        0x0070, 0x0071, 0x0072, 0x0073, 0x0074, 0x0075, 0x0076, 0x0077,
        0x0078, 0x0079, 0x007A, 0x00E4, 0x00F6, 0x00F1, 0x00FC, 0x00E0};

static final char[] ESCAPE = {
        0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
        0x0000, 0x0000, '\n'  , 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
        0x0000, 0x0000, 0x0000, 0x0000, '^'   , 0x0000, 0x0000, 0x0000,
        0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
        0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
        '{'   , '}'   , 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, '\\'  ,
        0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
        0x0000, 0x0000, 0x0000, 0x0000, '['   , '~'   , ']'   , 0x0000,
        '|'   , 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
        0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
        0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
        0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
        0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x20AC, 0x0000, 0x0000,
        0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
        0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
        0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000};
        // or use -1 instead of 0x0000, depending on your preference

//...

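byte[] byteFinal = {97, 98, 0, 99, 100};
StringBuilder sb = new StringBuilder();
boolean escape = false;  // true when the previous septet was the escape code (27)
for (byte b : byteFinal) {
    if (b >= 0) {  // negative bytes (values above 127) are not valid septets; skip them
        if (escape) {
            // use the extension table, falling back to the basic table if no entry exists
            sb.append(ESCAPE[b] > 0 ? ESCAPE[b] : GSM7CHARS[b]);
            escape = false;
        } else {
            if (b == 27) {  // escape: the next septet is looked up in ESCAPE
                escape = true;
            } else {
                sb.append(GSM7CHARS[b]);
            }
        }
    }
}
System.out.println(sb.toString());  // prints ab@cd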
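Since the extension table has only a handful of entries, a sparse lookup may be easier to check by eye than a 128-element array. The following is a sketch under that assumption, not part of the linked code: the class name Gsm7Extension and the method extensionChar are made up for illustration, and the positions are the GSM 03.38 extension-table positions as I understand them.

```java
class Gsm7Extension {
    // Returns the Unicode value selected by the septet that follows the
    // escape code (27), or -1 if the extension table has no entry for it.
    static int extensionChar(int septet) {
        switch (septet) {
            case 0x0A: return '\f';   // form feed; the array above maps this to '\n'
            case 0x14: return '^';
            case 0x28: return '{';
            case 0x29: return '}';
            case 0x2F: return '\\';
            case 0x3C: return '[';
            case 0x3D: return '~';
            case 0x3E: return ']';
            case 0x40: return '|';
            case 0x65: return 0x20AC; // euro sign
            default:   return -1;     // caller falls back to the basic table
        }
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(extensionChar(0x65))); // prints 20ac
        System.out.println(extensionChar(0x00));                      // prints -1
    }
}
```

Returning -1 for a missing entry avoids overloading 0x0000 as a sentinel, at the cost of a cast to char at the call site.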

Update 1:

After some searching, it seems GSM 7-bit encoding is somewhat more complicated than what is implemented above (e.g. escaping); see https://www.developershome.com/sms/gsmAlphabet.asp.

However, this should at least give you an idea of why a handcrafted lookup is needed instead of just casting each byte to char.
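One complication not covered above is packing: in SMS payloads the 7-bit values are usually packed back to back, so that 8 septets fit into 7 octets. The array in the question looks already unpacked (one septet per byte), so this step may not apply there. If it does, a rough sketch of unpacking could look like this (SeptetUnpacker and unpack are illustrative names, and the septet count is assumed to be known from elsewhere, e.g. a length field):

```java
class SeptetUnpacker {
    // Unpacks septetCount 7-bit values from a packed octet array.
    // Septet i occupies bits [7*i, 7*i + 7) of the stream, least significant bit first.
    static byte[] unpack(byte[] packed, int septetCount) {
        int neededOctets = (septetCount * 7 + 7) / 8;
        if (septetCount < 0 || packed.length < neededOctets) {
            throw new IllegalArgumentException("not enough packed data");
        }
        byte[] septets = new byte[septetCount];
        for (int i = 0; i < septetCount; i++) {
            int bitPos = i * 7;
            int byteIndex = bitPos / 8;
            int shift = bitPos % 8;
            // bits available in the current octet
            int value = (packed[byteIndex] & 0xFF) >> shift;
            // when fewer than 7 bits remain, take the rest from the next octet
            if (shift > 1) {
                value |= (packed[byteIndex + 1] & 0xFF) << (8 - shift);
            }
            septets[i] = (byte) (value & 0x7F);
        }
        return septets;
    }

    public static void main(String[] args) {
        byte[] packed = {(byte) 0xE8, 0x32, (byte) 0x9B, (byte) 0xFD, 0x06};
        // prints [104, 101, 108, 108, 111], the septets for "hello"
        System.out.println(java.util.Arrays.toString(unpack(packed, 5)));
    }
}
```

The unpacked septets would then go through the lookup tables as before.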


Update 2:

It seems someone has implemented a Charset for GSM 7-bit: https://github.com/OpenSmpp/opensmpp/blob/master/charset/src/main/java/org/smpp/charset/Gsm7BitCharset.java

With it, you can simply write something like String result = new String(byteFinal, GSM_7_BIT_CHARSET); without dealing with the internals of GSM 7-bit.

Solution 2:[2]

Change array to:

byte[] byteFinal ={97, 98, 64, 99, 100};

The ASCII code of '@' is 64. Incidentally, the caret notation for the NUL character (ASCII code 0) is ^@, which may be what confused you here.
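For reference, caret notation is obtained by flipping bit 6 of the control code (XOR with 0x40), which is why code 0 is written ^@. A small sketch, with CaretNotation and caret as illustrative names:

```java
class CaretNotation {
    // Returns the caret notation for an ASCII control code (0..31 or 127).
    static String caret(int controlCode) {
        if ((controlCode < 0 || controlCode > 31) && controlCode != 127) {
            throw new IllegalArgumentException("not an ASCII control code: " + controlCode);
        }
        return "^" + (char) (controlCode ^ 0x40);
    }

    public static void main(String[] args) {
        System.out.println(caret(0));   // prints ^@
        System.out.println(caret(10));  // prints ^J
    }
}
```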

Solution 3:[3]

The values in your byte array are being interpreted as ASCII codes.

In ASCII, the '@' character you are after has the value 64.

Hence your array should be:

byte[] byteFinal ={97, 98, 64, 99, 100};
                           ^^

According to the ASCII table on Wikipedia, the value 0 corresponds to the null character.

Also, you can create the String directly instead of using a StringBuilder:

System.out.println(new String(byteFinal));

So all you need is two lines of code like:

byte[] byteFinal ={97, 98, 64, 99, 100};
System.out.println(new String(byteFinal));
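Note that new String(byte[]) decodes with the platform default charset. If the bytes really are ASCII, naming the charset explicitly removes that dependency. A minimal sketch (AsciiDecode and decodeAscii are illustrative names):

```java
import java.nio.charset.StandardCharsets;

class AsciiDecode {
    // Decodes the bytes as US-ASCII regardless of the platform default charset.
    static String decodeAscii(byte[] data) {
        return new String(data, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(decodeAscii(new byte[] {97, 98, 64, 99, 100})); // prints ab@cd
    }
}
```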

Solution 4:[4]

The ASCII value of @ is 64 (see Wikipedia).

The rest of your code is fine!

byte[] byteFinal = {97, 98, 64, 99, 100};
char ch;
StringBuilder str = new StringBuilder();
for (byte b : byteFinal) {
    ch = (char) b;
    System.out.println("ch:" + ch);
    str.append(ch);
}
System.out.println(str.toString());

Solution 5:[5]

You can also install the charset in the lib and use getBytes("SCGSM").

Solution 6:[6]

There is a library called jCharset for this. When the library is on the classpath, its charsets are automatically added to the available charsets.

import java.io.UnsupportedEncodingException;

class Scratch {
    public static void main(String[] args) throws UnsupportedEncodingException {
        byte[] encoded = "something".getBytes("GSM7");
        System.out.println(new String(new byte[]{97, 98, 0, 99, 100}, "GSM7"));
    }
}

Output:

ab@cd

The library is available via Maven.
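Because the charset only exists when a provider library is on the classpath, it may be worth checking for it before decoding. Here is a sketch using the standard Charset.isSupported and Charset.forName calls; the class CharsetLookup and the method decodeIfAvailable are illustrative names, and whether "GSM7" resolves depends entirely on the library being present.

```java
import java.nio.charset.Charset;
import java.util.Optional;

class CharsetLookup {
    // Decodes with the named charset if the JVM knows it, otherwise returns empty.
    static Optional<String> decodeIfAvailable(byte[] data, String charsetName) {
        if (!Charset.isSupported(charsetName)) {
            return Optional.empty();
        }
        return Optional.of(new String(data, Charset.forName(charsetName)));
    }

    public static void main(String[] args) {
        byte[] data = {97, 98, 0, 99, 100};
        // Prints the decoded text only if a GSM7 charset provider is installed.
        System.out.println(decodeIfAvailable(data, "GSM7").orElse("GSM7 charset not available"));
    }
}
```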

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Gyapti Jain
Solution 3
Solution 4 Vishwa Ratna
Solution 5 loser8
Solution 6 k_o_