Cut an Arabic string

I have a string in Arabic, like:

على احمد يوسف

Now I need to cut this string and output it like:

...على احمد يو

I tried this function:

function short_name($str, $limit) {
    if ($limit < 3) {
        $limit = 3;
    }

    if (strlen($str) > $limit) {
        if (preg_match('/\p{Arabic}/u', $str)) {
            return substr($str, 0, $limit - 3) . '...';
        }
        else {
            return '...'.substr($str, 0, $limit - 3);
        }
    }
    else {
        return $str;
    }
}

The problem is that sometimes it displays a symbol like this at the end of the string:

...�على احمد يو

Why does this happen?

php


Solution 1:[1]

The symbol displayed after the cut appears because substr() cuts by bytes and can stop in the middle of a multibyte character, leaving an invalid byte sequence.

You need to use the Multibyte String Functions to handle Arabic strings, such as mb_strlen() and mb_substr().

You also need to make sure the internal encoding for those functions is set to UTF-8. You can set this globally at the top of your script:

mb_internal_encoding('UTF-8');

Which leads to this:

  • strlen('على احمد يوسف') returns 24, the size in octets
  • mb_strlen('على احمد يوسف') returns 13, the size in characters

Note that mb_strlen('على احمد يوسف') would also return 24 if the internal encoding were a single-byte one such as ISO-8859-1, which was the default before PHP 5.6.
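The byte and character counts above can be checked directly. This is a minimal sketch using the string from the question; the 12-byte cut is just an illustrative offset chosen because it happens to fall inside a two-byte character.

```php
<?php
// The sample string from the question: 11 Arabic letters and 2 spaces.
$name = 'على احمد يوسف';

// strlen() counts bytes; each Arabic letter here takes 2 bytes in UTF-8.
$bytes = strlen($name);
// mb_strlen() counts characters when given the UTF-8 encoding.
$chars = mb_strlen($name, 'UTF-8');

// A byte-based cut can stop halfway through a character...
$broken = substr($name, 0, 12);
// ...while mb_substr() always cuts on a character boundary.
$clean = mb_substr($name, 0, 12, 'UTF-8');

var_dump($bytes);                               // int(24)
var_dump($chars);                               // int(13)
var_dump(mb_check_encoding($broken, 'UTF-8'));  // bool(false)
var_dump(mb_check_encoding($clean, 'UTF-8'));   // bool(true)
```

The invalid trailing byte in $broken is what the browser renders as the replacement symbol.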

Solution 2:[2]

Answer:

return '...'.mb_substr($str, 0, $limit - 3, "UTF-8"); // the encoding argument is optional if the internal encoding is already UTF-8

Background:

Arabic cannot be represented in ISO 8859-1, which is an 8-bit character set. substr() works on bytes, that is, on units of 8 bits. Characters with code points above 255 (Arabic, Cyrillic, Korean, etc.) need more bits per character: in UTF-8 they take two to four bytes. Cutting $limit - 3 bytes can therefore land in the middle of a character and leave an undisplayable byte sequence. If you work with a lot of multibyte strings, make sure you use the multibyte string functions, such as mb_strlen() and mb_substr().
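Applied to the function from the question, the change might look like the sketch below. The name short_name_mb() is hypothetical, and the branch order (ellipsis appended for Arabic, prepended otherwise) is simply carried over from the question rather than recommended.

```php
<?php
// Hypothetical multibyte-safe variant of the question's short_name().
function short_name_mb(string $str, int $limit): string
{
    // Keep room for the three-character ellipsis.
    $limit = max($limit, 3);

    // Count characters, not bytes.
    if (mb_strlen($str, 'UTF-8') <= $limit) {
        return $str;
    }

    // Cut on a character boundary, so no character is split.
    $head = mb_substr($str, 0, $limit - 3, 'UTF-8');

    // Same branch order as the question: Arabic text gets the ellipsis
    // appended, anything else gets it prepended.
    return preg_match('/\p{Arabic}/u', $str) ? $head . '...' : '...' . $head;
}

echo short_name_mb('على احمد يوسف', 13), PHP_EOL; // unchanged: exactly 13 characters
echo short_name_mb('على احمد يوسف', 12), PHP_EOL; // first 9 characters plus "..."
```

Here $limit is measured in characters, so the returned string never exceeds $limit characters and is always valid UTF-8 when the input is.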

Solution 3:[3]

Try this function:

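// Static method: place it inside a class, or drop "public static" for a plain function.
public static function shorten_arabic_text($text, $length)
{
    // Count and cut by characters, not bytes.
    mb_internal_encoding('UTF-8');
    $out = mb_strlen($text) > $length ? mb_substr($text, 0, $length) . " ..." : $text;
    return $out;
}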

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3 Ashnet