Cut an Arabic string
I have a string in Arabic, like:
على احمد يوسف
Now I need to cut this string and output it like:
...على احمد يو
I tried this function:
function short_name($str, $limit) {
    if ($limit < 3) {
        $limit = 3;
    }
    if (strlen($str) > $limit) {
        if (preg_match('/\p{Arabic}/u', $str)) {
            return substr($str, 0, $limit - 3) . '...';
        }
        else {
            return '...'.substr($str, 0, $limit - 3);
        }
    }
    else {
        return $str;
    }
}
The problem is that sometimes it displays a symbol like this at the end of the string:
...�على احمد يو
Why does this happen?
Solution 1:[1]
The symbol displayed after the cut is the result of substr() cutting in the middle of a character, resulting in an invalid character.
You need to use Multibyte String Functions to handle Arabic strings, such as mb_strlen() and mb_substr().
You also need to make sure the internal encoding for those functions is set to UTF-8. You can set this globally at the top of your script:
mb_internal_encoding('UTF-8');
Which leads to this:
strlen('على احمد يوسف') returns 24, the size in octets
mb_strlen('على احمد يوسف') returns 13, the size in characters
Note that mb_strlen('على احمد يوسف') would also return 24 if the internal encoding were still set to ISO-8859-1, which was the default in older PHP versions.
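The counts above can be checked with a short script. This is a minimal sketch that assumes the source file itself is saved as UTF-8; the encoding is passed explicitly to mb_strlen() so the result does not depend on the global setting, and the variable names are only illustrative.

```php
<?php
// The sample string from the question, assumed to be stored as UTF-8.
$name = 'على احمد يوسف';

// strlen() counts bytes: 11 Arabic letters at 2 bytes each, plus 2 spaces.
$bytes = strlen($name);

// mb_strlen() counts characters when it is told the string is UTF-8.
$chars = mb_strlen($name, 'UTF-8');

// With a single-byte encoding, every byte is treated as one character.
$asLatin1 = mb_strlen($name, 'ISO-8859-1');

echo "$bytes $chars $asLatin1\n"; // prints "24 13 24"
```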
Solution 2:[2]
Answer:
return '...' . mb_substr($str, 0, $limit - 3, 'UTF-8'); // the encoding argument is optional if the internal encoding is already UTF-8
Background:
Arabic cannot be represented in an 8-bit character set such as ISO 8859-1. substr() works on individual bytes (8-bit chars). Characters with code points above 255 (Arabic, Cyrillic, Korean, etc.) need more than one byte each: in UTF-8 an Arabic letter takes two bytes, and other characters take up to four. Because substr() counts bytes, cutting the string at $limit - 3 bytes can land in the middle of a character, which leaves a byte sequence that cannot be displayed as UTF-8. Especially if you're going to use a lot of multibyte strings, make sure you use the correct string functions, such as mb_strlen() and mb_substr().
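To make the byte-versus-character difference concrete, here is a small sketch that cuts the same string both ways and checks the result with mb_check_encoding(). It assumes the string is UTF-8. It also hints at why the stray symbol only appears sometimes: spaces take one byte and Arabic letters take two, so whether a byte offset falls on a character boundary depends on the text and the limit.

```php
<?php
$name = 'على احمد يوسف'; // UTF-8: 2 bytes per Arabic letter, 1 per space

// Cutting after 3 bytes keeps the first letter and half of the second one.
$byteCut = substr($name, 0, 3);
$byteCutValid = mb_check_encoding($byteCut, 'UTF-8');

// Cutting after 3 characters always lands on a character boundary.
$charCut = mb_substr($name, 0, 3, 'UTF-8');
$charCutValid = mb_check_encoding($charCut, 'UTF-8');

var_dump($byteCutValid);    // bool(false)
var_dump($charCutValid);    // bool(true)
var_dump(strlen($charCut)); // int(6), three letters at two bytes each
```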
Solution 3:[3]
Try this function:
public static function shorten_arabic_text($text, $length)
{
    mb_internal_encoding('UTF-8');
    $out = mb_strlen($text) > $length ? mb_substr($text, 0, $length) . " ..." : $text;
    return $out;
}
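Since the method is declared public static, it has to live inside a class. The sketch below shows one way it might be wrapped and called. The class name TextHelper and the type declarations are assumptions added for illustration, and the encoding is passed per call rather than set globally, which is a variation on the answer rather than its exact code.

```php
<?php
// Hypothetical wrapper class: the answer does not say which class the method belongs to.
final class TextHelper
{
    public static function shorten_arabic_text(string $text, int $length): string
    {
        // Count and cut by characters, naming the encoding explicitly
        // instead of changing the global internal encoding.
        return mb_strlen($text, 'UTF-8') > $length
            ? mb_substr($text, 0, $length, 'UTF-8') . ' ...'
            : $text;
    }
}

// Keep the first 10 characters of the 13-character sample string.
$short = TextHelper::shorten_arabic_text('على احمد يوسف', 10);
echo $short, "\n";

// A string at or under the limit is returned unchanged.
$same = TextHelper::shorten_arabic_text('على', 10);
```

Note that the limit here counts only the kept characters; the " ..." marker is added on top of it, unlike the question's function, which reserves three characters for the dots.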
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | |
Solution 3 | Ashnet |