'mysql order by collation

I want to sort the results of MySQL based on the best match by using collate

query => zeķ

want to show

Vīriešu zeķes
Sieviešu zeķes
Daivinga zeķes
.
.
.
Mācību līdzekļi

I mean show all characters match (zek, ZEK, zeķ, etc) but sort by character user entered (zeķ)

I did tests: BINARY in order by => no success

used ( order by name LIKE 'zeķ' collate utf8mb4_general_ci) different colloate => no success

I get

1273 - Unknown collation: 'utf8mb4_unicode_cs'

for xxx_unicode_cs & xxx__unicode_bin can't change the setting of MySQL on the server (don't have access, host will not do it)

MySQL version is 5.7

result of SHOW CHARACTER SET

getting this result

big5_chinese_ci
dec8_swedish_ci
cp850_general_ci
hp8_english_ci
koi8r_general_ci
latin1_swedish_ci
latin2_general_ci
swe7_swedish_ci
ascii_general_ci
ujis_japanese_ci
sjis_japanese_ci
hebrew_general_ci
tis620_thai_ci
euckr_korean_ci
koi8u_general_ci
gb2312_chinese_ci
greek_general_ci
cp1250_general_ci
gbk_chinese_ci
latin5_turkish_ci
armscii8_general_ci
utf8_general_ci
ucs2_general_ci
cp866_general_ci
keybcs2_general_ci
macce_general_ci
macroman_general_ci
cp852_general_ci
latin7_general_ci
utf8mb4_general_ci
cp1251_general_ci
utf16_general_ci
utf16le_general_ci
cp1256_general_ci
cp1257_general_ci
utf32_general_ci
binary
geostd8_general_ci
cp932_japanese_ci
eucjpms_japanese_ci

using UNION made my query very slow, but result was fine



Solution 1:[1]

show all characters match (zek, ZEK, ze?, etc)

LIKE is anchored at both ends, hence LIKE 'ze?' is the same as = 'ze?'; instead do any of these

LIKE '%ze?%'
LIKE '%zek%'
LIKE '%ZEK%'
(etc)

with a suitable _ci collation (for "Case Insensitive"). Note that in all utf8/utf8mb4 utf8%ci collations (except latvian), K=k=?=? . Ref: http://mysql.rjweb.org/utf8mb4_collations.html . For utf8mb4_latvian_ci, k-cedilla is treated as a letter between "K" an "L"

but sort by character user entered (ze?)

This gets trickier.

ORDER BY col LIKE '%ze?' COLLATE utf8mb4_bin DESC,
         ((anything else to sort by))

That forces an exact match for accents/caps, which turns into 0 (first) for an exact match. The second part of the ORDER BY is optional; it could be some other thing to sort by (eg date or the complete column).

Preferred collation for 5.7

utf8_unicode_520_ci / utf8mb4_unicode_520_ci

For 8.0: utf8mb4_0900_ai_ci

Notes

  • Most of my answer works equally well for "utf8", "utf8mb3", or "utf8mb4" with suitable name changes in CHARACTER SET and COLLATION/COLLATE,

  • The first part of a COLLATION name matches the CHARACTER SET name.

  • utf8mb4_0900_ai_ci and utf8mb4_0900_as_ci refer to Case Insensitive and either Accent Insensitive or Sensitive.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1