'How to convert fancy/artistic unicode text to ASCII?
I have a unicode string like "𝖙𝖍𝖚𝖌 𝖑𝖎𝖋𝖊" and would like to convert it to the ASCII form "thug life".
I know I can achieve this in Python by
import unidecode
print(unidecode.unidecode('𝖙𝖍𝖚𝖌 𝖑𝖎𝖋𝖊'))
// thug life
However, this would asciify also other unicode characters (such as Chinese/Japanese characters, emojis, accented characters, etc.), which I want to preserve.
Is there a way to detect these type of "artistic" unicode characters?
Some more examples:
𝓽𝓱𝓾𝓰 𝓵𝓲𝓯𝓮
𝓉𝒽𝓊𝑔 𝓁𝒾𝒻𝑒
𝕥𝕙𝕦𝕘 𝕝𝕚𝕗𝕖
thug life
Thanks for your help!
Solution 1:[1]
import unicodedata
strings = [
'???? ????',
'???? ????',
'???? ????',
'???? ????',
'???? ????']
for x in strings:
print(unicodedata.normalize( 'NFKC', x), x)
Output: .\62803325.py
thug life ???? ???? thug life ???? ???? thug life ???? ???? thug life ???? ???? thug life ???? ????
Resources:
unicodedata
— Unicode Database- Normalization forms for Unicode text
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | JosefZ |