'Incorrect string value error for unconventional characters

So I'm using a wrapper to fetch user data from instagram. I want to select the display names for users, and store them in a MYSQL database. I'm having issues inserting some of the display names, dealing with, specifically, an incorrect string value error:

Now, I've dealt with this issue before with accent marks, letters with umlauts, etc. The solution would be to change the collation to utf8_general_ci under the utf8 charset.

So as you can see, some of the display names I'm pulling have very unique characters that I'm not sure mySQL can recognize at all, i.e.:

ᛘ𝕰𝖆𝖗𝖙𝖍 𝕾𝖕𝖎𝖗𝖎𝖙𝖚𝖘𐂂®

So I receive:

Error Code: 1366. Incorrect string value: '\xF0\x9D\x99\x87\xF0\x9D...' for column 'dummy' at row 1

Here's my sql code

CREATE TABLE test_table(
    id INT AUTO_INCREMENT,
    dummy VARCHAR(255),
    PRIMARY KEY(id)
);
    
INSERT INTO test_table (dummy)
VALUES ('ᛘ𝕰𝖆𝖗𝖙𝖍 𝕾𝖕𝖎𝖗𝖎𝖙𝖚𝖘𐂂®');

Any thoughts on a proper charset + collation pair that can handle characters like this? Not sure where to look for a solution, so I come here to see if anyone dealt with this.

P.S., I've tried utf8mb4 charset with utf8mb4_unicode_ci and utf8mb4_bin collations as well.



Solution 1:[1]

The characters you show require the column use the utf8mb4 encoding. Currently it seems your column is defined with the utf8mb3 encoding.

The way MySQL uses the name "utf8" is complicated, as described in https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-utf8mb3.html:

Note

Historically, MySQL has used utf8 as an alias for utf8mb3; beginning with MySQL 8.0.28, utf8mb3 is used exclusively in the output of SHOW statements and in Information Schema tables when this character set is meant.

At some point in the future utf8 is expected to become a reference to utf8mb4. To avoid ambiguity about the meaning of utf8, consider specifying utf8mb4 explicitly for character set references instead of utf8.

You should also be aware that the utf8mb3 character set is deprecated and you should expect it to be removed in a future MySQL release. Please use utf8mb4 instead.

You may have tried to change your table in the following way:

ALTER TABLE test_table CHARSET=utf8mb4;

But that only changes the default character set, to be used if you add new columns to the table subsequently. It does not change any of the current columns. To do that:

ALTER TABLE test_table MODIFY COLUMN dummy VARCHAR(255) CHARACTER SET utf8mb4;

Or to convert all string or TEXT columns in a table in one statement:

ALTER TABLE test_table CONVERT TO CHARACTER SET utf8mb4;

Solution 2:[2]

That would be ? - L MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL L

It requires the utf8mb4 Character set to even represent it. "F0" is the clue; it is the first of 4 bytes in a 4-byte UTF-8 character. It cannot be represented in MySQL's "utf8". Collation is (mostly) irrelevant.

Most, not all, of the characters in ?????? ?????????® also need utf8mb4. They are "MATHEMATICAL BOLD FRAKTUR" letters.

(Meanwhile, Bill gives you more of an answer.)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Bill Karwin
Solution 2 Rick James