'Fulltext index match string with period (.) mysql

I have a few column fulltext indexed and i'm testing some string to search. My db contains cars components so my researches could be for example "Engine 1.6". The problem is that when I use string with point (like 1.6) query returns no results.

Here's my variables

+--------------------------+----------------+
| ft_boolean_syntax        | + -><()~*:""&| |
+--------------------------+----------------+
| ft_max_word_len          | 84             |
+--------------------------+----------------+
| ft_min_word_len          | 4              |
+--------------------------+----------------+
| ft_query_expansion_limit | 20             |
+--------------------------+----------------+
| ft_stopword_file         | (built-in)     |
+--------------------------+----------------+

I don't know why but even if the ft_min_word_len is 4, a search like "Engine 24V" works. The query for matching is like this:

WHERE MATCH(sdescr,udescr) AGAINST ('+engine +1.6' IN BOOLEAN MODE)


Solution 1:[1]

I spend the last day figuring out this issue. The reason why this is happening is that by default, MySQL/MariaDB collations treat space(" "), periods("."), and commas(",") as punctuation. Long story short, collations "weight" characters to determine how to filter or sort them. The punctuations mentioned above are considered EOL or 'stopwords.'

We need to have MySQL/MariaDB treat those punctuations as characters rather than punctuations to solve this issue.

We are presented with three solutions in the MySQL documentation. The first one requires changing the source code and recompiling, which isn't a very viable option for me. The second and third options are good and aren't too hard to follow.

  • Modify a character set file: This requires no recompilation. The true_word_char() macro uses a “character type” table to distinguish letters and numbers from other characters. You can edit the contents of the array in one of the character set XML files to specify that '-' is a “letter.” Then use the given character set for your FULLTEXT indexes. For information about the array format, see Section 10.13.1, “Character Definition Arrays”.

First things first:

We need to know which character we're trying to fix. Take a look link below and find the HEX equivalent to the character you're trying to fix. In my case, it was 2E, the period. https://www.eso.org/~ndelmott/ascii.html

Now, we need to find the collation files in the database server.

  1. SSH into your server.
  2. Login into your MySQL/MariaDB: mysql -u root -p
  3. Run Show VARIABLES LIKE 'character_sets_dir'

The result should return a table with a value of a directory path. I was using docker, so mine came back as usr/share/mysql/charsets.

At this point, I opened a second terminal, but this is necessary.

Back in the server, outside of the MySQL/MariaDB command line:

  1. Navigate to the directory path the previous query returned. You'll find an Index.xml as well as other XML files.
  2. Follow the first step in the MySQL Documentation

NOTE: Before continuing the second step, open latin1.xml and look closely at the <map> nested in <lower> and <upper>. Find the HEX equivalent character to the one you want to fix, in my case, 2E. We can then map the correct spot in the <map> nested inside <ctype>.

  1. Continue to the second step in the MySQL Documentation
  2. After the changes, Restart your server.

Assign the User-defined Collation to our database/table/column.

All we need to do is assign our collation to our database, table, or column. In my case, I just needed to assign it to two columns, so I ran the following command: ALTER TABLE table_name MODIFY fulltext_column_one TEXT CHARACTER SET latin1 COLLATE latin1_fulltext_ci, MODIFY fulltext_column_two TEXT CHARACTER SET latin1 COLLATE latin1_fulltext_ci;

Here are some links that might be helpful: https://mariadb.com/kb/en/setting-character-sets-and-collations/ https://dev.mysql.com/doc/refman/8.0/en/charset-syntax.html

This should solve your problem if you don't have any existing data in the table.

If you do have existing data and you try to run the query above, you might have gotten an error similar to the one below:

SQLSTATE[22007]: Invalid datetime format: 1366 Incorrect string value: '\xE2\x80\x93 fr...' for column.

The issue here is due to attempting to convert a 4byte character into a 3byte character. To solve this, we need to convert our data from 4bytes to binary, then to 3bytes(latin1). For more info, check out this link.

Run the following query in the mysql/mariadb command line: UPDATE table_name SET fulltext_column = CONVERT(CAST(CONVERT(fulltext_column USING utf8) AS BINARY) USING latin1);

You'll need to convert the values of every column which are causing the issue. In my case, it was just one.

Then follow it with: ALTER TABLE table_name MODIFY fulltext_column_one TEXT CHARACTER SET latin1 COLLATE latin1_fulltext_ci, MODIFY fulltext_column_two TEXT CHARACTER SET latin1 COLLATE latin1_fulltext_ci;

We are done. We can now search a term with our character, and our database engine will match against it.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1