'How to join two tables using a comma-separated-list in the join field

I have two tables, categories and movies.

In movies table I have a column categories. That column consists of the categories that movie fits in. The categories are IDs separated by a comma.

Here's an example:

Table categories {
  -id-       -name-
  1          Action
  2          Comedy
  4          Drama
  5          Dance
}

Table movies {
  -id-       -categories-  (and some more columns ofc)
  1          2,4
  2          1,4
  4          3,5
}

Now to the actual question: Is it possible to perform a query that excludes the categories column from the movies table, and instead selects the matching categories from the categories table and returns them in an array? Like a join, but the problem is there are multiple categories separated by comma, is it possible to do some kind of regex?



Solution 1:[1]

Using comma separated lists in a database field is an anti-pattern and should be avoided at all costs.
Because it is a PITA to extract those comma separated values out agian in SQL.

Instead you should add a separate link table to represent the relationship between categories and movies, like so:

Table categories
  id integer auto_increment primary key
  name varchar(255)

Table movies
  id integer auto_increment primary key
  name varchar(255)

Table movie_cat
  movie_id integer foreign key references movies.id
  cat_id integer foreign key references categories.id
  primary key (movie_id, cat_id)

Now you can do

SELECT m.name as movie_title, GROUP_CONCAT(c.name) AS categories FROM movies m
INNER JOIN movie_cat mc ON (mc.movie_id = m.id)
INNER JOIN categories c ON (c.id = mc.cat_id)
GROUP BY m.id

Back to your question
Alternativly using your data you can do

SELECT m.name as movie_title
  , CONCAT(c1.name, if(c2.name IS NULL,'',', '), ifnull(c2.name,'')) as categories 
FROM movies m
LEFT JOIN categories c2 ON 
 (replace(substring(substring_index(m.categories, ',', 2),
  length(substring_index(m.categories, ',', 2 - 1)) + 1), ',', '') = c2.id)
INNER JOIN categories c1 ON 
 (replace(substring(substring_index(m.categories, ',', 1), 
  length(substring_index(m.categories, ',', 1 - 1)) + 1), ',', '') = c1.id)

Note that the last query only works if there are 2 or fewer categories per movie.

Solution 2:[2]

select
    m.id,
    group_concat(c.name)
from
    movies m
    join categories c on find_in_set(c.id, m.categories)
group by
    m.id

The output should be something like this:

Table movies {
  -id-       -categories-
  1          Comedy,Drama
  2          Action,Drama
  4          Other,Dance
}

Solution 3:[3]

Brad is right; normalisation is the solution. Normalisation exists to solve this problem. It should be covered pretty well in your MySQL book if it's worth its salt.


If you really insist, though, you can fake the direct join by cross-matching with FIND_IN_SET (which conveniently expects a comma-delimited string of items).

Now, MySQL can't return "an array" — that's what sets of results are for — but it can give you the category names separated by, say, a pipe (|):

SELECT
       `m`.`id`,
       `m`.`name`,
       GROUP_CONCAT(`c`.`name` SEPARATOR "|") AS `cats`
  FROM
       `movies`     AS `m`,
       `categories` AS `c`
 WHERE
       FIND_IN_SET(`c`.`id`, `m`.`categories`) != 0
 GROUP BY
       `m`.`id`;

Result:

id  "name"     "cats"
---------------------------------------------------
1   "Movie 1"  "Comedy|Drama"
2   "Movie 2"  "Action|Drama"
4   "Movie 4"  "Dance"

Solution 4:[4]

Try This

SELECT m.*, c.* FROM movies m 
RIGHT JOIN categories c on find_in_set(c.id, m.categories) 
GROUP BY m.id

Solution 5:[5]

This isn't directly answering your question but what you have in the movies table is really bad.

Instead of combining categories using comma, what you should be doing is to have each category on separate rows, eg:

Table movies {
  -id-       -categories-
  1          2
  1          4
  2          1
  2          4
  4          3
  4          5
}

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3
Solution 4 Merrin K
Solution 5 ryanprayogo