'What is wrong with this inner join sql query

I want to export some data from the DB.

Basically what I want to say is this:

1- Select mbr_name from the members table

2- Choose the ones that exist at the course_registration table (based on mbr_id)

3- Join the course_registration ids with course_comments table

Then I need to apply these WHERE condtions as well:

1- Make sure that crr_status at course_registration table is set to completed

2- Make sure that crr_ts at course_registration table is between "2021-03-07 00:00:00" AND "2022-03-17 00:00:00"

3- Make sure that crm_confirmation from course_comments table is set to accept

So I tried my best and wrote this:

SELECT members.mbr_name
FROM members
INNER JOIN course_registration AS udt ON members.mbr_id = udt.crr_mbr_id 
INNER JOIN course_comments AS dot ON udt.crr_cor_id = dot.crm_reference_id
WHERE udt.crr_status = "completed" AND udt.crr_ts >= "2021-03-07 00:00:00" AND udt.crr_ts < "2022-03-17 00:00:00"
AND dot.crm_confirmation = "accept";

But this will give wrong data somehow.

The actual number of members that have all these conditions are 12K but this query gives me 120K results which is obviously wrong!

So what's going wrong here? How can I solve this issue?


UPDATE:

Here are the keys of each table:

members (mbr_id (PK), mbr_name) 
course_registration (crr_id (PK), crr_mbr_id (FK), crr_cor_id (FK), crr_status)
course_comments (crm_id (PK), crm_reference_id (FK), crm_confirmation)


Solution 1:[1]

You have a so-called cardinality problem. JOINs can, when multiple rows on the one table match a single row in the other table, cause the result set to have multiple rows. Your JOIN as written will generate many rows: members x courses x comments. That's what JOIN does.

It looks like you want exactly one row in your resultset for each member who ...

  • has completed one or more courses meeting your criterion.
  • has submitted one or more comments.

So let's start with a subquery. It gives the mbr_id values for members who have submitted one or more comments on one or more courses that meet your criteria.

        SELECT udt.crr_mbr_id
          FROM course_registration udt
          JOIN course_comments dot ON  udt.crr_cor_id = dot.crm_reference_id
         WHERE udt.crr_status = "completed"
           AND udt.crr_ts >= "2021-03-07 00:00:00"
           AND udt.crr_ts < "2022-03-17 00:00:00"
           AND dot.crm_confirmation = "accept"
         GROUP BY udt.mbr_id

You use the results of that subquery to find your members. The final query is

SELECT members.mbr_name
  FROM members
 WHERE members.mbr_id IN (
        SELECT udt.crr_mbr_id
          FROM course_registration udt
          JOIN course_comments dot ON  udt.crr_cor_id = dot.crm_reference_id
         WHERE udt.crr_status = "completed"
           AND udt.crr_ts >= "2021-03-07 00:00:00"
           AND udt.crr_ts < "2022-03-17 00:00:00"
           AND dot.crm_confirmation = "accept"
         GROUP BY udt.mbr_id )

Solution 2:[2]

As you only want to select Member name you can try as below if this gives required result

select m.mbr_name
  from Members m
  where Exists ( select 1 from Course_Registration cr 
                               join Course_Comments cm on cr.crr_cor_id = cm.crm_reference_id
                  where cr.crr_mbr_id = m.mbr_id
                    And cr.crr_status = "completed" AND cr.crr_ts >= "2021-03-07 00:00:00" AND cr.crr_ts < "2022-03-17 00:00:00"
                    AND cr.crm_confirmation = "accept";
                 );


Solution 3:[3]

My first guess, without knowing the context, is that:

  • a member can register to one or more courses,
  • each course can have one or more comments.

If this is the case, you are getting way more tuples due to redundancy. In that case you just need to stick a DISTINCT right after your first SELECT.

Furthermore, since the JOIN is the most resource-expensive operation in sql, I would first filter the data and then leave any join as the last operation to improve efficiency. Something like this:

SELECT 
    members.mbr_name 
FROM
    (
    SELECT DISTINCT
        crm_reference_id
    FROM 
        course_comments
    WHERE 
        crm_confirmation = 'accept'
    ) accepted_comments
INNER JOIN 
    (
    SELECT DISTINCT
        crr_mbr_id,
        crr_cor_id
    FROM 
        course_registration
    WHERE
        crr_status = 'completed'
    AND
        crr_ts BETWEEN '2021-03-07 00:00:00' AND '2022-03-17 00:00:00'
    ) completed_courses 
ON 
    accepted_comments.crm_reference_id = completed_courses.crr_cor_id
INNER JOIN 
    members 
ON 
    members.mbr_id = completed_courses.crr_mbr_id

Solution 4:[4]

I would start at the registration FIRST instead of the members. By getting a DISTINCT list of members signing up for a course, you have a smaller subset. From that too, joining to the comments for just those accepted gives you a final list.

Once you have those two, join back to members to get the name. I included the member ID as well as the name because what if you have two or more "John" or "Karen" names in the registration. At least you have the ID that confirms the unique students.

select
        m.mbr_name,
        m.mbr_id
    from
        ( select distinct
                cr.crr_mbr_id
            from
                course_registration cr
                    JOIN course_comments cc 
                        ON cr.crr_cor_id = cc.crm_reference_id
                        AND cc.crm_confirmation = 'accept'
            WHERE 
                    cr.crr_status = 'completed'
                AND cr.crr_ts >= '2021-03-07' 
                AND cr.crr_ts < '2022-03-17' ) PQ
        JOIN members m
            ON PQ.crr_mbr_id = m.mbr_id 

Solution 5:[5]

Try this:

SELECT *
FROM members M
INNER JOIN course_registration CR
ON CR.crr_mbr_id = M.mbr_id
AND CR.crr_status = 'completed'
AND CR.crr_ts BETWEEN '2021-03-07 00:00:00' AND '2022-03-17 00:00:00'
WHERE EXISTS(
    SELECT * FROM course_comments CC
    WHERE CC.crm_confirmation = 'accept'
    AND CC.crm_reference_id = CR.crr_cor_id
)
ORDER BY M.mbr_id;

Solution 6:[6]

Try using this and if it not works then try using 'between' for date field (crr_ts).

select mbr.mbr_name from
(
select * from course_registration AS udt
INNER JOIN course_comments AS dot ON udt.crr_cor_id = dot.crm_reference_id
where dot.crm_confirmation = "accept" AND udt.crr_status = "completed" AND udt.crr_ts >= "2021-03-07 00:00:00" AND udt.crr_ts < "2022-03-17 00:00:00"
)x
INNER JOIN  members mbr on mbr.mbr_id = x.crr_mbr_id

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Ramesh
Solution 3 lemon
Solution 4 DRapp
Solution 5 memite7760
Solution 6