'What is wrong with this inner join sql query
I want to export some data from the DB.
Basically what I want to say is this:
1- Select mbr_name
from the members
table
2- Choose the ones that exist at the course_registration
table (based on mbr_id
)
3- Join the course_registration
ids with course_comments
table
Then I need to apply these WHERE condtions as well:
1- Make sure that crr_status
at course_registration
table is set to completed
2- Make sure that crr_ts
at course_registration
table is between "2021-03-07 00:00:00"
AND "2022-03-17 00:00:00"
3- Make sure that crm_confirmation
from course_comments
table is set to accept
So I tried my best and wrote this:
SELECT members.mbr_name
FROM members
INNER JOIN course_registration AS udt ON members.mbr_id = udt.crr_mbr_id
INNER JOIN course_comments AS dot ON udt.crr_cor_id = dot.crm_reference_id
WHERE udt.crr_status = "completed" AND udt.crr_ts >= "2021-03-07 00:00:00" AND udt.crr_ts < "2022-03-17 00:00:00"
AND dot.crm_confirmation = "accept";
But this will give wrong data somehow.
The actual number of members
that have all these conditions are 12K but this query gives me 120K results which is obviously wrong!
So what's going wrong here? How can I solve this issue?
UPDATE:
Here are the keys of each table:
members (mbr_id (PK), mbr_name)
course_registration (crr_id (PK), crr_mbr_id (FK), crr_cor_id (FK), crr_status)
course_comments (crm_id (PK), crm_reference_id (FK), crm_confirmation)
Solution 1:[1]
You have a so-called cardinality problem. JOINs can, when multiple rows on the one table match a single row in the other table, cause the result set to have multiple rows. Your JOIN as written will generate many rows: members x courses x comments. That's what JOIN does.
It looks like you want exactly one row in your resultset for each member who ...
- has completed one or more courses meeting your criterion.
- has submitted one or more comments.
So let's start with a subquery. It gives the mbr_id
values for members who have submitted one or more comments on one or more courses that meet your criteria.
SELECT udt.crr_mbr_id
FROM course_registration udt
JOIN course_comments dot ON udt.crr_cor_id = dot.crm_reference_id
WHERE udt.crr_status = "completed"
AND udt.crr_ts >= "2021-03-07 00:00:00"
AND udt.crr_ts < "2022-03-17 00:00:00"
AND dot.crm_confirmation = "accept"
GROUP BY udt.mbr_id
You use the results of that subquery to find your members. The final query is
SELECT members.mbr_name
FROM members
WHERE members.mbr_id IN (
SELECT udt.crr_mbr_id
FROM course_registration udt
JOIN course_comments dot ON udt.crr_cor_id = dot.crm_reference_id
WHERE udt.crr_status = "completed"
AND udt.crr_ts >= "2021-03-07 00:00:00"
AND udt.crr_ts < "2022-03-17 00:00:00"
AND dot.crm_confirmation = "accept"
GROUP BY udt.mbr_id )
Solution 2:[2]
As you only want to select Member name you can try as below if this gives required result
select m.mbr_name
from Members m
where Exists ( select 1 from Course_Registration cr
join Course_Comments cm on cr.crr_cor_id = cm.crm_reference_id
where cr.crr_mbr_id = m.mbr_id
And cr.crr_status = "completed" AND cr.crr_ts >= "2021-03-07 00:00:00" AND cr.crr_ts < "2022-03-17 00:00:00"
AND cr.crm_confirmation = "accept";
);
Solution 3:[3]
My first guess, without knowing the context, is that:
- a member can register to one or more courses,
- each course can have one or more comments.
If this is the case, you are getting way more tuples due to redundancy. In that case you just need to stick a DISTINCT
right after your first SELECT.
Furthermore, since the JOIN
is the most resource-expensive operation in sql, I would first filter the data and then leave any join as the last operation to improve efficiency. Something like this:
SELECT
members.mbr_name
FROM
(
SELECT DISTINCT
crm_reference_id
FROM
course_comments
WHERE
crm_confirmation = 'accept'
) accepted_comments
INNER JOIN
(
SELECT DISTINCT
crr_mbr_id,
crr_cor_id
FROM
course_registration
WHERE
crr_status = 'completed'
AND
crr_ts BETWEEN '2021-03-07 00:00:00' AND '2022-03-17 00:00:00'
) completed_courses
ON
accepted_comments.crm_reference_id = completed_courses.crr_cor_id
INNER JOIN
members
ON
members.mbr_id = completed_courses.crr_mbr_id
Solution 4:[4]
I would start at the registration FIRST instead of the members. By getting a DISTINCT list of members signing up for a course, you have a smaller subset. From that too, joining to the comments for just those accepted gives you a final list.
Once you have those two, join back to members to get the name. I included the member ID as well as the name because what if you have two or more "John" or "Karen" names in the registration. At least you have the ID that confirms the unique students.
select
m.mbr_name,
m.mbr_id
from
( select distinct
cr.crr_mbr_id
from
course_registration cr
JOIN course_comments cc
ON cr.crr_cor_id = cc.crm_reference_id
AND cc.crm_confirmation = 'accept'
WHERE
cr.crr_status = 'completed'
AND cr.crr_ts >= '2021-03-07'
AND cr.crr_ts < '2022-03-17' ) PQ
JOIN members m
ON PQ.crr_mbr_id = m.mbr_id
Solution 5:[5]
Try this:
SELECT *
FROM members M
INNER JOIN course_registration CR
ON CR.crr_mbr_id = M.mbr_id
AND CR.crr_status = 'completed'
AND CR.crr_ts BETWEEN '2021-03-07 00:00:00' AND '2022-03-17 00:00:00'
WHERE EXISTS(
SELECT * FROM course_comments CC
WHERE CC.crm_confirmation = 'accept'
AND CC.crm_reference_id = CR.crr_cor_id
)
ORDER BY M.mbr_id;
Solution 6:[6]
Try using this and if it not works then try using 'between' for date field (crr_ts).
select mbr.mbr_name from
(
select * from course_registration AS udt
INNER JOIN course_comments AS dot ON udt.crr_cor_id = dot.crm_reference_id
where dot.crm_confirmation = "accept" AND udt.crr_status = "completed" AND udt.crr_ts >= "2021-03-07 00:00:00" AND udt.crr_ts < "2022-03-17 00:00:00"
)x
INNER JOIN members mbr on mbr.mbr_id = x.crr_mbr_id
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Ramesh |
Solution 3 | lemon |
Solution 4 | DRapp |
Solution 5 | memite7760 |
Solution 6 |