'How to find count of duplicate values as a percentage in SQL?
I have eight columns in a table(birth_Record) of a hospital record. The record is considered duplicate if the below three columns are the same
DATE_AND_TIME Baby_Name Mother_Name
(date and time
when a baby was born)
I have to find a query which returns the number of duplicates in the last 3 months and also gives the percentage of the duplicate records out of the total records in last 3 months.
I have written the below query is it correct?
SELECT COUNT(*) AS quantities,
COUNT(Baby_Name) * 100 / (SELECT COUNT(Baby_Name) FROM birth_Record) AS percentage
FROM birth_Record
WHERE DATE_AND_TIME >= DATEADD(M, -3, GETDATE())
GROUP BY Baby_Name, DATE_AND_TIME, Mother_Name
HAVING COUNT(*) > 1;
Solution 1:[1]
I don't have your data so I made my test table:
create table test1 (col_1 varchar(2),col_2 varchar(2),col_3 varchar(2),col_4 float, col_5 float, col_6 date);
insert into test1 values ('a','b','c',1,2,'2022-05-20 10:34:09');
insert into test1 values ('a','b','c',6,9,'2022-04-10 10:34:09');
insert into test1 values ('k','l','m',1,2,'2022-04-20 10:34:09');
insert into test1 values ('k','l','m',11,2,'2022-01-20 10:34:09');
insert into test1 values ('k','l','m',1,2,'2022-02-20 10:34:09');
insert into test1 values ('k','l','c',7,0,'2022-03-20 10:34:09');
Here we are counting duplicates where only col_1,col_2 and col_3 are relevant with a window function and then calculating the percent of that count out of total row count in the last 3 months:
select col_1,col_2,col_3,col_4,col_5, count(*)over (partition by col_1,col_2,col_3) dupl_number,
round(cast(count(*)over (partition by col_1,col_2,col_3) as float)/cast((count(*)over())as float)*100,2) dup_percentage
from test1
where col_6>DATEADD(MONTH, -3, CAST( GETDATE() AS Date ))
In your case col_1,col_2 and col_3 would be:
DATE_AND_TIME
Baby_Name
Mother_Name
in the count(*)over (partition by col_1,col_2,col_3)
and col_4, col_5 would be some irrelevant columns, if you leave col_4 and col_5 out and write distinct after select you would get: 1 row per distinct col_1,col_2,col_3 value with its count and percentage of total counts:
select distinct col_1,col_2,col_3, count(*)over (partition by col_1,col_2,col_3) dupl_number,
round(cast(count(*)over (partition by col_1,col_2,col_3) as float)/cast((count(*)over())as float)*100,2) dup_percentage
from test1
where col_6>DATEADD(MONTH, -3, CAST( GETDATE() AS Date ))
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |