'How to find count of duplicate values as a percentage in SQL?

I have eight columns in a table(birth_Record) of a hospital record. The record is considered duplicate if the below three columns are the same

DATE_AND_TIME Baby_Name Mother_Name
(date and time when a baby was born)

I have to find a query which returns the number of duplicates in the last 3 months and also gives the percentage of the duplicate records out of the total records in last 3 months.

I have written the below query is it correct?

SELECT COUNT(*) AS quantities,
       COUNT(Baby_Name) * 100 / (SELECT COUNT(Baby_Name) FROM birth_Record) AS percentage
FROM birth_Record
WHERE DATE_AND_TIME >= DATEADD(M, -3, GETDATE())
GROUP BY Baby_Name, DATE_AND_TIME, Mother_Name
HAVING COUNT(*) > 1;
sql


Solution 1:[1]

I don't have your data so I made my test table:

create table test1 (col_1 varchar(2),col_2 varchar(2),col_3 varchar(2),col_4 float, col_5 float, col_6 date);
insert into test1 values ('a','b','c',1,2,'2022-05-20 10:34:09');
insert into test1 values ('a','b','c',6,9,'2022-04-10 10:34:09');
insert into test1 values ('k','l','m',1,2,'2022-04-20 10:34:09');
insert into test1 values ('k','l','m',11,2,'2022-01-20 10:34:09');
insert into test1 values ('k','l','m',1,2,'2022-02-20 10:34:09');
insert into test1 values ('k','l','c',7,0,'2022-03-20 10:34:09');

Here we are counting duplicates where only col_1,col_2 and col_3 are relevant with a window function and then calculating the percent of that count out of total row count in the last 3 months:

select col_1,col_2,col_3,col_4,col_5, count(*)over (partition by col_1,col_2,col_3) dupl_number,
       round(cast(count(*)over (partition by col_1,col_2,col_3) as float)/cast((count(*)over())as float)*100,2) dup_percentage
from test1
where col_6>DATEADD(MONTH, -3, CAST( GETDATE() AS Date ))

In your case col_1,col_2 and col_3 would be: DATE_AND_TIME Baby_Name Mother_Name in the count(*)over (partition by col_1,col_2,col_3) and col_4, col_5 would be some irrelevant columns, if you leave col_4 and col_5 out and write distinct after select you would get: 1 row per distinct col_1,col_2,col_3 value with its count and percentage of total counts:

select distinct col_1,col_2,col_3, count(*)over (partition by col_1,col_2,col_3) dupl_number,
       round(cast(count(*)over (partition by col_1,col_2,col_3) as float)/cast((count(*)over())as float)*100,2) dup_percentage
from test1
where col_6>DATEADD(MONTH, -3, CAST( GETDATE() AS Date ))

dbfiddle

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1