'How to find and delete all duplicates from SQL Server database

I'm new to SQL in general and I need to delete all duplicates in a given database.

For the moment, I use this DB to experiment some things.

The table currently looks like this :

table content

I know I can find all duplicates using this query :

SELECT COUNT(*) AS NBR_DOUBLES, Name, Owner
FROM dbo.animals
GROUP BY Name, Owner
HAVING COUNT(*) > 1

but I have a lot of trouble finding an adapted and updated solution to not only find all the duplicates, but also delete them all, only leaving one of each.

Thanks a lot for taking some of your time to help me.



Solution 1:[1]

;WITH numbered AS (
    SELECT ROW_NUMBER() OVER(PARTITION BY Name, Owner ORDER BY Name, Owner) AS _dupe_num 
    FROM dbo.Animals
)
DELETE FROM numbered WHERE _dupe_num > 1;

This will delete all but one of each occurance with the same Name & Owner, if you need it to be more specific you should extend the PARTITION BY clause. If you want it to take in account the entire record you should add all your fields.

The record left behind is currently random, since it seems you do not have any field to have any sort of ordering on.

Solution 2:[2]

What you want to do is use a projection that numbers each record within a given duplicate set. You can do that with a Windowing Function, like this:

SELECT Name, Owner
   ,Row_Number() OVER ( PARTITION BY Name, Owner ORDER BY Name, Owner, Birth) AS RowNum
FROM dbo.animals
ORDER BY Name, Owner

This should give you results like this:

Name             Owner       RowNum
Ecstasy          Sacha         1
Ecstasy          Sacha         2
Ecstasy          Sacha         3
Gremlin          Max           1
Gremlin          Max           2
Gremlin          Max           3
Outch            Max           1
Outch            Max           2
Outch            Max           3

Now you want to convert this to a DELETE statement that has a WHERE clause targeting rows with RowNum > 1. The way to use a windowing function with a DELETE is to first include the windowing function as part of a common table expression (CTE), like this:

WITH dupes AS
(
    SELECT Name, Owner, 
       Row_Number() OVER ( PARTITION BY Name, Owner ORDER BY Name, Owner, Birth) AS RowNum
   FROM dbo.animals
)
DELETE FROM dupes WHERE RowNum > 1;

This will delete later duplicates, but leave row #1 for each group intact. The only trick now is to make sure row #1 is the correct row, since not all of your duplicates have the same values for the Birth or Death columns. This is the reason I included the Birth column in the windowing function, while other answers (so far) have not. You need to decide if you want to keep the oldest animal or the youngest, and optionally change the Birth order in the OVER clause to match your needs.

Solution 3:[3]

Use CTE. I will show you a sample :

Create table #Table1(Field1 varchar(100));

Insert into #Table1 values
('a'),('b'),('f'),('g'),('a'),('b');

Select * from #Table1;


WITH CTE AS(
   SELECT Field1,
       RN = ROW_NUMBER()OVER(PARTITION BY Field1 ORDER BY Field1)
   FROM #Table1
)
--SELECT * FROM CTE WHERE RN > 1
DELETE FROM CTE WHERE RN > 1

What I am doing is, numbering the rows. If there are duplicates based on PARTITION BY columns, it will be numbered sequentially, else 1.

Then delete those records whose count is greater than 1.

I won't spoon feed you solution hence you will have to play with PARTITION BY to reach your output

output :

Select * from #Table1;
Field1
---------
a
b
f
g
a
b

/*with cte as (...) SELECT * FROM CTE;*/

Field1  RN   
------- -----
a       1
a       2
b       1
b       2
f       1
g       1

Solution 4:[4]

if NBR_DOUBLES had an ID field, I believe you could use this;

DELETE FROM NBR_DOUBLES WHERE ID IN
(
SELECT MAX(ID)
FROM dbo.animals
GROUP BY Name, Owner
HAVING COUNT(*) > 1
)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Jens
Solution 2
Solution 3
Solution 4 pi511