Delete duplicates from a huge table in PostgreSQL
I have an unusual problem: I need to delete duplicate records from a table in PostgreSQL. Because the table contains duplicates, it has no primary key or unique index. It holds about 20 million records. The query below takes too long:
DELETE FROM temp a USING temp b
WHERE a.recordid = b.recordid AND a.ctid < b.ctid;
What would be a better approach for handling such a huge table with no index on it? Any help is appreciated.
Solution 1:[1]
If you have enough free disk space, you can copy the table without duplicates, then drop the old table and rename the new one, like this:
INSERT INTO new_table
SELECT DISTINCT ON (recordid) *
FROM old_table
ORDER BY recordid;
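A minimal sketch of the full swap, assuming recordid is the column that identifies duplicates (table and column names are taken from the question; temp_dedup is illustrative). It uses CREATE TABLE ... AS instead of a pre-created table plus INSERT:

BEGIN;
-- keep exactly one row per recordid
CREATE TABLE temp_dedup AS
SELECT DISTINCT ON (recordid) *
FROM temp
ORDER BY recordid;

-- swap the de-duplicated table into place
DROP TABLE temp;
ALTER TABLE temp_dedup RENAME TO temp;
COMMIT;

Note that when duplicate rows differ in columns other than recordid, DISTINCT ON keeps an arbitrary one of them; extend the ORDER BY with more columns to control which row survives.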
Solution 2:[2]
1. Use COPY TO to dump the table.
2. De-duplicate the dump with Unix sort -u.
3. Drop or truncate the table in Postgres, then use COPY FROM to read the data back in.
4. Add a primary key column.
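A minimal sketch of that workflow from the shell, using psql's client-side \copy on the table from the question; the database name and file names are illustrative, and sort -u only collapses rows that are identical in every column:

psql -d mydb -c "\copy temp TO 'temp_dump.tsv'"        # dump the table as text (mydb is a placeholder)
LC_ALL=C sort -u temp_dump.tsv > temp_dedup.tsv        # drop duplicate lines
psql -d mydb -c "TRUNCATE temp"                        # empty the original table
psql -d mydb -c "\copy temp FROM 'temp_dedup.tsv'"     # reload the de-duplicated rows
psql -d mydb -c "ALTER TABLE temp ADD COLUMN id bigserial PRIMARY KEY"

Setting LC_ALL=C makes sort compare raw bytes, which is faster and locale-independent.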
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Alan Tishin |
| Solution 2 | Andrew Lazarus |