Delete duplicates from a huge table in PostgreSQL

I have an unusual problem: I need to delete duplicate records from a table in PostgreSQL. Because the table contains duplicates, it has no primary key or unique index. It holds about 20 million records, duplicates included. The query below is taking too long.

DELETE FROM temp a USING temp b WHERE a.recordid = b.recordid AND a.ctid < b.ctid;

What is a better approach for handling such a huge table with no index? Any help is appreciated.



Solution 1:[1]

If you have enough free space, you can copy the table without the duplicates, then drop the old table and rename the new one.

Like this:

-- keep only the first row for each value of "column"
INSERT INTO new_table
SELECT DISTINCT ON (column) *
FROM old_table
ORDER BY column ASC;
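
A fuller sketch of that swap, assuming the duplicate key is the question's recordid column and using temp_dedup as a placeholder name for the new table:

-- create an empty table with the same structure as the original
CREATE TABLE temp_dedup (LIKE temp INCLUDING ALL);

-- copy over one row per recordid
INSERT INTO temp_dedup
SELECT DISTINCT ON (recordid) *
FROM temp
ORDER BY recordid;

-- swap the tables once the copy has been verified
BEGIN;
DROP TABLE temp;
ALTER TABLE temp_dedup RENAME TO temp;
COMMIT;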

Solution 2:[2]

Use COPY TO to dump the table.

Then use Unix sort -u to de-duplicate it.

Drop or truncate the table in Postgres, then use COPY FROM to read it back in.

Add a primary key column.
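
A hedged sketch of those steps, assuming the question's table temp, server-side COPY privileges, and illustrative file paths; the sort -u step runs in a shell outside Postgres:

-- dump the table to a text file on the server
COPY temp TO '/tmp/temp_dump.txt';

-- in a shell:  sort -u /tmp/temp_dump.txt > /tmp/temp_dedup.txt

-- empty the table and reload the de-duplicated rows
TRUNCATE temp;
COPY temp FROM '/tmp/temp_dedup.txt';

-- add the primary key column suggested in the last step
ALTER TABLE temp ADD COLUMN id bigserial PRIMARY KEY;

Note that server-side COPY with a file path requires superuser or the pg_read_server_files / pg_write_server_files roles; from psql, the client-side \copy meta-command works without them.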

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Alan Tishin
Solution 2: Andrew Lazarus