'Find rows with multiple duplicate fields with Active Record, Rails & Postgres
What is the best way to find records with duplicate values across multiple columns using Postgres, and Activerecord?
I found this solution here:
User.find(:all, :group => [:first, :email], :having => "count(*) > 1" )
But it doesn't seem to work with postgres. I'm getting this error:
PG::GroupingError: ERROR: column "parts.id" must appear in the GROUP BY clause or be used in an aggregate function
Solution 1:[1]
Tested & Working Version
User.select(:first,:email).group(:first,:email).having("count(*) > 1")
Also, this is a little unrelated but handy. If you want to see how times each combination was found, put .size at the end:
User.select(:first,:email).group(:first,:email).having("count(*) > 1").size
and you'll get a result set back that looks like this:
{[nil, nil]=>512,
["Joe", "[email protected]"]=>23,
["Jim", "[email protected]"]=>36,
["John", "[email protected]"]=>21}
Thought that was pretty cool and hadn't seen it before.
Credit to Taryn, this is just a tweaked version of her answer.
Solution 2:[2]
That error occurs because POSTGRES requires you to put grouping columns in the SELECT clause.
try:
User.select(:first,:email).group(:first,:email).having("count(*) > 1").all
(note: not tested, you may need to tweak it)
EDITED to remove id column
Solution 3:[3]
If you need the full models, try the following (based on @newUserNameHere's answer).
User.where(email: User.select(:email).group(:email).having("count(*) > 1").select(:email))
This will return the rows where the email address of the row is not unique.
I'm not aware of a way to do this over multiple attributes.
Solution 4:[4]
Get all duplicates with a single query if you use PostgreSQL:
def duplicated_users
duplicated_ids = User
.group(:first, :email)
.having("COUNT(*) > 1")
.select('unnest((array_agg("id"))[2:])')
User.where(id: duplicated_ids)
end
irb> duplicated_users
Solution 5:[5]
Works well in raw SQL:
# select array_agg(id) from attendances group by event_id, user_id having count(*) > 1;
array_agg
---------------
{3712,3711}
{8762,8763}
{7421,7420}
{13478,13477}
{15494,15493}
Solution 6:[6]
Building on @itsnikolay 's answer above but making a method that you can pass any ActiveRecord scope to
#pass in a scope, and list of columns to group by
# map(&:dupe_ids) to see your list
def duplicate_row_ids(ar_scope, attrs)
ar_scope
.group(attrs)
.having("COUNT(*) > 1")
.select('array_agg("id") as dupe_ids')
end
#initial scope to narrow where you want to look for dupes
ar_scope = ProductReviews.where( product_id: "194e676b-741e-4143-a0ce-10cf268290bb", status: "Rejected")
#pass the scope, and list of columns to group by
results = duplicate_row_ids(ar_scope, [:nickname, :overall_rating, :source, :product_id, :headline, :status])
#get your list
id_pairs = results.map &:dupe_ids
#each entry is an array
#then go through your pairs and take action
Solution 7:[7]
Based on the answer above by @newUserNameHere I believe the right way to show the count for each is
res = User.select('first, email, count(1)').group(:first,:email).having('count(1) > 1')
res.each {|r| puts r.attributes } ; nil
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Jeremie Ges |
Solution 2 | |
Solution 3 | Ben Aubin |
Solution 4 | itsnikolay |
Solution 5 | Dorian |
Solution 6 | J_McCaffrey |
Solution 7 | Community |