'Find duplicate records in MongoDB
How would I find duplicate fields in a mongo collection.
I'd like to check if any of the "name" fields are duplicates.
{
"name" : "ksqn291",
"__v" : 0,
"_id" : ObjectId("540f346c3e7fc1054ffa7086"),
"channel" : "Sales"
}
Many thanks!
Solution 1:[1]
Use aggregation on name
and get name
with count > 1
:
db.collection.aggregate([
{"$group" : { "_id": "$name", "count": { "$sum": 1 } } },
{"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } },
{"$project": {"name" : "$_id", "_id" : 0} }
]);
To sort the results by most to least duplicates:
db.collection.aggregate([
{"$group" : { "_id": "$name", "count": { "$sum": 1 } } },
{"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } },
{"$sort": {"count" : -1} },
{"$project": {"name" : "$_id", "_id" : 0} }
]);
To use with another column name than "name", change "$name" to "$column_name"
Solution 2:[2]
You can find the list
of duplicate
names using the following aggregate
pipeline:
Group
all the records having similarname
.Match
thosegroups
having records greater than1
.- Then
group
again toproject
all the duplicate names as anarray
.
The Code:
db.collection.aggregate([
{$group:{"_id":"$name","name":{$first:"$name"},"count":{$sum:1}}},
{$match:{"count":{$gt:1}}},
{$project:{"name":1,"_id":0}},
{$group:{"_id":null,"duplicateNames":{$push:"$name"}}},
{$project:{"_id":0,"duplicateNames":1}}
])
o/p:
{ "duplicateNames" : [ "ksqn291", "ksqn29123213Test" ] }
Solution 3:[3]
The answer anhic gave can be very inefficient if you have a large database and the attribute name is present only in some of the documents.
To improve efficiency you can add a $match to the aggregation.
db.collection.aggregate(
{"$match": {"name" :{ "$ne" : null } } },
{"$group" : {"_id": "$name", "count": { "$sum": 1 } } },
{"$match": {"count" : {"$gt": 1} } },
{"$project": {"name" : "$_id", "_id" : 0} }
)
Solution 4:[4]
db.getCollection('orders').aggregate([
{$group: {
_id: {name: "$name"},
uniqueIds: {$addToSet: "$_id"},
count: {$sum: 1}
}
},
{$match: {
count: {"$gt": 1}
}
}
])
First Group Query the group according to the fields.
Then we check the unique Id and count it, If count is greater then 1 then the field is duplicate in the entire collection so that thing is to be handle by $match query.
Solution 5:[5]
Another option is to use $sortByCount
stage.
db.collection.aggregate([
{ $sortByCount: '$name' }
]
This is the combination of $group
& $sort
.
The
$sortByCount
stage is equivalent to the following$group
+$sort
sequence:{ $group: { _id: <expression>, count: { $sum: 1 } } }, { $sort: { count: -1 } }
Solution 6:[6]
UPDATE ====== Works everytime!
db.users.aggregate([
// Group by the key and compute the number of documents that match the key
{
$group: {
_id: "$username", // or if you want to use multiple fields _id: { a: "$FirstName", b: "$LastName" }
count: { $sum: 1 }
}
},
// Filter group having more than 1 item, which means that at least 2 documents have the same key
{
$match: {
count: { $gt: 1 }
}
}
])
==========
This also aggregation worked for me...
db.collection.aggregate([
{"$group" : { "_id": "$username", "count": { "$sum": 1 } } },
{"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } },
{"$project": {"username" : "$_id", "_id" : 0} }
]);
You can also try $sortByCount
db.collection.aggregate([
{ $sortByCount: '$username' }
]
Solution 7:[7]
If somebody is looking for a query for duplicates with an extra "$and" where clause, like "and where someOtherField is true"
The trick is to start with that other $match, because after the grouping you don't have all the data available anymore
// Do a first match before the grouping
{ $match: { "someOtherField": true }},
{ $group: {
_id: { name: "$name" },
count: { $sum: 1 }
}},
{ $match: { count: { $gte: 2 } }},
I searched for a very long time to find this notation, hope I can help somebody with the same problem
Solution 8:[8]
In case you need to see all duplicated rows:
db.collection.aggregate([
{"$group" : { "_id": "$name", "count": { "$sum": 1 },"data": { "$push": "$$ROOT" }}},
{"$unwind": "$data"}
{"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } },
]);
Solution 9:[9]
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Prakash Harvani |
Solution 2 | BatScream |
Solution 3 | JuanÃn |
Solution 4 | Paul Rumkin |
Solution 5 | M. Justin |
Solution 6 | |
Solution 7 | |
Solution 8 | Andoctorey |
Solution 9 | Tanzeel |