'Find duplicate records in MongoDB

How would I find duplicate fields in a mongo collection.

I'd like to check if any of the "name" fields are duplicates.

{
    "name" : "ksqn291",
    "__v" : 0,
    "_id" : ObjectId("540f346c3e7fc1054ffa7086"),
    "channel" : "Sales"
}

Many thanks!



Solution 1:[1]

Use aggregation on name and get name with count > 1:

db.collection.aggregate([
    {"$group" : { "_id": "$name", "count": { "$sum": 1 } } },
    {"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } }, 
    {"$project": {"name" : "$_id", "_id" : 0} }
]);

To sort the results by most to least duplicates:

db.collection.aggregate([
    {"$group" : { "_id": "$name", "count": { "$sum": 1 } } },
    {"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } }, 
    {"$sort": {"count" : -1} },
    {"$project": {"name" : "$_id", "_id" : 0} }     
]);

To use with another column name than "name", change "$name" to "$column_name"

Solution 2:[2]

You can find the list of duplicate names using the following aggregate pipeline:

  • Group all the records having similar name.
  • Match those groups having records greater than 1.
  • Then group again to project all the duplicate names as an array.

The Code:

db.collection.aggregate([
{$group:{"_id":"$name","name":{$first:"$name"},"count":{$sum:1}}},
{$match:{"count":{$gt:1}}},
{$project:{"name":1,"_id":0}},
{$group:{"_id":null,"duplicateNames":{$push:"$name"}}},
{$project:{"_id":0,"duplicateNames":1}}
])

o/p:

{ "duplicateNames" : [ "ksqn291", "ksqn29123213Test" ] }

Solution 3:[3]

The answer anhic gave can be very inefficient if you have a large database and the attribute name is present only in some of the documents.

To improve efficiency you can add a $match to the aggregation.

db.collection.aggregate(
    {"$match": {"name" :{ "$ne" : null } } }, 
    {"$group" : {"_id": "$name", "count": { "$sum": 1 } } },
    {"$match": {"count" : {"$gt": 1} } }, 
    {"$project": {"name" : "$_id", "_id" : 0} }
)

Solution 4:[4]

db.getCollection('orders').aggregate([  
    {$group: { 
            _id: {name: "$name"},
            uniqueIds: {$addToSet: "$_id"},
            count: {$sum: 1}
        } 
    },
    {$match: { 
        count: {"$gt": 1}
        }
    }
])

First Group Query the group according to the fields.

Then we check the unique Id and count it, If count is greater then 1 then the field is duplicate in the entire collection so that thing is to be handle by $match query.

Solution 5:[5]

Another option is to use $sortByCount stage.

db.collection.aggregate([
  { $sortByCount: '$name' }
]

This is the combination of $group & $sort.

The $sortByCount stage is equivalent to the following $group + $sort sequence:

    { $group: { _id: <expression>, count: { $sum: 1 } } },
    { $sort: { count: -1 } }

Solution 6:[6]

UPDATE ====== Works everytime!

db.users.aggregate([
    // Group by the key and compute the number of documents that match the key
    {
        $group: {
            _id: "$username",  // or if you want to use multiple fields _id: { a: "$FirstName", b: "$LastName" }
            count: { $sum: 1 }
        }
    },
    // Filter group having more than 1 item, which means that at least 2 documents have the same key
    {
        $match: {
            count: { $gt: 1 }
        }
    }
])

==========

This also aggregation worked for me...

db.collection.aggregate([
    {"$group" : { "_id": "$username", "count": { "$sum": 1 } } },
    {"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } }, 
    {"$project": {"username" : "$_id", "_id" : 0} }
]);

You can also try $sortByCount

db.collection.aggregate([
  { $sortByCount: '$username' }
]

Solution 7:[7]

If somebody is looking for a query for duplicates with an extra "$and" where clause, like "and where someOtherField is true"

The trick is to start with that other $match, because after the grouping you don't have all the data available anymore

// Do a first match before the grouping
{ $match: { "someOtherField": true }},
{ $group: {
    _id: { name: "$name" },
    count: { $sum: 1 }
}},
{ $match: { count: { $gte: 2 } }},

I searched for a very long time to find this notation, hope I can help somebody with the same problem

Solution 8:[8]

In case you need to see all duplicated rows:

db.collection.aggregate([
     {"$group" : { "_id": "$name", "count": { "$sum": 1 },"data": { "$push": "$$ROOT" }}},
     {"$unwind": "$data"}
     {"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } }, 
]);

Solution 9:[9]

enter image description here

this is how we can achieve this in mongoDB compass

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Prakash Harvani
Solution 2 BatScream
Solution 3 Juanín
Solution 4 Paul Rumkin
Solution 5 M. Justin
Solution 6
Solution 7
Solution 8 Andoctorey
Solution 9 Tanzeel