'Slow MongoDB aggregate with $sort on _id field

I have a records collection with the following indexes:

{"_id":1}
{"car.make":1,"city":1,"car.mileage":1}

And performing the following query:

db.records.aggregate([
    {
        "$match":{
            "car.mileage":{"$in":[1000,2000,3000,4000]},
            "car.make":"Honda",
            "city":{"$in":["Miami","San Francisco","New York", "Chigaco", "Seattle", "Boston"]}
        }
    },
    {
        "$sort":{"_id":-1}
    }
])

The query without the $sort clause finished in a few milliseconds but adding the $sort clause makes it takes around 2 minutes. This query should return around 40 documents from a collection of 6m documents. Any clues about what could cause this huge difference in query time?

After additional testing, this problem goes away by sorting on a different field like creation_date even if creation_date is not indexed. Any ideas why the _id field would perform so much worse than the unindexed creation_date field in this aggregation?



Solution 1:[1]

I ran into the same problem today and I'm speculating here but I believe in this case sorting by _id sorts all the entries in the collection before other operations (I said speculating because if you omit the $match clause and keep only the $sort clause even then you get your data in milliseconds).

The workaround that helped me was projection. If you use a $project clause between $match and $sort then you will get your data in milliseconds again. So you can either use fields like creation_date or if you must use _id then use $project before sorting it.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 codebuilder123