'Performance penalty when using fluent interface
I am an absolute beginner when it comes to MongoDb. Just joined a new project where it is used, so I tried to do few experiments to get myself familiar. I got to an issue and I cannot find a reasonable justification.
On my local windows machine I have installed MongoDb as a service. I created a collection which gets loaded with following document:
public class UserAuditLog
{
public Guid Id { get; set; }
public long InternalId { get; set; }
public string StringId { get; set; }
public IEnumerable<AuditLog> AuditLogs { get; set; }
}
public class AuditLog
{
public DateTime OccuredAt { get; set; }
public int MethodId { get; set; }
public string MethodName { get; set; }
public bool IsSuccessful { get; set; }
public string AppClientId { get; set; }
}
There is an index on the StringId
(ascending).
In my fake scenario, each UserAuditLog
is preloaded with 50 entries in the AuditLogs
collection. The data is random (I use the cryptographic safe random RandomNumberGenerator
) for StringId
and when generating entries for AuditLog.OccuredAt
. My test collection gets loaded with approx 50mln UserAuditLog
documents.
I try to retrieve a single StringId
with aggregated data using:
IMongoCollection<UserAuditLog> collection = db.GetCollection<UserAuditLog>("auditlogs");
var group = new BsonDocument
{
{ "_id", "$_id" },
{ "AuditLogs", new BsonDocument { { "$push", "$auditLogs" } } },
};
var queryable = collection.AsQueryable();
var groupResult = await collection.Aggregate()
.Match(x => x.StringId == "<existingStringId>")
.Unwind(x => x.AuditLogs)
.Sort(Builders<BsonDocument>.Sort.Descending("auditLogs.occuredAt"))
.Skip(10)
.Limit(10)
.Group(group)
.FirstOrDefaultAsync(cancellationToken);
On my machine this is kind of long - on 50mln documents, it runs in over 1000ms (usually between 1300ms to 1600ms).
I was playing around and I found out that the underlying query, when executed against the db, runs always below 30ms. So I eventually did this (as a comparison):
PipelineDefinition<UserAuditLog, UserAuditLog> pipeline = new BsonDocument[]
{
new BsonDocument { { "$match", new BsonDocument("StringId", "<existingStringId>") } },
new BsonDocument { { "$unwind", "$AuditLogs" } },
new BsonDocument { { "$sort", new BsonDocument("auditLogs.occuredAd", 1) } },
new BsonDocument { { "$skip", 10 } },
new BsonDocument { { "$limit", 10 } },
new BsonDocument { { "$group", new BsonDocument
{
{ "_id", "$_id" },
{ "AuditLogs", new BsonDocument { { "$push", "$AuditLogs" } } }
} } }
};
var query = await collection.AggregateAsync<UserAuditLog>(pipeline);
var queryResult = await query .FirstOrDefaultAsync();
This query never run longer than 30ms. I compared the results - they seem identical. Also, before any run I was restarting the mongodb service in hope it will clear any cached results.
Why is that? Am I doing something wrong? I would appreciate any documentation on the subject.
Update 1
Following @dododo advice, I used the analyzer. Indeed, I found a minor difference - limit
was used twice on the fluent interface version - this is due to the use of FirstOrDefault
. I changed that to a ToList
just to try to make the queries identical. And indeed, they are now identical:
{
"aggregate" : "auditlogs50",
"pipeline" : [
{ "$match" : { "StringId" : "93345420576" } },
{ "$unwind" : "$AuditLogs" },
{ "$sort" : { "auditLogs.occuredAt" : 1 } },
{ "$skip" : 10 },
{ "$limit" : 10 },
{ "$group" : {
"_id" : "$_id",
"AuditLogs" : { "$push" : "$AuditLogs" } } }],
"cursor" : { },
"$db" : "test",
"lsid" : { "id" : CSUUID("fef7fdbd-f320-48ca-8faa-93f715fba753") } }
but the execution times did not change. Fluent version is still between 1300ms and 1600ms while BsonDocument is usually below 30ms. I even switched the order of execution (so I have a version when first I execute the fluent version and a version when I first execute the BsonDocument version), but there was no significant change.
Solution 1:[1]
There is no difference between typed way (your option 1) and a query created via raw MQL stages (your option 2) since in both of cases server consumes only query in raw form. If you see performance difference, then you should compare actually generated queries in both of these cases. See how to do it here.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | dododo |