MongoDB - Get latest non-null field value from documents with timestamp
In the MongoDB collection I'm querying, each document represents some data for a parcel at a specific time. Every time I receive an update for a parcel, some fields may be updated (non-null values) while others are not (null values).
To illustrate, consider this example. We received 3 data sets for a parcel:
/* 1 */
{
    "parcelNum" : "CC123456789FR",
    "datetime" : ISODate("2018-09-05T10:48:38.584Z"),
    "field1" : "value1_1",
    "field2" : "value2_1"
}
/* 2 */
{
    "parcelNum" : "CC123456789FR",
    "datetime" : ISODate("2018-09-05T10:48:40.566Z"),
    "field1" : "value1_2",
    "field2" : null
}
/* 3 */
{
    "parcelNum" : "CC123456789FR",
    "datetime" : ISODate("2018-09-05T10:48:42.777Z"),
    "field1" : null,
    "field2" : "value2_2"
}
How can I extract the latest non-null value, for all fields, considering the timestamp of the document they belong to?
Using the previous example, this is what I am trying to get:
{
    "parcelNum" : "CC123456789FR",
    "field1" : "value1_2",
    "field2" : "value2_2"
}
I tried this kind of query, but I can't figure out how to combine field values from multiple documents:
db.testDB.aggregate([
    // Newest documents first
    { $sort: { datetime: -1 } },
    // $first only ever looks at the newest document of each parcel
    { $group: {
        _id: "$parcelNum",
        field1: { $first: "$field1" },
        field2: { $first: "$field2" }
    } }
])
This gives me:
{
    "_id" : "CC123456789FR",
    "field1" : null,
    "field2" : "value2_2"
}
which is wrong, because it only uses values from the most recent document instead of combining values across all the documents.
I tried another approach suggested by Rishi in another topic. Instead of creating a new document for each revision, he suggested pushing revision sub-documents onto an array and maintaining the latest revision at the parent document.
Something like this:
{
    parcelNum: "CC123456789FR",
    lastUpdated: ISODate("2018-09-05T10:48:42.777Z"),
    field1: "value1_2",
    field2: "value2_2",
    revisions: [
        {
            datetime: ISODate("2018-09-05T10:48:38.584Z"),
            field1: "value1_1",
            field2: "value2_1"
        },
        {
            datetime: ISODate("2018-09-05T10:48:40.566Z"),
            field1: "value1_2",
            field2: null
        },
        {
            datetime: ISODate("2018-09-05T10:48:42.777Z"),
            field1: null,
            field2: "value2_2"
        }
    ]
}
However, maintaining the latest revision is not that easy, because updates are not received in chronological order: I can receive a "new" document that has an older "datetime" value, in which case I must not update the parent fields unless they are still null. To handle this properly, I would have to record the last update timestamp for every field!
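The per-field timestamp idea can be sketched in plain JavaScript, independent of MongoDB. This is only an illustration under assumptions: `applyUpdate` and the `{ value, datetime }` state shape are hypothetical names, not part of the original schema, and null is taken to mean "no change". Each field remembers the timestamp of the update that set it, so a late-arriving older update cannot overwrite a newer value.

```javascript
// Hypothetical helper: fold one incoming update into a per-parcel state that
// stores, for each field, the value and the timestamp of the update it came from.
function applyUpdate(state, update, fields) {
  for (const field of fields) {
    const value = update[field];
    // Assumption: null (or a missing field) means "no change".
    if (value === null || value === undefined) continue;
    const current = state[field];
    // Accept the value only if nothing is stored yet or this update is newer.
    if (!current || update.datetime > current.datetime) {
      state[field] = { value, datetime: update.datetime };
    }
  }
  return state;
}

// The three sample updates from the question, deliberately delivered out of order.
const updates = [
  { datetime: new Date("2018-09-05T10:48:42.777Z"), field1: null, field2: "value2_2" },
  { datetime: new Date("2018-09-05T10:48:38.584Z"), field1: "value1_1", field2: "value2_1" },
  { datetime: new Date("2018-09-05T10:48:40.566Z"), field1: "value1_2", field2: null },
];

const state = updates.reduce((s, u) => applyUpdate(s, u, ["field1", "field2"]), {});
const latest = { field1: state.field1.value, field2: state.field2.value };
console.log(latest);
```

The trade-off is the one described above: the stored document grows by one timestamp per field, but reads no longer need to scan the revisions.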
Solution 1:[1]
You can try this:
db.getCollection('test').aggregate([
    // Sort, newest documents first
    { $sort: { datetime: -1 } },
    // Collect each field's values into an array, one group per parcel
    { $group: {
        "_id": "$parcelNum",
        "field1": { $push: "$field1" },
        "field2": { $push: "$field2" }
    } },
    // Filter out the null values
    { $project: {
        "field1notNull": {
            $filter: {
                input: "$field1",
                as: "f",
                cond: { $ne: [ "$$f", null ] }
            }
        },
        "field2notNull": {
            $filter: {
                input: "$field2",
                as: "f",
                cond: { $ne: [ "$$f", null ] }
            }
        }
    } },
    // Take the first (most recent) remaining value of each field
    { $project: {
        "_id": 0,
        "parcelNum": "$_id",
        "field1": { $arrayElemAt: [ "$field1notNull", 0 ] },
        "field2": { $arrayElemAt: [ "$field2notNull", 0 ] }
    } }
])
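The sort, push, filter, take-first sequence can be checked without a database. The sketch below mimics those steps in plain JavaScript on the sample documents, grouping by parcelNum so that several parcels stay separate. `latestNonNull` is a hypothetical name, and the second parcel ("CC987654321FR") is made-up data added only to show the grouping.

```javascript
// Hypothetical in-memory imitation of the pipeline steps: sort, group/push, filter, first.
function latestNonNull(docs, fields) {
  // Like {$sort: {datetime: -1}}: newest first.
  const sorted = [...docs].sort((a, b) => b.datetime - a.datetime);

  // Like $group with $push: one array of values per field and per parcel.
  const groups = new Map();
  for (const doc of sorted) {
    if (!groups.has(doc.parcelNum)) {
      groups.set(doc.parcelNum, Object.fromEntries(fields.map((f) => [f, []])));
    }
    for (const f of fields) groups.get(doc.parcelNum)[f].push(doc[f]);
  }

  // Like $filter followed by $arrayElemAt 0: drop nulls, keep the newest value.
  return [...groups].map(([parcelNum, arrays]) => {
    const out = { parcelNum };
    for (const f of fields) {
      const notNull = arrays[f].filter((v) => v !== null && v !== undefined);
      if (notNull.length > 0) out[f] = notNull[0];
    }
    return out;
  });
}

const docs = [
  { parcelNum: "CC123456789FR", datetime: new Date("2018-09-05T10:48:38.584Z"), field1: "value1_1", field2: "value2_1" },
  { parcelNum: "CC123456789FR", datetime: new Date("2018-09-05T10:48:40.566Z"), field1: "value1_2", field2: null },
  { parcelNum: "CC123456789FR", datetime: new Date("2018-09-05T10:48:42.777Z"), field1: null, field2: "value2_2" },
  // Made-up second parcel, not from the original question.
  { parcelNum: "CC987654321FR", datetime: new Date("2018-09-05T11:00:00.000Z"), field1: "other", field2: null },
];

const results = latestNonNull(docs, ["field1", "field2"]);
console.log(results);
```

A field that is null in every document of a parcel is simply absent from that parcel's result, which matches what $arrayElemAt does on an empty array.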
Solution 2:[2]
You can try the $facet stage to treat field1 and field2 separately:
db['01'].aggregate([
    // Stage 1
    { $sort: { datetime: -1 } },
    // Stage 2
    { $facet: {
        parcelNum: [
            { $group: { _id: "$parcelNum" } }
        ],
        field1: [
            { $match: { field1: { $ne: null } } },
            { $limit: 1 },
            { $project: { _id: 0, field1: 1 } }
        ],
        field2: [
            { $match: { field2: { $ne: null } } },
            { $limit: 1 },
            { $project: { _id: 0, field2: 1 } }
        ]
    } },
    // Stage 3
    { $project: {
        parcelNum: "$parcelNum._id",
        field1: "$field1.field1",
        field2: "$field2.field2"
    } },
    // Stage 4
    { $project: {
        parcelNum: { $arrayElemAt: ["$parcelNum", 0] },
        field1: { $arrayElemAt: ["$field1", 0] },
        field2: { $arrayElemAt: ["$field2", 0] }
    } }
]);
Note that stages 3 and 4 are only cosmetic: the needed result is already present at the end of stage 2, and they just flatten it. Hope it helps.
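One caveat: each $facet sub-pipeline sees every document that enters the stage, so with more than one parcel in the collection the pipeline above would pick values across parcels. A possible way around this, sketched below, is to put a $match on parcelNum in front and run the pipeline once per parcel. `buildFacetPipeline` is a hypothetical helper, not part of MongoDB, and it assumes plain top-level field names (no dots or leading `$`).

```javascript
// Hypothetical helper: build the $facet pipeline for one parcel and any list of fields.
function buildFacetPipeline(parcelNum, fields) {
  const facets = {};
  // Final flattening stage; $literal keeps the parcel number from being parsed as a field path.
  const flatten = { _id: 0, parcelNum: { $literal: parcelNum } };

  for (const field of fields) {
    // One sub-pipeline per field: newest document where the field is not null.
    facets[field] = [
      { $match: { [field]: { $ne: null } } },
      { $limit: 1 },
      { $project: { _id: 0, [field]: 1 } },
    ];
    // "$field.field" yields a one-element (or empty) array; take its first element.
    flatten[field] = { $arrayElemAt: ["$" + field + "." + field, 0] };
  }

  return [
    { $match: { parcelNum } },      // restrict to a single parcel
    { $sort: { datetime: -1 } },    // newest first, so $limit: 1 keeps the latest
    { $facet: facets },
    { $project: flatten },
  ];
}

const pipeline = buildFacetPipeline("CC123456789FR", ["field1", "field2"]);
console.log(JSON.stringify(pipeline, null, 2));

// In the mongo shell this would be used as:
//   db['01'].aggregate(buildFacetPipeline("CC123456789FR", ["field1", "field2"]))
```

Building the stages from a field list also avoids copying the same three-stage block for every extra field.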
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Juan Bermudez |
Solution 2 | Community |