'Bulk insert only when specified condition is met

I have an array of objects that look like this:

{ type: 'foo', value : 2, hashedIdentifier : ASDZXC, createdAt : '2022-02-27T14:17:44.860+00:00' }

I would like to insert them in bulk into a MongoDB collection. My business logic demands that no 2 identical objects are created within a 7 day period. For that I use the hashedIdentifier field to find duplicates, where createdAt is $gte "7 days ago", filter duplicates out of the array, and then insert what remains in the array.

These are 2 different DB queries, and I'm looking for something a little more hermetic and atomic.

I was looking at updateMany with upsert, but couldn't figure this out.

Is there a way to insert all the array objects in bulk, only inserting those objects who do not conflict with the above constraint (identifier + createdAt range)?

  • A unique (or compound unique) index is no use for me here, as I do want to enable the creation of duplicates after the time range (7 days) has passed.
  • a TTL on collection objects is also out of the question, as I want to keep them all, forever.


Solution 1:[1]

One way to do it is using an aggregation pipeline with $merge . For example:

db.collection.aggregate([
  {
    $group: {
      _id: "$hashedIdentifier",
      latestExisting: {
        $max: "$createdAt"
      }
    }
  },
  {
    $match: {
      _id: {
        $in: [
          "ASDZXC",
          "TSSGKE",
          "SDFKAR"
        ]
      }
    }
  },
  {
    $group: {
      _id: 0,
      existing: {
        $push: {
          hashedIdentifier: "$_id",
          latestExisting: "$latestExisting"
        }
      }
    }
  },
  {
    $addFields: {
      newItems: [
        {
          type: "foo",
          value: 2,
          hashedIdentifier: "ASDZXC",
          createdAt: ISODate("2022-03-06T14:18:44.860+00:00")
        },
        {
          type: "bar",
          value: 3,
          hashedIdentifier: "TSSGKE",
          createdAt: ISODate("2022-03-06T15:17:44.860+00:00")
        },
        {
          type: "newOne",
          value: 9,
          hashedIdentifier: "SDFKAR",
          createdAt: ISODate("2022-03-06T15:17:44.860+00:00")
        }
      ]
    }
  },
  {
    $unwind: "$newItems"
  },
  {
    $project: {
      existing: {
        $filter: {
          input: "$existing",
          as: "item",
          cond: {
            $eq: [
              "$$item.hashedIdentifier",
              "$newItems.hashedIdentifier"
            ]
          }
        }
      },
      newItems: 1,
      latestExisting: 1
    }
  },
  {
    $project: {
      existing: {$arrayElemAt: ["$existing", 0]},
      newItem: "$newItems"
    }
  },
  {
    $project: {
      delta: {
        $subtract: [
          "$newItem.createdAt",
          "$existing.latestExisting"
        ]
      },
      latestExisting: 1,
      newItem: 1
    }
  },
  {
    $match: {
      $or: [{ delta: {$gte: 604800000}}, {delta: null}]
    }
  },
  {
    $replaceRoot: {newRoot: "$newItem"}
  },
  {
    $merge: {into: "collection"}
  }
])

As you can see on this playground example.

First we group the existing docs by hashedIdentifier, as we only interested at the latest existing date. Then we keep the relevant documents that are matching to the ones we test for inserting and group all of them to one document. Next step is to add all the new items using $addFields, and to $unwind it, so they will be separate documents. Now we just need to match for each new item, the existing item if there is any, and test the time difference condition (604800000 ms). The documents that are matching to this condition can be $merged back to the collection as new validated documents.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 nimrod serok