Delete result of mongodb $lookup aggregation

How do I delete all the chunks documents which are returned as a result of this aggregation?

db.getCollection('chunks').aggregate([
    {
      $lookup:
        {
          from: "files",
          localField: "files_id",
          foreignField: "_id",
          as: "file"
        }
   },
   {
     $match:
       {
         "file.uploadDate":
           {
             $lt: ISODate("2017-06-10T00:00:00.000Z")
           }
       }
   }
])

My schema has a collection named files, which contains file metadata (name, uploadDate), and chunks, which contains the actual data (binary, files_id).

I am aware of db.collection.deleteMany({}), but it accepts only a match filter.

I am using MongoDB 3.2.



Solution 1:[1]

Loop the results:

var ops = [];

db.getCollection('chunks').aggregate([
    {
      $lookup:
        {
          from: "files",
          localField: "files_id",
          foreignField: "_id",
          as: "file"
        }
   },
   {
     $match:
       {
         "file.uploadDate":
           {
             $lt: ISODate("2017-06-10T00:00:00.000Z")
           }
       }
   }
]).forEach(doc => {
  ops = [
    ...ops,
    { "deleteOne": {
       "filter": { "_id": doc._id }   
    }}
  ];
  if ( ops.length >= 1000 ) {
    db.getCollection('chunks').bulkWrite(ops);
    ops = [];
  }
});

if ( ops.length > 0 ) {
  db.getCollection('chunks').bulkWrite(ops);
  ops = [];
}

Or in environments without ES6:

var ops = [];

db.getCollection('chunks').aggregate([
    {
      $lookup:
        {
          from: "files",
          localField: "files_id",
          foreignField: "_id",
          as: "file"
        }
   },
   {
     $match:
       {
         "file.uploadDate":
           {
             $lt: ISODate("2017-06-10T00:00:00.000Z")
           }
       }
   }
]).forEach(function(doc) {

  ops.push({ "deleteOne": { "filter": { "_id": doc._id }  } });

  if ( ops.length >= 1000 ) {
    db.getCollection('chunks').bulkWrite(ops);
    ops = [];
  }
});

if ( ops.length > 0 ) {
  db.getCollection('chunks').bulkWrite(ops);
  ops = [];
}

Using .bulkWrite() you are essentially "batching" the requests in lots of 1000, so the actual writes and responses from the database happen only at that time, not once per entry.

You cannot supply an aggregation pipeline as the query argument to the general .remove() methods, so instead you loop the cursor and issue the deletes like this.
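The batching pattern above can be factored into a small, shell-independent helper. This is a hypothetical sketch (the makeBatcher name and its flush callback are my own, not part of MongoDB): it accumulates operations and hands them to a flush function in lots of a given size.

```javascript
// Hypothetical helper sketching the batching pattern above:
// collect operations and pass them to `flush` in lots of `batchSize`.
function makeBatcher(flush, batchSize) {
  var ops = [];
  return {
    // Queue one operation; flush automatically when the batch is full.
    add: function (op) {
      ops.push(op);
      if (ops.length >= batchSize) {
        flush(ops);
        ops = [];
      }
    },
    // Flush whatever remains after the loop finishes.
    end: function () {
      if (ops.length > 0) {
        flush(ops);
        ops = [];
      }
    }
  };
}
```

In the shell you would pass `function (ops) { db.getCollection('chunks').bulkWrite(ops); }` as the flush callback, call `add(...)` inside the forEach, and call `end()` after the loop.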

Solution 2:[2]

After getting the aggregation result you can use the cursor's map function to collect all chunk ids, and then use db.collection.remove() with the $in operator.

var pipeline = [
  {$lookup:{
      from: "files",
      localField: "files_id",
      foreignField: "_id",
      as: "file"
    }
  },
  {$match:{
      "file.uploadDate":
      {
        $lt: ISODate("2017-06-10T00:00:00.000Z")
      }
    }
  }
];

var cursor = db.chunks.aggregate(pipeline);
var chunkIds = cursor.map(function (chunk) { return chunk._id; });
db.chunks.remove({"_id": { "$in": chunkIds }});
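One caveat with this approach: all the ids go into a single query document, and a BSON document cannot exceed 16 MB. For very large result sets you can delete in bounded slices instead; this is a sketch (the chunkArray helper name is my own, not a MongoDB API):

```javascript
// Hypothetical helper: split the id list into slices of `size`
// so each remove() call gets a bounded $in array.
function chunkArray(arr, size) {
  var out = [];
  for (var i = 0; i < arr.length; i += size) {
    out.push(arr.slice(i, i + size));
  }
  return out;
}

// Usage in the shell (sketch):
// chunkArray(chunkIds, 1000).forEach(function (ids) {
//   db.chunks.remove({ "_id": { "$in": ids } });
// });
```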

Solution 3:[3]

An alternative, slightly hacky approach is to move the computation into mongodb instead of streaming the results through your Node.js application. It does, however, require temporarily modifying the data in the database.

The simple logic is this:

  • Use an aggregation to mark items in the database for deletion
  • Send a delete query that removes all the marked items

Note: I am writing the code for a recent version of mongodb; the $merge stage I rely on is available from version 4.2 onwards, but not in 4.0 or earlier, according to the documentation. This answer therefore does not help with the version 3.2 used in the original question, but I think it is worth adding here, as this question and its answers show up in searches on the problem in general.

Mark items for deletion

// .toArray() at the end triggers mongodb to execute the merge step - even though it doesn't return anything.
// await makes sure we wait until all items are marked before continuing.
await db.collection("chunks").aggregate([
  // The lookup and matching as before
  { $lookup: { from: "files", localField: "files_id", foreignField: "_id", as: "files" } },
  { $match: { "files.uploadDate": { $lt: ISODate("2017-06-10T00:00:00.000Z") } } },

  // Remove the files field added by the lookup; it is no longer needed
  // (the $unset aggregation stage is available from mongodb 4.2)
  { $unset: "files" },

  // Mark for deletion by setting a field
  { $set: { __markedForDeletion: true } },

  // Overwrite chunks entries with the deletion marking
  { $merge: { into: "chunks", whenMatched: "replace" } },
]).toArray();

Delete marked items

await db.collection("chunks").deleteMany({ __markedForDeletion: true });

The advantage of this approach is that mongodb does not have to serialize data and send it to the Node.js application; the application only waits while mongodb executes the commands internally, which is faster and less error-prone than the other answers mentioned here.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Shaishab Roy
Solution 3 Niels Abildgaard