'group in mongo excluding null values

I have mongo query which does the group operation on the documents.

I have almost got the expected results except that I want to refine the results without empty or null values.

Currently my query looks like this:

db.productMetadata.aggregate([{$group:{"_id":{"color":"$productAttribute.colour","gender":"$productAttribute.gender"},"count" : {$sum : 1}}}]);

And the results looks something like this:

{ "_id" : { "color" : "BLUE", "gender" : "MEN" }, "count" : 1 }
{ "_id" : {  }, "count" : 4 }
{ "_id" : { "color" : "NA", "gender" : "WOMEN" }, "count" : 1 }
{ "_id" : { "color" : "BLACK", "gender" : "MEN" }, "count" : 1 }
{ "_id" : { "color" : "BROWN", "gender" : "WOMEN" }, "count" : 1 }
{ "_id" : { "gender" : "MEN" }, "count" : 2 }
{ "_id" : { "color" : "BEIGE", "gender" : "MEN" }, "count" : 1 }
{ "_id" : { "color" : "BROWN", "gender" : "MEN" }, "count" : 1 }

I want to remove the rows if any of the group by field values are empty or null in the actual data of DB.

Excepted results should look something like this:

{ "_id" : { "color" : "BLUE", "gender" : "MEN" }, "count" : 1 }
{ "_id" : { "color" : "NA", "gender" : "WOMEN" }, "count" : 1 }
{ "_id" : { "color" : "BLACK", "gender" : "MEN" }, "count" : 1 }
{ "_id" : { "color" : "BROWN", "gender" : "WOMEN" }, "count" : 1 }
{ "_id" : { "color" : "BEIGE", "gender" : "MEN" }, "count" : 1 }
{ "_id" : { "color" : "BROWN", "gender" : "MEN" }, "count" : 1 }


Solution 1:[1]

You need an extra $match pipeline step that will filter the incoming documents based on the embedded field "$productAttribute.colour" existing and not null:

db.productMetadata.aggregate([
    { $match: {
        "productAttribute.colour": { 
            $exists: true, 
            $ne: null 
        }
    } },
    { $group: {
        _id: {
           color: "$productAttribute.colour",
           gender: "$productAttribute.gender"
        },
        count: { $sum: 1 }
    } }        
]);

Solution 2:[2]

this example includes two different Collections. For this we use aggregate function. I am also using Mongoose

  1. I am joining the cusmtomfield with customfiellabels with $lookup
  2. Flat the array with $unwind
  3. $match to exclude the name that have INACTIVE in the text (I'm using REGEX)
  4. $project to rename the Fields to show properly on the client

    . async getAllMasterDataCustomFields(req) {

        let response = {};
        try {
    
          response = await customfieldsModel.aggregate([
            {
              $lookup: {
                from: 'customfieldlabels',
                localField: 'cfId',
                foreignField: 'cfId',
                as: 'info'
              }
            },
            { '$unwind': { 'path': '$info', 'preserveNullAndEmptyArrays': true } },
            { '$match': { 'childs.name': { $not: /INACTIVE/ }}},
            {
              $project: {
                'cfId': 1,
                'label': '$info.label',
                'type': '$info.type',
                'childs': 1
              }
            }]).exec();
    
        } catch (e) {
          logger.log('error', `Error while getting response ${e.meesage}`);
        }
    
        return response;
      }
    

    .

Solution 3:[3]

Perhaps you should use $match: {'color': {$exists: true}} before $group operation. With sparse index it will work pretty fast. And do not store "null" fields in collections at all, that will reduce db size and will increase search speed for sparse indexes (less documents in index -> more speed)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Sebastian Castaldi
Solution 3 Sergey Mazur