ElasticSearch aggregation query that concatenates an array of strings into one bucket if an attribute has a certain value

The aim is to create an aggregation query in ElasticSearch that concatenates an array of strings into one bucket if an attribute has a certain value.

An example JSON stored in the ES engine:

{
  "header": {
    "identifier": "oai:gup.ub.gu.se/264598",
    "datestamp": "2019-05-11 11:03:06",
    "setSpec": "GU_SWEPUB"
  },
  "metadata": {
    "mods": {
      "@attributes": {
        "version": "3.5"
      },
      "recordInfo": {
        "recordContentSource": "gu"
      },
      "note": [
        "not verified at registration",
        "Published",
        "3"
      ],
      "identifier": [
        "https://gup.ub.gu.se/publication/264598",
        "29405517",
        "10.1111/adb.12603"
      ],
      "titleInfo": {
        "title": "Activation of amylin receptors attenuates alcohol-mediated behaviours in rodents."
      },
      "abstract": "Alcohol expresses its reinforcing properties by activating areas of the mesolimbic dopamine system, which consists of dopaminergic neurons projecting from the ventral tegmental area to the nucleus accumbens. The findings that reward induced by food and addictive drugs involve common mechanisms raise the possibility that gut-brain hormones, which control appetite, such as amylin, could be involved in reward regulation. Amylin decreases food intake, and despite its implication in the regulation of natural rewards, tenuous evidence support amylinergic mediation of artificial rewards, such as alcohol. Therefore, the present experiments were designed to investigate the effect of salmon calcitonin (sCT), an amylin receptor agonist and analogue of endogenous amylin, on various alcohol-related behaviours in rodents. We showed that acute sCT administration attenuated the established effects of alcohol on the mesolimbic dopamine system, particularly alcohol-induced locomotor stimulation and accumbal dopamine release. Using the conditioned place preference model, we demonstrated that repeated sCT administration prevented the expression of alcohol's rewarding properties and that acute sCT administration blocked the reward-dependent memory consolidation. In addition, sCT pre-treatment attenuated alcohol intake in low alcohol-consuming rats, with a more evident decrease in high alcohol consumers in the intermittent alcohol access model. Lastly, sCT did not alter peanut butter intake, blood alcohol concentration and plasma corticosterone levels in mice. Taken together, the present data support that amylin signalling is involved in the expression of alcohol reinforcement and that amylin receptor agonists could be considered for the treatment of alcohol use disorder in humans.",
      "subject": [
        {
          "@attributes": {
            "lang": "swe",
            "authority": "uka.se"
          },
          "topic": "Neurovetenskaper"
        },
        {
          "@attributes": {
            "lang": "eng",
            "authority": "uka.se"
          },
          "topic": "Neurosciences"
        }
      ],
      "language": {
        "languageTerm": "eng"
      },
      "genre": [
        "publication/journal-article",
        "art",
        "ref"
      ],
      "name": [
        {
          "@attributes": {
            "type": "personal",
            "authority": "gu.se"
          },
          "namePart": [
            "Aimilia Lydia",
            "Kalafateli",
            "1987"
          ],
          "role": {
            "roleTerm": "aut"
          },
          "affiliation": [
            "Göteborgs universitet",
            "Institutionen för neurovetenskap och fysiologi, sektionen för farmakologi",
            "Gothenburg University",
            "Institute of Neuroscience and Physiology, Department of Pharmacology"
          ]
        },
        {
          "@attributes": {
            "type": "personal",
            "authority": "gu.se"
          },
          "namePart": [
            "Daniel",
            "Vallöf",
            "1988"
          ],
          "role": {
            "roleTerm": "aut"
          },
          "affiliation": [
            "Göteborgs universitet",
            "Institutionen för neurovetenskap och fysiologi, sektionen för farmakologi",
            "Gothenburg University",
            "Institute of Neuroscience and Physiology, Department of Pharmacology"
          ]
        },
        {
          "@attributes": {
            "type": "personal",
            "authority": "gu.se"
          },
          "namePart": [
            "Elisabeth",
            "Jerlhag",
            "1978"
          ],
          "role": {
            "roleTerm": "aut"
          },
          "affiliation": [
            "Göteborgs universitet",
            "Institutionen för neurovetenskap och fysiologi, sektionen för farmakologi",
            "Gothenburg University",
            "Institute of Neuroscience and Physiology, Department of Pharmacology"
          ]
        },
        {
          "@attributes": {
            "type": "corporate",
            "lang": "swe",
            "authority": "gu.se"
          },
          "namePart": [
            "Göteborgs universitet",
            "Sahlgrenska akademin",
            "Institutionen för neurovetenskap och fysiologi, sektionen för farmakologi"
          ]
        },
        {
          "@attributes": {
            "type": "corporate",
            "lang": "eng",
            "authority": "gu.se"
          },
          "namePart": [
            "Gothenburg University",
            "Sahlgrenska Academy",
            "Institute of Neuroscience and Physiology, Department of Pharmacology"
          ]
        }
      ],
      "originInfo": {
        "dateIssued": "2019"
      },
      "relatedItem": {
        "@attributes": {
          "type": "host"
        },
        "titleInfo": {
          "title": "Addiction biology"
        },
        "identifier": "1369-1600",
        "part": {
          "detail": [
            {
              "@attributes": {
                "type": "volume"
              },
              "number": "24"
            },
            {
              "@attributes": {
                "type": "issue"
              },
              "number": "3"
            }
          ],
          "extent": {
            "start": "388",
            "end": "402"
          }
        }
      },
      "typeOfResource": "text"
    }
  }
}

The query created so far:

{
  "sort": [
    {
      "metadata.mods.originInfo.dateIssued.keyword": {
        "order": "desc"
      }
    }
  ],
  "query": {
    "multi_match": {
      "query": "Sahlgrenska Academy",
      "type": "best_fields",
      "fields": [
        "metadata.mods.name.namePart"
      ],
      "operator": "and"
    }
  },
  "aggs": {
    "yearSpan": {
      "terms": {
        "field": "metadata.mods.originInfo.dateIssued.keyword",
        "size": 2500
      }
    },
    "authorcloud": {
      "terms": {
        "field": "metadata.mods.name.namePart.keyword",
        "size": 150
      }
    },
    "cloud": {
      "terms": {
        "field": "metadata.mods.subject.topic.keyword",
        "size": 150
      }
    }
  },
  "stored_fields": []
}

I would like to change the query above so that the authorcloud aggregation only includes name entries where name.@attributes.type = "personal", and so that the strings in each name.namePart array are concatenated into the same bucket. For example, this array should build a single bucket in the authorcloud:

      "namePart": [
        "Aimilia Lydia",
        "Kalafateli",
        "1987"
      ],

yields:

bucket = "Aimilia Lydia Kalafateli 1987"
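
To make the expected result concrete, here is a minimal client-side sketch in Python. It is not an Elasticsearch query; it only shows the buckets I am after, computed from the document source. The function names personal_name_keys and author_cloud are hypothetical, and sample_doc is a trimmed copy of the document above.

```python
from collections import Counter


def personal_name_keys(doc):
    """Return one concatenated string per name entry whose type is "personal"."""
    names = doc["metadata"]["mods"].get("name", [])
    # Assumption: a record with a single name may store an object instead of a list.
    if isinstance(names, dict):
        names = [names]
    keys = []
    for entry in names:
        if entry.get("@attributes", {}).get("type") != "personal":
            continue  # skip corporate names
        parts = entry.get("namePart", [])
        if isinstance(parts, str):
            parts = [parts]
        keys.append(" ".join(parts))
    return keys


def author_cloud(docs):
    """Count, per concatenated name, the number of documents it appears in."""
    counts = Counter()
    for doc in docs:
        # set() so that one author is counted once per document
        counts.update(set(personal_name_keys(doc)))
    return counts


# Trimmed copy of the example document (two personal names, one corporate name).
sample_doc = {
    "metadata": {
        "mods": {
            "name": [
                {
                    "@attributes": {"type": "personal", "authority": "gu.se"},
                    "namePart": ["Aimilia Lydia", "Kalafateli", "1987"],
                },
                {
                    "@attributes": {"type": "personal", "authority": "gu.se"},
                    "namePart": ["Daniel", "Vallöf", "1988"],
                },
                {
                    "@attributes": {"type": "corporate", "lang": "eng"},
                    "namePart": ["Gothenburg University", "Sahlgrenska Academy"],
                },
            ]
        }
    }
}

print(personal_name_keys(sample_doc))
print(author_cloud([sample_doc]).most_common())
```

The corporate entry is dropped, and each personal entry contributes exactly one key, which is the behaviour I would like the authorcloud aggregation to have.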


Solution 1:[1]

It's been over two years, but I found myself with the same question.

I was able to accomplish it using the "adjacency_matrix" aggregation.

Example aggregation

GET kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "interactions": {
      "adjacency_matrix": {
        "filters": {
          "grpA": {
            "match": {
              "manufacturer.keyword": "Low Tide Media"
            }
          },
          "grpB": {
            "match": {
              "manufacturer.keyword": "Elitelligence"
            }
          },
          "grpC": {
            "match": {
              "manufacturer.keyword": "Oceanavigations"
            }
          }
        }
      }
    }
  }
}

Example response

 {
   ...
   "aggregations" : {
     "interactions" : {
       "buckets" : [
         {
           "key" : "grpA",
           "doc_count" : 1553
         },
         {
           "key" : "grpA&grpB",
           "doc_count" : 590
         },
         {
           "key" : "grpA&grpC",
           "doc_count" : 329
         },
         {
           "key" : "grpB",
           "doc_count" : 1370
         },
         {
           "key" : "grpB&grpC",
           "doc_count" : 299
         },
         {
           "key" : "grpC",
           "doc_count" : 1218
         }
       ]
     }
   }
 }

Note that it groups the values if two or more are identified in the same array/field: a combined bucket such as grpA&grpB counts the documents that match both filters.

This solution was found in this article, along with much other useful information regarding Elasticsearch aggregations.
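
For anyone unsure how the bucket keys come about, below is a rough Python sketch of the counting that adjacency_matrix performs. It is an illustration only, not how Elasticsearch implements it. The function names adjacency_matrix and has_manufacturer and the four sample documents are made up for this example, and the "&" separator is assumed to be the default.

```python
from itertools import combinations


def has_manufacturer(name):
    """Build a filter that matches documents whose manufacturer array contains name."""
    return lambda doc: name in doc.get("manufacturer", [])


def adjacency_matrix(docs, filters, separator="&"):
    """Return one bucket per filter and per pair of filters, skipping empty ones."""
    names = sorted(filters)
    # For every named filter, remember which documents (by index) it matches.
    matches = {
        n: {i for i, doc in enumerate(docs) if filters[n](doc)} for n in names
    }
    buckets = []
    for n in names:
        if matches[n]:
            buckets.append({"key": n, "doc_count": len(matches[n])})
    for a, b in combinations(names, 2):
        both = matches[a] & matches[b]  # documents matching both filters
        if both:
            buckets.append({"key": a + separator + b, "doc_count": len(both)})
    return sorted(buckets, key=lambda bucket: bucket["key"])


# Made-up documents; "manufacturer" is an array, as in the sample data set.
docs = [
    {"manufacturer": ["Low Tide Media"]},
    {"manufacturer": ["Low Tide Media", "Elitelligence"]},
    {"manufacturer": ["Elitelligence"]},
    {"manufacturer": ["Oceanavigations", "Low Tide Media"]},
]
filters = {
    "grpA": has_manufacturer("Low Tide Media"),
    "grpB": has_manufacturer("Elitelligence"),
    "grpC": has_manufacturer("Oceanavigations"),
}

result = adjacency_matrix(docs, filters)
for bucket in result:
    print(bucket["key"], bucket["doc_count"])
```

With these four documents there is no grpB&grpC bucket, because no document contains both of those manufacturers; pairs with an empty intersection are left out.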

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 fnkoc