Sort multi-bucket aggregation by source fields inside inner multi-bucket aggregation
TL;DR: Using an inner multi-bucket aggregation (top_hits with size: 1) inside an outer multi-bucket aggregation, is it possible to sort the buckets of the outer aggregation by the data in the inner buckets?
I have the following index mapping:
{
  "parent": {
    "properties": {
      "children": {
        "type": "nested",
        "properties": {
          "child_id": { "type": "keyword" }
        }
      }
    }
  }
}
Each child (in the data) also has the properties last_modified: Date and other_property: String.
I need to fetch a list of children (of all the parents, but without the parents), keeping only the one with the latest last_modified for each child_id. Then I need to sort and paginate those results to return manageable amounts of data.
I'm able to get the data and paginate over it with a combination of nested, terms, top_hits, and bucket_sort aggregations (and also get the total count with cardinality):
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "children": {
      "nested": {
        "path": "children"
      },
      "aggs": {
        "totalCount": {
          "cardinality": {
            "field": "children.child_id"
          }
        },
        "oneChildPerId": {
          "terms": {
            "field": "children.child_id",
            "order": { "_term": "asc" },
            "size": 1000000
          },
          "aggs": {
            "lastModified": {
              "top_hits": {
                "_source": [
                  "children.other_property"
                ],
                "sort": {
                  "children.last_modified": {
                    "order": "desc"
                  }
                },
                "size": 1
              }
            },
            "paginate": {
              "bucket_sort": {
                "from": 36,
                "size": 3
              }
            }
          }
        }
      }
    }
  }
}
However, after more than a solid day of going through the docs and experimenting, I seem to be no closer to figuring out how to sort the buckets of my oneChildPerId aggregation by the other_property of the single child retrieved by the lastModified aggregation.
Is there a way to sort a multi-bucket aggregation by results in a nested multi-bucket aggregation?
What I've tried:
- I thought I could use bucket_sort for that too, but apparently its sort can only be used with paths containing other single-bucket aggregations and ending in a metric one.
- I've tried to find a way to somehow transform the 1-result multi-bucket of lastModified into a single-bucket, but haven't found any.
I'm using Elasticsearch 6.8.6 (bucket_sort and similar tools weren't available in ES 5.x and older).
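For reference, the kind of sort path that bucket_sort does accept, per the limitation noted above, ends in a single-value metric that sits next to it. The following is only a sketch of the sub-aggregations that would go under oneChildPerId, assuming last_modified is mapped as a date. The aggregation name latestModified is hypothetical and not part of the original query.

```json
{
  "aggs": {
    "latestModified": {
      "max": { "field": "children.last_modified" }
    },
    "paginate": {
      "bucket_sort": {
        "sort": [{ "latestModified": { "order": "desc" } }],
        "from": 36,
        "size": 3
      }
    }
  }
}
```

This orders the buckets by a date or numeric value only, so on its own it does not sort by a string field such as other_property.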
Solution 1:[1]
I had the same problem. I needed a terms aggregation with a nested top_hits, and wanted to sort by a specific field inside the nested aggregation.
Not sure how performant my solution is, but the desired behaviour can be achieved with a single-value metric aggregation on the same level as the top_hits. You can then sort by this new aggregation using the order field of the terms aggregation.
Here is an example:
POST books/_doc
{ "genre": "action", "title": "bookA", "pages": 200 }
POST books/_doc
{ "genre": "action", "title": "bookB", "pages": 35 }
POST books/_doc
{ "genre": "action", "title": "bookC", "pages": 170 }
POST books/_doc
{ "genre": "comedy", "title": "bookD", "pages": 80 }
POST books/_doc
{ "genre": "comedy", "title": "bookE", "pages": 90 }
GET books/_search
{
  "size": 0,
  "aggs": {
    "by_genre": {
      "terms": {
        "field": "genre.keyword",
        "order": {"max_pages": "asc"}
      },
      "aggs": {
        "top_book": {
          "top_hits": {
            "size": 1,
            "sort": [{"pages": {"order": "desc"}}]
          }
        },
        "max_pages": {"max": {"field": "pages"}}
      }
    }
  }
}
by_genre has an order field that sorts by a sub-aggregation called max_pages. max_pages was added only for this purpose: it produces a single-value metric that order can sort by.
The query above returns (I've shortened the output for clarity):
{ "genre" : "comedy", "title" : "bookE", "pages" : 90 }
{ "genre" : "action", "title" : "bookA", "pages" : 200 }
If you change "order": {"max_pages": "asc"} to "order": {"max_pages": "desc"}, the output becomes:
{ "genre" : "action", "title" : "bookA", "pages" : 200 }
{ "genre" : "comedy", "title" : "bookE", "pages" : 90 }
The type of the max_pages aggregation can be changed as needed, as long as it is a single-value metric aggregation (e.g. sum, avg, etc.).
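Applied to the nested query from the question, the same idea might look like the sketch below. It assumes children.last_modified is mapped as a date. The latestModified aggregation is a hypothetical addition, placed next to the existing top_hits so that the terms order can reference it. bucket_sort is left without a sort, so it only slices the already ordered buckets.

```json
{
  "query": { "match_all": {} },
  "size": 0,
  "aggs": {
    "children": {
      "nested": { "path": "children" },
      "aggs": {
        "totalCount": {
          "cardinality": { "field": "children.child_id" }
        },
        "oneChildPerId": {
          "terms": {
            "field": "children.child_id",
            "order": { "latestModified": "desc" },
            "size": 1000000
          },
          "aggs": {
            "lastModified": {
              "top_hits": {
                "_source": ["children.other_property"],
                "sort": { "children.last_modified": { "order": "desc" } },
                "size": 1
              }
            },
            "latestModified": {
              "max": { "field": "children.last_modified" }
            },
            "paginate": {
              "bucket_sort": { "from": 36, "size": 3 }
            }
          }
        }
      }
    }
  }
}
```

Note that this sorts the buckets by last_modified rather than by other_property. A metric such as max needs a numeric or date field, so the approach does not directly cover sorting by a string value taken from the top hit.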
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
