Fuzzy Bucket Aggregation in Elasticsearch
Elasticsearch supports fuzzy search queries: https://www.elastic.co/guide/en/elasticsearch/guide/2.x/fuzzy-match-query.html
And Bucket Aggregation by Term: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html
According to that page, "...buckets are dynamically built - one per unique value."
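The following Python sketch shows the behavior in that quote. It assumes a keyword field named `country` and an aggregation named `by_country`; both names are hypothetical. The sketch builds a terms-aggregation request body as a plain dict and emulates exact-value bucketing locally, without calling Elasticsearch, so you can see why "America" and "Amrica" end up in separate buckets.

```python
from collections import Counter

# Hypothetical search body: "country" (a keyword field) and "by_country" are
# illustrative names, not taken from the question.
terms_agg_request = {
    "size": 0,  # only the aggregation results are needed, not the hits
    "aggs": {
        "by_country": {
            "terms": {"field": "country", "size": 10}
        }
    },
}


def emulate_terms_buckets(values):
    """Local stand-in for exact-value bucketing: one bucket per distinct value,
    largest doc_count first (ties keep first-seen order here, which may differ
    from Elasticsearch's own tie-breaking)."""
    return [{"key": key, "doc_count": count}
            for key, count in Counter(values).most_common()]


buckets = emulate_terms_buckets(["America", "Amrica", "America"])
# Two buckets: a one-letter typo is still a different unique value.
```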
Is it possible to combine the two features and bucket by fuzzy terms, so that, for example, "America" and "Amrica" fall under the same bucket? (With a "terms" bucket they fall under two separate buckets; with a "fuzzy" search, both records are returned.)
I'm trying to group by "keywords", including typos. Is there a different way to go about it? (The brute-force approach is to run a "fuzzy" search for each "keyword" and add up the counts manually.)
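If re-indexing is not an option, the brute-force idea can also be carried out on the client after a single ordinary terms aggregation. This Python sketch is a hypothetical approach and not an Elasticsearch feature. It takes buckets shaped like a terms-aggregation response (`key`, `doc_count`) and greedily folds together keys that fall within the edit distance Elasticsearch's default fuzziness `AUTO` allows: exact match for 1-2 characters, one edit for 3-5, and two edits for longer terms. It uses plain Levenshtein distance. Elasticsearch's fuzzy matching also counts a transposition as a single edit by default.

```python
def auto_fuzziness(term):
    """Max edits allowed by Elasticsearch's default fuzziness "AUTO" (AUTO:3,6)."""
    n = len(term)
    if n < 3:
        return 0
    if n < 6:
        return 1
    return 2


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (0 if chars match)
            ))
        prev = curr
    return prev[-1]


def merge_fuzzy_buckets(buckets):
    """Greedily fold terms-aggregation buckets into fuzzy groups.

    Buckets are visited by descending doc_count, so the most frequent spelling
    becomes the group key. Each later key joins the first existing group whose
    key is within AUTO fuzziness of it (compared case-insensitively).
    """
    groups = []  # each group: {"key", "doc_count", "variants"}
    for bucket in sorted(buckets, key=lambda item: -item["doc_count"]):
        key = bucket["key"]
        for group in groups:
            limit = auto_fuzziness(group["key"])
            if levenshtein(key.lower(), group["key"].lower()) <= limit:
                group["doc_count"] += bucket["doc_count"]
                group["variants"].append(key)
                break
        else:
            groups.append({"key": key, "doc_count": bucket["doc_count"],
                           "variants": [key]})
    return groups


merged = merge_fuzzy_buckets([
    {"key": "America", "doc_count": 5},
    {"key": "Amrica", "doc_count": 2},
    {"key": "Canada", "doc_count": 3},
])
# "Amrica" is folded into the "America" group; "Canada" stays separate.
```

Greedy grouping depends on the visiting order and is not transitive. Two distinct short words can also land within two edits of each other. A terms aggregation returns only its top `size` buckets, so rare misspellings may never reach the client. Treat the result as a starting point for review, not a final answer.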
Solution 1:[1]
It's best practice to index your data well, so that you can do whatever you want later on. This problem can be handled at index time, using normalizers or synonyms.
Information about normalizers:
- https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-overview.html#normalization
- https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-normalizers.html
- https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-normalization-tokenfilter.html#analysis-normalization-tokenfilter
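Here is a rough Python illustration of the normalizer route. It builds a create-index body, assuming Elasticsearch 7 or later with typeless mappings. The body attaches a custom normalizer to a hypothetical keyword field `country`, using the built-in `lowercase` and `asciifolding` filters. A terms aggregation on that field would then bucket the normalized values. The local helper only approximates the normalizer, but it shows the limit of this approach: case and accent variants collapse into one value, while a real misspelling such as "Amrica" does not.

```python
import unicodedata

# Hypothetical create-index body; "country" and "folding_normalizer" are
# illustrative names, not taken from the answer.
normalizer_index_body = {
    "settings": {
        "analysis": {
            "normalizer": {
                "folding_normalizer": {
                    "type": "custom",
                    "char_filter": [],
                    "filter": ["lowercase", "asciifolding"],
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "country": {"type": "keyword", "normalizer": "folding_normalizer"}
        }
    },
}


def approximate_normalizer(value):
    """Rough local stand-in for lowercase + asciifolding: lowercase, then drop
    combining accents after Unicode NFKD decomposition. Lucene's ASCII folding
    maps many more characters than this does."""
    decomposed = unicodedata.normalize("NFKD", value.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# "AM\u00c9RICA" (AMÉRICA) becomes "america"; "Amrica" only becomes "amrica".
```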
Information about synonyms:
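The synonym route maps known misspellings to a canonical spelling at index time. This Python sketch assumes a hand-maintained list of rules; the rules and the field, filter, and analyzer names are all hypothetical. Normalizers only accept filters that work per character, so the sketch uses an analyzer instead. The `keyword` tokenizer keeps the whole value as one token, `lowercase` runs next, and a `synonym` filter then rewrites the listed typos. Aggregating on a `text` field requires `fielddata: true`, which can use a lot of heap memory. Canonicalizing values in the client or in an ingest pipeline before indexing may therefore be preferable.

```python
# Hypothetical, hand-maintained explicit-mapping rules (Solr synonym format).
SYNONYM_RULES = ["amrica, amerika => america"]

synonym_index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "country_typos": {"type": "synonym", "synonyms": SYNONYM_RULES}
            },
            "analyzer": {
                "country_canonical": {
                    "type": "custom",
                    "tokenizer": "keyword",  # whole value as a single token
                    "filter": ["lowercase", "country_typos"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "country": {
                "type": "text",
                "analyzer": "country_canonical",
                "fielddata": True,  # needed to run a terms aggregation on text
            }
        }
    },
}


def apply_rules_locally(value, rules=SYNONYM_RULES):
    """Local emulation of keyword tokenizer + lowercase + explicit '=>' rules."""
    mapping = {}
    for rule in rules:
        lhs, rhs = rule.split("=>")
        for variant in lhs.split(","):
            mapping[variant.strip()] = rhs.strip()
    token = value.lower()
    return mapping.get(token, token)
```

Unlike the fuzzy approaches, this catches only the misspellings someone has already listed. In exchange, the buckets are exact and predictable.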
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | daniel zimlichman |