Random document in ElasticSearch

Is there a way to get a truly random sample from an elasticsearch index? i.e. a query that retrieves any document from the index with probability 1/N (where N is the number of documents currently indexed)?

And as a follow-up question: if all documents have some numeric field s, is there a way to get a document through weighted random sampling, i.e. where the probability to get document i with value s_i is equal to s_i / sum(s_j for j in index)?



Solution 1:[1]

I know it is an old question, but now it is possible to use random_score, with the following search query:

{
   "size": 1,
   "query": {
      "function_score": {
         "functions": [
            {
               "random_score": {
                  "seed": "1477072619038"
               }
            }
         ]
      }
   }
}

For me it is very fast with about 2 million documents.

I use the current timestamp as the seed, but you can use anything you like. The nice part is that the same seed always gives the same results, so you can use the user's session id as the seed: each user gets a different order, and a given user keeps seeing the same order.
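As a minimal sketch of the per-session seeding idea, the request body can be built in Python before it is sent to the cluster. The helper name `random_order_query`, the use of `zlib.crc32` to turn a session id into a number, and the optional `field` argument are assumptions made for illustration and are not part of the original answer. Newer Elasticsearch versions document a `field` parameter that goes together with `seed`, so check the documentation for the version you run.

```python
import json
import zlib


def random_order_query(session_id, size=1, field=None):
    """Build a function_score/random_score search body seeded per session.

    session_id: any string identifying the user session (hypothetical input).
    field: optional field name to pair with the seed (assumption, see above).
    """
    # crc32 is stable across processes, unlike Python's salted built-in hash().
    seed = zlib.crc32(session_id.encode("utf-8"))
    random_score = {"seed": seed}
    if field is not None:
        random_score["field"] = field
    return {
        "size": size,
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "functions": [{"random_score": random_score}],
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent to the _search endpoint.
    print(json.dumps(random_order_query("session-abc"), indent=2))
```

Calling the helper twice with the same session id yields the same body, which is what makes the ordering repeatable for that user.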

Solution 2:[2]

The only way I know of to get random documents from an index (at least in versions <= 1.3.1) is to use a script:

sort: {
  _script: {
    script: "Math.random() * 200000",
    type: "number",
    params: {},
    order: "asc"
  }
}

You can extend that script to weight the random value by some field of the document.
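Note that simply multiplying a random number by the field value does not, by itself, give the probability s_i / sum(s_j) asked for in the question. One known technique that does is the Efraimidis-Spirakis key trick: give each document the key u^(1/s_i), with u uniform in (0, 1), and take the document with the largest key. The Python sketch below checks this locally on made-up weights; it is a simulation, not an Elasticsearch call, and the function name `weighted_pick` and the sample data are hypothetical. Translating it into a sort script (for example, sorting in descending order on something like `Math.log(Math.random()) / doc['s'].value`) is an untested assumption and depends on the scripting language of your version.

```python
import math
import random
from collections import Counter


def weighted_pick(weights, rng):
    """Return one key from `weights`, chosen with probability weight / total.

    Each item gets the key log(u) / weight, which is the log of u ** (1 / weight),
    with u uniform in (0, 1]; the item with the largest key wins.
    All weights must be positive.
    """
    best_id, best_key = None, -math.inf
    for doc_id, weight in weights.items():
        u = 1.0 - rng.random()  # in (0, 1], so log(u) is always defined
        key = math.log(u) / weight
        if key > best_key:
            best_id, best_key = doc_id, key
    return best_id


if __name__ == "__main__":
    rng = random.Random(42)
    weights = {"doc_a": 1.0, "doc_b": 3.0}  # made-up values of the field s
    trials = 20000
    counts = Counter(weighted_pick(weights, rng) for _ in range(trials))
    # doc_b should win in roughly 3/4 of the trials.
    print({doc_id: n / trials for doc_id, n in counts.items()})
```

The trick works because -log(u) / s_i is exponentially distributed with rate s_i, and the smallest of several independent exponentials belongs to item i with probability s_i / sum(s_j).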

It's possible that in the future they might add something more complicated, but you'd likely have to request that from the ES team.

Solution 3:[3]

You can use random_score with a function_score query.

{
    "size":1,
    "query": {
        "function_score": {
            "functions": [
                {
                    "random_score":  {
                        "seed": 11
                    }
                }
            ],
            "score_mode": "sum"
        }
    }
}

The bad part is that this will apply a random score to every document, sort the documents, and then return the first one. I don't know of anything that is smart enough to just pick a random document.
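The behavior described above (score every document, sort, return the first) can be mimicked locally to see why it yields a uniform pick and why a fixed seed repeats the result. This is only a sketch: the functions `random_score` and `pick_one` are hypothetical, and the SHA-256 based score is a stand-in, not the hash Elasticsearch uses internally.

```python
import hashlib


def random_score(seed, doc_id):
    """Deterministic pseudo-random score in [0, 1) from a seed and a document id."""
    digest = hashlib.sha256(f"{seed}:{doc_id}".encode("utf-8")).digest()
    # Use 48 bits so the division is exact and the result stays below 1.
    return int.from_bytes(digest[:6], "big") / 2**48


def pick_one(doc_ids, seed):
    """Score every document, then return the id with the highest score."""
    return max(doc_ids, key=lambda doc_id: random_score(seed, doc_id))


if __name__ == "__main__":
    docs = [f"doc-{i}" for i in range(5)]
    # The same seed always gives the same winner; other seeds may differ.
    print(pick_one(docs, seed=11), pick_one(docs, seed=11), pick_one(docs, seed=12))
```

The cost is linear in the number of documents for every request, which is the drawback the answer points out.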

Solution 4:[4]

You can use random_score to randomly order responses or retrieve a document with roughly 1/N probability.

Additional notes:

https://github.com/elastic/elasticsearch/issues/1170
https://github.com/elastic/elasticsearch/issues/7783

Solution 5:[5]

NEST way:

var result = _elastic.Search<dynamic>(s => s
    .Size(1) // return a single document, like the raw query
    .Query(q => q
        .FunctionScore(fs => fs
            .Functions(f => f.RandomScore()) // random score for every match
            .Query(fq => fq.MatchAll())
        )
    )
);

Raw query way:

GET index-name/_search
{
    "size": 1,
    "query": {
        "function_score": {
            "query": { "match_all": {} },
            "random_score": {}
        }
    }
}

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Koen.
Solution 2 Alcanzar
Solution 3 Hassaan
Solution 4 nycdatawrangler
Solution 5 Eran Peled