'Random document in ElasticSearch
Is there a way to get a truly random sample from an elasticsearch index? i.e. a query that retrieves any document from the index with probability 1/N
(where N
is the number of documents currently indexed)?
And as a follow-up question: if all documents have some numeric field s
, is there a way to get a document through weighted random sampling, i.e. where the probability to get document i
with value s_i
is equal to s_i / sum(s_j for j in index)
?
Solution 1:[1]
I know it is an old question, but now it is possible to use random_score, with the following search query:
{
"size": 1,
"query": {
"function_score": {
"functions": [
{
"random_score": {
"seed": "1477072619038"
}
}
]
}
}
}
For me it is very fast with about 2 million documents.
I use current timestamp as seed, but you can use anything you like. The best is if you use the same seed, you will get the same results. So you can use your user's session id as seed and all users will have different order.
Solution 2:[2]
The only way I know of to get random documents from an index (at least in versions <= 1.3.1) is to use a script:
sort: {
_script: {
script: "Math.random() * 200000",
type: "number",
params: {},
order: "asc"
}
}
You can use that script to make some weighting based on some field of the record.
It's possible that in the future they might add something more complicated, but you'd likely have to request that from the ES team.
Solution 3:[3]
You can use random_score with a function_score
query.
{
"size":1,
"query": {
"function_score": {
"functions": [
{
"random_score": {
"seed": 11
}
}
],
"score_mode": "sum",
}
}
}
The bad part is that this will apply a random score to every document, sort the documents, and then return the first one. I don't know of anything that is smart enough to just pick a random document.
Solution 4:[4]
You can use random_score
to randomly order responses or retrieve a document with roughly 1/N
probability.
Additional notes:
https://github.com/elastic/elasticsearch/issues/1170 https://github.com/elastic/elasticsearch/issues/7783
Solution 5:[5]
NEST Way :
var result = _elastic.Search<dynamic>(s => s
.Query(q => q
.FunctionScore(fs => fs.Functions(f => f.RandomScore())
.Query(fq => fq.MatchAll()))));
raw query way :
GET index-name/_search
"size": 1,
"query": {
"function_score": {
"query" : { "match_all": {} },
"random_score": {}
}
}
}
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Koen. |
Solution 2 | Alcanzar |
Solution 3 | Hassaan |
Solution 4 | nycdatawrangler |
Solution 5 | Eran Peled |