'Expected pair of collisions in two choice hashing
In two choice hashing(with chaining), two random hash functions h1, h2 are selected to hash n keys to m positions. The process goes like this:
Insert all n keys sequentially, by evaluating for each key x, the two hash functions, and adding the key to the shorter list of those indexed by h1(x), h2(x). Let Y be the number of pairs (x, x') such that they end up in the same linked list. What would the expectation of Y(E[Y]) be like?
Assume h1, h2 hash keys uniformly and independently
Solution 1:[1]
I doubt that there is a clean analytic solution to this problem.
But it is very tractable to produce numerical approximations for the case of a large number of buckets. Let x be the ratio between the number of keys stored and the number of buckets. Let fn(x) be the expected portion of buckets with n keys and Fn(x) be the expected portion of buckets with at most n keys.
Then trivially Fn(x) = f0(x) + f1(x) + ... + fn(x). And also fn'(x) is the probability that with a key add a bucket of size n-1 will be added to, minus the probability that with a key add, a bucket of size n will be to.
The probability that a bucket of size n will be added to next is the probability that the first hash function chooses a bucket of size n and the second chooses one of size at least n, plus the probability that the first hash function chooses a bucket of size greater than n and the second chooses one of size n. The probability of choosing a bucket of size greater than n is simply 1 - Fn(x). So this probability is fn(x)(1 - Fn-1(x)) + (1 - Fn(x))fn(x).
That makes fn'(x) = fn-1(x)(1 - Fn-2(x)) + (1 - Fn-1(x))fn-1(x) - fn(x)(1 - Fn-1(x)) - (1 - Fn(x))fn(x). Which is a horribly non-linear system of equations and there is no need to expect any nice solution to it. But it is also amenable to http://en.wikipedia.org/wiki/Runge%E2%80%93Kutta_methods and you shouldn't need too many terms to get accurate estimates since the portion heavily filled buckets falls off faster than exponentially. (To see why, convince yourself that if n is significantly larger than x, then fn+1'(x) is approximately fn2(x). The repeated squaring of small terms means that there is a very sharp falloff from fn(x) < 0.01 to fn(x) < 0.00000001.)
Solution 2:[2]
It depends on m. For m being O(n^(3/2)), E[Y] = O(1).
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | btilly |
| Solution 2 | morteza hosseini |
