'Expected pair of collisions in two choice hashing
In two choice hashing(with chaining), two random hash functions h1, h2 are selected to hash n keys to m positions. The process goes like this:
Insert all n keys sequentially, by evaluating for each key x, the two hash functions, and adding the key to the shorter list of those indexed by h1(x), h2(x). Let Y be the number of pairs (x, x') such that they end up in the same linked list. What would the expectation of Y(E[Y]) be like?
Assume h1, h2 hash keys uniformly and independently
Solution 1:[1]
I doubt that there is a clean analytic solution to this problem.
But it is very tractable to produce numerical approximations for the case of a large number of buckets. Let x
be the ratio between the number of keys stored and the number of buckets. Let fn(x)
be the expected portion of buckets with n
keys and Fn(x)
be the expected portion of buckets with at most n
keys.
Then trivially Fn(x) = f0(x) + f1(x) + ... + fn(x)
. And also fn'(x)
is the probability that with a key add a bucket of size n-1
will be added to, minus the probability that with a key add, a bucket of size n
will be to.
The probability that a bucket of size n
will be added to next is the probability that the first hash function chooses a bucket of size n
and the second chooses one of size at least n
, plus the probability that the first hash function chooses a bucket of size greater than n
and the second chooses one of size n
. The probability of choosing a bucket of size greater than n
is simply 1 - Fn(x)
. So this probability is fn(x)(1 - Fn-1(x)) + (1 - Fn(x))fn(x)
.
That makes fn'(x) = fn-1(x)(1 - Fn-2(x)) + (1 - Fn-1(x))fn-1(x) - fn(x)(1 - Fn-1(x)) - (1 - Fn(x))fn(x)
. Which is a horribly non-linear system of equations and there is no need to expect any nice solution to it. But it is also amenable to http://en.wikipedia.org/wiki/Runge%E2%80%93Kutta_methods and you shouldn't need too many terms to get accurate estimates since the portion heavily filled buckets falls off faster than exponentially. (To see why, convince yourself that if n
is significantly larger than x
, then fn+1'(x)
is approximately fn2(x)
. The repeated squaring of small terms means that there is a very sharp falloff from fn(x) < 0.01
to fn(x) < 0.00000001
.)
Solution 2:[2]
It depends on m. For m being O(n^(3/2)), E[Y] = O(1).
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | btilly |
Solution 2 | morteza hosseini |