'Expected pair of collisions in two choice hashing

In two choice hashing(with chaining), two random hash functions h1, h2 are selected to hash n keys to m positions. The process goes like this:

Insert all n keys sequentially, by evaluating for each key x, the two hash functions, and adding the key to the shorter list of those indexed by h1(x), h2(x). Let Y be the number of pairs (x, x') such that they end up in the same linked list. What would the expectation of Y(E[Y]) be like?

Assume h1, h2 hash keys uniformly and independently



Solution 1:[1]

I doubt that there is a clean analytic solution to this problem.

But it is very tractable to produce numerical approximations for the case of a large number of buckets. Let x be the ratio between the number of keys stored and the number of buckets. Let fn(x) be the expected portion of buckets with n keys and Fn(x) be the expected portion of buckets with at most n keys.

Then trivially Fn(x) = f0(x) + f1(x) + ... + fn(x). And also fn'(x) is the probability that with a key add a bucket of size n-1 will be added to, minus the probability that with a key add, a bucket of size n will be to.

The probability that a bucket of size n will be added to next is the probability that the first hash function chooses a bucket of size n and the second chooses one of size at least n, plus the probability that the first hash function chooses a bucket of size greater than n and the second chooses one of size n. The probability of choosing a bucket of size greater than n is simply 1 - Fn(x). So this probability is fn(x)(1 - Fn-1(x)) + (1 - Fn(x))fn(x).

That makes fn'(x) = fn-1(x)(1 - Fn-2(x)) + (1 - Fn-1(x))fn-1(x) - fn(x)(1 - Fn-1(x)) - (1 - Fn(x))fn(x). Which is a horribly non-linear system of equations and there is no need to expect any nice solution to it. But it is also amenable to http://en.wikipedia.org/wiki/Runge%E2%80%93Kutta_methods and you shouldn't need too many terms to get accurate estimates since the portion heavily filled buckets falls off faster than exponentially. (To see why, convince yourself that if n is significantly larger than x, then fn+1'(x) is approximately fn2(x). The repeated squaring of small terms means that there is a very sharp falloff from fn(x) < 0.01 to fn(x) < 0.00000001.)

Solution 2:[2]

It depends on m. For m being O(n^(3/2)), E[Y] = O(1).

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 btilly
Solution 2 morteza hosseini