LSH - proof of the probabilities of success and expected collisions

Hi, I'm at a loss trying to solve this exercise. I'd really appreciate some help! Thank you.

Definition: A family of hash functions H = {h : X → U} is (r1, r2, p1, p2)-sensitive if for any q, p ∈ X:

• If p ∈ B(q, r1) then Pr[h(q) = h(p)] ≥ p1 (where the probability is over h chosen uniformly at random from H).
• If p ∉ B(q, r2) then Pr[h(q) = h(p)] ≤ p2 (with the same distribution over h).

To use LSH for ε-PLEB, we first fix two parameters k and L that will be chosen below. In preprocessing, for each i = 1..L we map each point p ∈ P to a bucket gi(p) = (hi1(p), . . . , hik(p)), where hi1, ..., hik are chosen uniformly at random from H. Note that this means the number of possible buckets is |U|^k, and each p is mapped into L of them. To process a query q, we search the buckets g1(q), ..., gL(q). Let p1, ..., pt be all the points encountered in these buckets. For each such pj, if pj ∈ B(q, r2), return YES and pj. Otherwise (if no pj lies inside B(q, r2)), return NO.
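A minimal runnable sketch of this preprocessing/query scheme. The hash family here is a toy shift-based family h_b(x) = ⌊(x + b)/w⌋ on 1-D points, and all names (`LSHIndex`, `make_hash`) are illustrative assumptions, not part of the exercise:

```python
import random
from collections import defaultdict

def make_hash(w=1.0, rng=random):
    """Draw one hash from a toy family for 1-D points: h_b(x) = floor((x + b) / w)."""
    b = rng.uniform(0, w)  # random shift, chosen once per hash function
    return lambda x: int((x + b) // w)

class LSHIndex:
    def __init__(self, k, L, seed=0):
        rng = random.Random(seed)
        self.k, self.L = k, L
        # g_i = (h_{i1}, ..., h_{ik}), each h drawn at random from the family
        self.g = [[make_hash(rng=rng) for _ in range(k)] for _ in range(L)]
        self.tables = [defaultdict(list) for _ in range(L)]

    def _bucket(self, i, x):
        # the bucket of x under g_i is the tuple of its k hash values
        return tuple(h(x) for h in self.g[i])

    def insert(self, x):
        # preprocessing: each point lands in L buckets, one per table
        for i in range(self.L):
            self.tables[i][self._bucket(i, x)].append(x)

    def query(self, q, r2):
        # scan buckets g_1(q), ..., g_L(q); return any point within r2, else None
        for i in range(self.L):
            for p in self.tables[i][self._bucket(i, q)]:
                if abs(q - p) <= r2:
                    return p
        return None
```

For example, after `idx = LSHIndex(k=2, L=5)` and `idx.insert(0.0)`, the call `idx.query(0.0, r2=0.1)` finds 0.0 in its own buckets, while a query far from every inserted point falls in empty buckets and returns None.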

Now set k = log_{1/p2} n and L = n^ρ, where ρ = log(1/p1)/log(1/p2). Fix a query q, and prove the following two claims:

  1. If the dataset P contains a point p such that q ∈ B(p, r1), then with probability at least 1/2, p will share a bucket with q. In other words, with probability at least 1/2 there exists j ∈ [L] such that gj(q) = gj(p).
  2. Define a point p to be bad for q if q ∉ B(p, r2). Then the expected number of bad points colliding with q under some hash function gj is at most L.
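In case it helps, here is an outline of the standard calculation behind both claims. It assumes the parameter setting above and that the hij are drawn independently; the details still need to be filled in:

```latex
% Claim 1: for p with q \in B(p, r_1), each table collides with probability
\Pr[g_i(q) = g_i(p)] \;\ge\; p_1^{k}
  \;=\; p_1^{\log_{1/p_2} n}
  \;=\; n^{-\log(1/p_1)/\log(1/p_2)}
  \;=\; n^{-\rho},
% so the probability that all L = n^{\rho} tables miss is at most
(1 - n^{-\rho})^{n^{\rho}} \;\le\; 1/e \;<\; 1/2.

% Claim 2: for a bad point p' (i.e. q \notin B(p', r_2)), each table
% collides with probability at most
p_2^{k} \;=\; p_2^{\log_{1/p_2} n} \;=\; 1/n,
% so each table contributes at most n \cdot (1/n) = 1 bad collision in
% expectation, and linearity of expectation over the L tables gives L.
```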


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
