LSH - proof of the probabilities of success and expected collisions

Hi, I'm at a loss trying to solve this exercise. I'd really appreciate some help! Thank you.

Definition: A family of hash functions H = {h : X → U} is (r1, r2, p1, p2)-sensitive if for any q, p ∈ X:

• If p ∈ B(q, r1) then Pr[h(q) = h(p)] ≥ p1 (where the probability is over h chosen uniformly at random from H).
• If p ∉ B(q, r2) then Pr[h(q) = h(p)] ≤ p2 (with the same distribution over h).

To use LSH for ε-PLEB, we first fix two parameters k and L that will be chosen below. In preprocessing, for each i = 1..L we map each point p ∈ P to a bucket gi(p) = (hi1(p), . . . , hik(p)), where hi1, ..., hik are chosen uniformly at random from H. Note that this means the number of possible buckets is |U|^k, and each p is mapped into L of them. To process a query q, we search the buckets g1(q), ..., gL(q). Let p1, ..., pt be all the points encountered in these buckets. For each such pj, if pj ∈ B(q, r2), return YES and pj. Otherwise (if no pj lies inside B(q, r2)), return NO.
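A minimal runnable sketch of this preprocessing/query scheme. The hash family here is a toy shift-based family h_b(x) = ⌊(x + b)/w⌋ on 1-D points, and all names (`LSHIndex`, `make_hash`) are illustrative assumptions, not part of the exercise:

```python
import random
from collections import defaultdict

def make_hash(w=1.0, rng=random):
    """Draw one hash from a toy family for 1-D points: h_b(x) = floor((x + b) / w)."""
    b = rng.uniform(0, w)  # random shift, chosen once per hash function
    return lambda x: int((x + b) // w)

class LSHIndex:
    def __init__(self, k, L, seed=0):
        rng = random.Random(seed)
        self.k, self.L = k, L
        # g_i = (h_{i1}, ..., h_{ik}), each h drawn at random from the family
        self.g = [[make_hash(rng=rng) for _ in range(k)] for _ in range(L)]
        self.tables = [defaultdict(list) for _ in range(L)]

    def _bucket(self, i, x):
        # the bucket of x under g_i is the tuple of its k hash values
        return tuple(h(x) for h in self.g[i])

    def insert(self, x):
        # preprocessing: each point lands in L buckets, one per table
        for i in range(self.L):
            self.tables[i][self._bucket(i, x)].append(x)

    def query(self, q, r2):
        # scan buckets g_1(q), ..., g_L(q); return any point within r2, else None
        for i in range(self.L):
            for p in self.tables[i][self._bucket(i, q)]:
                if abs(q - p) <= r2:
                    return p
        return None
```

For example, after `idx = LSHIndex(k=2, L=5)` and `idx.insert(0.0)`, the call `idx.query(0.0, r2=0.1)` finds 0.0 in its own buckets, while a query far from every inserted point falls in empty buckets and returns None.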

Now set k = log_{1/p2} n and L = n^ρ, where ρ = log(1/p1)/log(1/p2). Fix a query q, and prove the following two claims:

  1. If the dataset P contains a point p such that q ∈ B(p, r1), then with probability at least 1/2, p will share a bucket with q. In other words, with probability at least 1/2 there exists j ∈ [L] such that gj(q) = gj(p).
  2. Define a point p to be bad for q if q ∉ B(p, r2). Then the expected number of bad points colliding with q under some hash function gj is at most L.
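In case it helps, here is an outline of the standard calculation behind both claims. It assumes the parameter setting above and that the hij are drawn independently; the details still need to be filled in:

```latex
% Claim 1: for p with q \in B(p, r_1), each table collides with probability
\Pr[g_i(q) = g_i(p)] \;\ge\; p_1^{k}
  \;=\; p_1^{\log_{1/p_2} n}
  \;=\; n^{-\log(1/p_1)/\log(1/p_2)}
  \;=\; n^{-\rho},
% so the probability that all L = n^{\rho} tables miss is at most
(1 - n^{-\rho})^{n^{\rho}} \;\le\; 1/e \;<\; 1/2.

% Claim 2: for a bad point p' (i.e. q \notin B(p', r_2)), each table
% collides with probability at most
p_2^{k} \;=\; p_2^{\log_{1/p_2} n} \;=\; 1/n,
% so each table contributes at most n \cdot (1/n) = 1 bad collision in
% expectation, and linearity of expectation over the L tables gives L.
```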


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
