MinHash for Jaccard
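H(S) = min over elements x in S of h(x), for one random hash function h. P[H(S1) = H(S2)] = |S1 ∩ S2| / |S1 ∪ S2|, assuming h orders the elements like a random permutation. Repeat with k independent hash functions; the fraction of matching minima estimates the Jaccard similarity.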
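A minimal sketch of the estimator, assuming set elements are non-negative integer IDs and sets are non-empty. The names make_hashes, minhash_signature and estimate_jaccard are made up for illustration, and the linear hashes (a*x + b) mod p are only an approximation of a random permutation.

```python
import random

_PRIME = (1 << 61) - 1  # large prime modulus for the linear hashes (an assumption)

def make_hashes(k, seed=0):
    """Draw k random (a, b) pairs, one per hash function."""
    rng = random.Random(seed)
    return [(rng.randrange(1, _PRIME), rng.randrange(0, _PRIME)) for _ in range(k)]

def minhash_signature(items, hashes):
    """For each hash function, keep the minimum hash value over the set."""
    return [min((a * x + b) % _PRIME for x in items) for a, b in hashes]

def estimate_jaccard(sig1, sig2):
    """Fraction of positions where the two signatures agree."""
    matches = sum(1 for u, v in zip(sig1, sig2) if u == v)
    return matches / len(sig1)

hashes = make_hashes(512)
s1 = set(range(0, 100))
s2 = set(range(50, 150))   # true Jaccard = 50 / 150
estimate = estimate_jaccard(minhash_signature(s1, hashes),
                            minhash_signature(s2, hashes))
```

The estimate gets tighter as k grows; the standard error is roughly sqrt(J(1 - J) / k).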
SimHash for cosine
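Random hyperplane. h(x) = sign(w·x), with w drawn from a standard Gaussian. P[h(x) ≠ h(y)] = θ/π, where θ is the angle between x and y. With k hyperplanes, the fraction of differing bits estimates θ/π.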
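A minimal sketch of random-hyperplane hashing, using only the standard library. It relies on the known relation P[different bit] = θ/π for Gaussian w; make_hyperplanes, simhash and estimate_angle are illustrative names, and vectors are assumed to be plain lists of floats.

```python
import math
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits normal vectors w with i.i.d. standard Gaussian entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def simhash(x, planes):
    """One bit per hyperplane: 1 if w.x >= 0, else 0."""
    return [1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0 for w in planes]

def estimate_angle(bits1, bits2):
    """Estimate the angle as pi times the fraction of differing bits."""
    diff = sum(b1 != b2 for b1, b2 in zip(bits1, bits2))
    return math.pi * diff / len(bits1)

planes = make_hyperplanes(dim=2, n_bits=2000)
x = [1.0, 0.0]
y = [0.0, 1.0]             # orthogonal to x, true angle = pi / 2
angle = estimate_angle(simhash(x, planes), simhash(y, planes))
cosine = math.cos(angle)   # estimated cosine similarity
```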
Multi-probe
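Query multiple nearby buckets in each hash table, not just the bucket the query hashes to. Near neighbors that land in an adjacent bucket are still found, so a low false-negative rate is reachable with fewer tables and a reasonable index size.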
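A minimal sketch of probing, assuming buckets are keyed by tuples of bits (for example a SimHash signature) and the table is a plain dict from key to a list of item IDs. This simplified variant probes every bucket within a fixed Hamming distance; published multi-probe schemes instead rank probes by how likely they are to hold a neighbor. probe_keys and query are illustrative names.

```python
from itertools import combinations

def probe_keys(key, max_flips=1):
    """Yield the exact bucket key, then keys with up to max_flips bits flipped."""
    key = tuple(key)
    yield key
    for r in range(1, max_flips + 1):
        for positions in combinations(range(len(key)), r):
            probe = list(key)
            for i in positions:
                probe[i] ^= 1  # flip one bit of the key
            yield tuple(probe)

def query(table, key, max_flips=1):
    """Collect candidates from the exact bucket and its nearby buckets."""
    candidates = []
    for probe in probe_keys(key, max_flips):
        candidates.extend(table.get(probe, []))
    return candidates

table = {(0, 0, 0): ["a"], (0, 0, 1): ["b"], (1, 1, 1): ["c"]}
near = query(table, (0, 0, 0))               # exact bucket plus 1-bit neighbors
far = query(table, (0, 0, 0), max_flips=3)   # every bucket for a 3-bit key
```

The number of probes grows quickly with max_flips (1 + k for one flip on a k-bit key), so the trade is extra query time for fewer tables.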