Locality-Sensitive Hashing I

According to Wikipedia, Locality-Sensitive hashing (LSH) is an algorithmic technique that hashes similar input items into the same “buckets” with high probability. The number of buckets is much smaller than the universe of possible input items. Since similar items end up in the same buckets, this technique can be used for data clustering and nearest neighbor search. It differs from conventional hashing techniques in that hash collisions are maximized, not minimized. Alternatively, the technique can be seen as a way to reduce the dimensionality of high-dimensional data; high-dimensional input items can be reduced to low-dimensional versions while preserving relative distances between items. In this problem, we will use a portion of LSH to determine similarities between strings.


This is a companion discussion topic for the original entry at https://toph.co/p/locality-sensitive-hashing-i