Uses the "minhashing" (MinHash) technique over shingled text to estimate the similarity of two documents. The similarity measure considered here is the Jaccard similarity of the documents' shingle sets.
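
The idea can be sketched as follows. This is a minimal illustration, not the code in this repository: the function names, the choice of character 5-shingles, the 128 hash functions, and the use of seeded BLAKE2b as the hash family are all assumptions made for the example.

```python
import hashlib


def shingles(text, k=5):
    """Return the set of character k-shingles of text (k=5 is an assumed default)."""
    text = " ".join(text.lower().split())  # lowercase and collapse whitespace
    if len(text) < k:
        return {text} if text else set()
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity |a & b| / |a | b| of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _seeded_hash(shingle, seed):
    """Hypothetical hash family: one 64-bit BLAKE2b hash per integer seed."""
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(shingle_set, num_hashes=128):
    """For each hash function, keep the minimum hash value over the set."""
    if not shingle_set:
        raise ValueError("cannot build a signature for an empty shingle set")
    return [
        min(_seeded_hash(s, seed) for s in shingle_set)
        for seed in range(num_hashes)
    ]


def estimate_similarity(sig_a, sig_b):
    """Fraction of signature positions that agree; this estimates Jaccard."""
    if len(sig_a) != len(sig_b) or not sig_a:
        raise ValueError("signatures must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)
```

For a single hash function, the probability that two sets share the same minimum hash value equals their Jaccard similarity, so the fraction of agreeing positions converges to the exact value as the number of hash functions grows.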
Locality-sensitive hashing (LSH) is also implemented to reduce the number of pairwise comparisons.
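
One common way to do this is the banding technique, sketched below. Whether this repository uses banding is an assumption; the function name `lsh_candidate_pairs`, the `bands` parameter, and the input format (a dict mapping a document id to a MinHash signature, given as a list of integers) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def lsh_candidate_pairs(signatures, bands=32):
    """Return pairs of document ids whose signatures agree on at least one band.

    signatures: dict of doc_id -> list of ints, all of the same length.
    The signature length must be divisible by bands.
    """
    if not signatures:
        return set()
    sig_len = len(next(iter(signatures.values())))
    if sig_len % bands != 0:
        raise ValueError("signature length must be divisible by bands")
    rows = sig_len // bands  # rows per band

    candidates = set()
    for b in range(bands):
        # Documents with identical values in this band share a bucket.
        buckets = defaultdict(list)
        for doc_id, sig in signatures.items():
            band = tuple(sig[b * rows:(b + 1) * rows])
            buckets[band].append(doc_id)
        # Only documents in the same bucket are compared later.
        for ids in buckets.values():
            for pair in combinations(sorted(ids), 2):
                candidates.add(pair)
    return candidates
```

Only the returned candidate pairs need a full similarity check. With b bands of r rows each, a pair with Jaccard similarity s becomes a candidate with probability 1 - (1 - s^r)^b, so b and r set the trade-off between missed duplicates and wasted comparisons.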
Blog Link: http://deduplication.tumblr.com/
#Shingling #Minhashing #Deduplication #LocalitySensitiveHashing