
Prateeksachdev93/Document-deduplication


Document Similarity

Deduplication using MinHashing

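This project uses the MinHashing technique over shingled documents to estimate how similar two documents are. The similarity measure is the Jaccard similarity: the size of the intersection of the two shingle sets divided by the size of their union.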
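
As a rough illustration of the idea, here is a minimal Python sketch of shingling, exact Jaccard similarity, and a MinHash estimate of it. It is not taken from this repository's code: the function names, the choice of character 5-shingles, the 128 hash functions, and the use of seeded BLAKE2b as the hash family are all assumptions made for the example.

```python
import hashlib


def shingles(text, k=5):
    """Return the set of character k-shingles of a text (k=5 is an assumed default)."""
    text = " ".join(text.lower().split())  # normalize case and whitespace
    if len(text) < k:
        return {text}
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity: |A intersect B| / |A union B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _seeded_hash(shingle, seed):
    """Hypothetical hash family: a 64-bit BLAKE2b digest of 'seed:shingle'."""
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(shingle_set, num_hashes=128):
    """For each hash function, keep the minimum hash value over the set."""
    return [
        min(_seeded_hash(s, seed) for s in shingle_set)
        for seed in range(num_hashes)
    ]


def estimate_similarity(sig_a, sig_b):
    """Fraction of signature positions that agree; this estimates the Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


if __name__ == "__main__":
    doc_a = shingles("the quick brown fox jumps over the lazy dog")
    doc_b = shingles("the quick brown fox jumped over the lazy dogs")
    print("exact Jaccard:   ", jaccard(doc_a, doc_b))
    print("MinHash estimate:", estimate_similarity(
        minhash_signature(doc_a), minhash_signature(doc_b)))
```

The estimate works because, for any single hash function, the probability that two sets share the same minimum hash value equals their Jaccard similarity. Using more hash functions reduces the variance of the estimate at the cost of longer signatures.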

Locality-sensitive hashing (LSH) is also implemented to reduce the number of pairwise comparisons.
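
One common way to apply LSH to MinHash signatures is the banding technique, sketched below in Python. This is an assumption about the general approach, not a description of this repository's implementation: the function name, the `bands` and `rows` parameters, and the input format (a dict mapping a document id to a signature, given as a list of integers) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def lsh_candidate_pairs(signatures, bands, rows):
    """Return candidate pairs of document ids via banded LSH.

    signatures: dict of doc_id -> list of ints of length bands * rows
                (assumed to be MinHash signatures computed elsewhere).
    Two documents become a candidate pair if they agree on every row
    of at least one band.
    """
    expected = bands * rows
    for doc_id, sig in signatures.items():
        if len(sig) != expected:
            raise ValueError(f"signature for {doc_id!r} must have length {expected}")

    candidates = set()
    for band in range(bands):
        start = band * rows
        buckets = defaultdict(list)
        # Documents whose band slice is identical land in the same bucket.
        for doc_id, sig in signatures.items():
            buckets[tuple(sig[start:start + rows])].append(doc_id)
        # Only documents sharing a bucket are compared later.
        for ids in buckets.values():
            for a, b in combinations(sorted(ids), 2):
                candidates.add((a, b))
    return candidates


if __name__ == "__main__":
    toy = {
        "a": [1, 2, 3, 4],
        "b": [1, 2, 9, 9],
        "c": [7, 7, 7, 7],
    }
    print(lsh_candidate_pairs(toy, bands=2, rows=2))
```

Only the returned candidate pairs need a full similarity check, instead of all pairs of documents. With b bands of r rows each, pairs whose similarity is above roughly (1/b)^(1/r) are likely to become candidates, so the two parameters trade missed duplicates against extra comparisons.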

Blog Link: http://deduplication.tumblr.com/

#Shingling #Minhashing #Deduplication #LocalitySensitiveHashing
