Similarities between documents and query may be >1 #3

hrs · 2016-03-21T19:56:41Z

The README claims that similarities between documents and queries shouldn't be greater than 1. However:

```python
import tfidf

# Three small documents; "bar" shares four terms with the query, "foo" shares three
table = tfidf.tfidf()
table.addDocument("foo", ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel"])
table.addDocument("bar", ["alpha", "bravo", "charlie", "india", "juliet", "kilo"])
table.addDocument("baz", ["kilo", "lima", "mike", "november"])
# Score the query against every document
print(table.similarities(["alpha", "bravo", "charlie", "india"]))
```

Yields [['foo', 0.5625], ['bar', 1.0416666666666665], ['baz', 0.0]]. Whoops!
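Working the arithmetic backwards, one scoring rule reproduces all three numbers exactly. It is an assumption about the library's internals, not code taken from it, and every name below is hypothetical. Each document term gets its term frequency (count divided by document length), the query gets the same treatment, and every shared term adds (query frequency + document frequency) divided by the term's total count in the corpus. For "foo" that gives 3 × (1/4 + 1/8) / 2 = 0.5625. For "bar", the three common terms add 3 × (1/4 + 1/6) / 2 = 0.625 and the rare term "india" adds (1/4 + 1/6) / 1 ≈ 0.4167, for a total of about 1.0417. The sketch assumes Python 3 (true division).

```python
from collections import Counter

# HYPOTHETICAL reconstruction of the scoring rule, chosen because it reproduces
# the numbers in this issue; it is not copied from the tfidf module.

def add_document(corpus, counts, name, words):
    # Term frequency: each term's count divided by the document's length.
    n = len(words)
    corpus.append((name, {w: c / n for w, c in Counter(words).items()}))
    counts.update(words)  # total occurrences of each term across the corpus

def similarities(corpus, counts, query):
    q = {w: c / len(query) for w, c in Counter(query).items()}
    results = []
    for name, doc in corpus:
        score = 0.0
        for w, qw in q.items():
            if w in doc:
                # Each shared term adds (query tf + document tf) / corpus count;
                # the total is never divided by any vector length.
                score += (qw + doc[w]) / counts[w]
        results.append([name, score])
    return results

corpus, counts = [], Counter()
add_document(corpus, counts, "foo", ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel"])
add_document(corpus, counts, "bar", ["alpha", "bravo", "charlie", "india", "juliet", "kilo"])
add_document(corpus, counts, "baz", ["kilo", "lima", "mike", "november"])
print(similarities(corpus, counts, ["alpha", "bravo", "charlie", "india"]))
```

Nothing in this sum divides by the length of the query or document vectors, so a few shared, rare terms are enough to push it past 1.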

This is happening because the query isn't being normalized. The ranking of results should still be correct, but it'd be better to normalize it so that we can guarantee no score exceeds 1.
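One way to get that guarantee is sketched below, assuming cosine similarity over tf-idf vectors: a standard technique swapped in here, not the library's own scoring. The function names and the idf formula log(N / df) are illustrative choices. Both the query and each document are scaled to unit Euclidean length before taking the dot product; with nonnegative weights that product lies in [0, 1].

```python
import math
from collections import Counter

# Hypothetical helpers (not the tfidf module's API). Cosine similarity over
# tf-idf vectors is swapped in here as one way to bound scores by 1.

def tfidf_vector(words, doc_freq, n_docs):
    """Weight each known term by raw count * log(n_docs / doc_freq)."""
    return {
        w: c * math.log(n_docs / doc_freq[w])
        for w, c in Counter(words).items()
        if doc_freq[w] > 0  # skip query terms that never occur in the corpus
    }

def to_unit_length(vec):
    """Scale a sparse vector to Euclidean length 1 (empty if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm > 0 else {}

def similarities(documents, query):
    """Return [name, score] pairs with every score in [0, 1]."""
    # Document frequency: how many documents contain each term.
    doc_freq = Counter(w for words in documents.values() for w in set(words))
    n_docs = len(documents)
    q = to_unit_length(tfidf_vector(query, doc_freq, n_docs))
    results = []
    for name, words in documents.items():
        d = to_unit_length(tfidf_vector(words, doc_freq, n_docs))
        # Dot product of two unit vectors with nonnegative entries lies in [0, 1];
        # min() guards against floating-point rounding just above 1.
        score = sum(qw * d.get(w, 0.0) for w, qw in q.items())
        results.append([name, min(score, 1.0)])
    return results

docs = {
    "foo": ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel"],
    "bar": ["alpha", "bravo", "charlie", "india", "juliet", "kilo"],
    "baz": ["kilo", "lima", "mike", "november"],
}
# Every score stays within [0, 1], and "bar" still ranks above "foo".
print(similarities(docs, ["alpha", "bravo", "charlie", "india"]))
```

In a dot-product scorer, scaling only the query to unit length multiplies every document's score by the same factor and leaves the ranking unchanged; it is normalizing the document vectors as well that bounds the score by 1 (Cauchy-Schwarz inequality).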

tianye2856 · 2018-02-26T08:50:16Z

I have the same problem. Please fix it, thanks.

shanalikhan · 2018-04-08T18:52:15Z

What solution did you use to fix it?

hrs self-assigned this Mar 21, 2016

thepurpleowl mentioned this issue on Jul 11, 2019 in #7, "Fix for issue #3 and issue #6" (Closed)