This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

[RAG] Fix RAG Passage Loading #4199

Merged
merged 2 commits into from
Nov 19, 2021


klshuster
Contributor

Patch description
Previously, if reading the passage file as a .csv raised an error, we fell back to reading the passages line by line and splitting on \t manually. However, the fallback loop was still referencing the row variable from the prior csv loop, which held a stale row. I've updated the fallback to point to the new variable.
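
For illustration, here is a minimal sketch of this kind of stale-variable bug. It is not the repository's actual code: the names `RAW`, `load_buggy`, and `load_fixed` are hypothetical, the two-column `id<TAB>text` layout is an assumption, and the csv failure is forced with a stray quote under `strict=True` rather than whatever triggered the original error.

```python
import csv
import io
from typing import Dict

# Hypothetical passage data: "id<TAB>text" lines. The stray quote on the
# second line makes the strict csv parser raise csv.Error.
RAW = 'p1\tfirst passage\np2\t"second" passage\np3\tthird passage\n'


def load_buggy(raw: str) -> Dict[str, str]:
    """Fallback loop that mistakenly reuses `row` from the csv loop."""
    passages: Dict[str, str] = {}
    try:
        for row in csv.reader(io.StringIO(raw), delimiter="\t", strict=True):
            passages[row[0]] = row[1]
    except csv.Error:
        passages = {}
        for line in raw.splitlines():
            fields = line.split("\t")
            # BUG: `row` is the last row the csv loop parsed successfully,
            # so every passage receives the same text.
            passages[fields[0]] = row[1]
    return passages


def load_fixed(raw: str) -> Dict[str, str]:
    """Same fallback, but reading from the freshly split `fields`."""
    passages: Dict[str, str] = {}
    try:
        for row in csv.reader(io.StringIO(raw), delimiter="\t", strict=True):
            passages[row[0]] = row[1]
    except csv.Error:
        passages = {}
        for line in raw.splitlines():
            fields = line.split("\t")
            passages[fields[0]] = fields[1]
    return passages
```

With this sample data, `load_buggy(RAW)` maps all three ids to the text of the first passage, while `load_fixed(RAW)` keeps three distinct texts.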

Testing steps
I discovered this because someone pointed out that their retriever was retrieving the same document for every turn. I looked at the logs and saw that indexing 5k documents was taking over 5 minutes; this is way too slow, and usually means that the passage embeddings are too close to each other. After reproducing the error locally and applying this fix, the passages are correctly loaded and indexing takes around 2 seconds.
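
A quick way to catch this class of problem before indexing is to check how many loaded passage texts are distinct. The helper below is a hypothetical sketch (`unique_text_ratio` is not part of the codebase); a ratio far below 1.0 suggests that the loader is repeating passages.

```python
from typing import Iterable


def unique_text_ratio(texts: Iterable[str]) -> float:
    """Return the fraction of passage texts that are distinct (1.0 = all unique)."""
    items = list(texts)
    if not items:
        # No passages loaded: nothing is duplicated.
        return 1.0
    return len(set(items)) / len(items)
```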

Relying on CI for the rest.

@klshuster klshuster merged commit ef794ea into main Nov 19, 2021
@klshuster klshuster deleted the fix_passage_loading branch November 19, 2021 14:38