Pinecone Hybrid Search #659
base: main
Conversation
I had an oversight with using the
@nfcampos @hwchase17 I think this is ready to be reviewed whenever you get the chance. I am using
Note to self: use tiktoken instead of BERT
```typescript
const embeddings = new OpenAIEmbeddings();
const pineconeIndex = client.Index(process.env.PINECONE_INDEX!);

const tokenizer = new BertTokenizer(undefined, true, 512);
```
Consider using `js-tiktoken` instead, which should already be a required dependency.
```typescript
this.topK = args.topK;
this.alpha = args.alpha;
```
nit: the default values are being overridden; should that be the case? If not, consider making `alpha: number` and `topK: number`
```diff
- this.topK = args.topK;
- this.alpha = args.alpha;
+ this.topK = args.topK ?? 4;
+ this.alpha = args.alpha ?? 0.5;
```
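The suggested `??` pattern matters because nullish coalescing only falls back on `null`/`undefined`, so an explicit `alpha: 0` (pure sparse search) is preserved rather than clobbered by the default. A minimal sketch of the pattern (the class and interface names here are illustrative, not the PR's API):

```typescript
// Hypothetical args shape, for illustration only.
interface HybridArgs {
  topK?: number;
  alpha?: number;
}

class HybridRetrieverConfig {
  topK: number;

  alpha: number;

  constructor(args: HybridArgs) {
    // `??` applies the default only when the arg is null/undefined,
    // so an explicit 0 survives (unlike `||`).
    this.topK = args.topK ?? 4;
    this.alpha = args.alpha ?? 0.5;
  }
}
```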
```diff
@@ -1,5 +1,9 @@
 export { RemoteRetriever } from "./remote/base.js";
 export { ChatGPTPluginRetriever } from "./remote/chatgpt-plugin.js";
+export {
+  PineconeHybridSearchRetriever,
```
Instead of adding the export to a barrel file, create a new entrypoint.
```typescript
pineconeIndex: VectorOperationsApi;

tokenizer: BertTokenizer;
```
I think it would make more sense for the sparse encoder to be more configurable, not assuming BERT tokenization.
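One way to make the encoder configurable, sketched under the assumption that the retriever only needs token indices and weights (the interface and class names below are hypothetical, not from the PR):

```typescript
// Hypothetical abstraction: any encoder producing index/value pairs
// can back the sparse side of the hybrid search, not just BertTokenizer.
interface SparseValues {
  indices: number[];
  values: number[];
}

interface SparseEncoder {
  encode(text: string): SparseValues;
}

// Toy example implementation: term-frequency counts over whitespace
// tokens, with a simple string hash standing in for real token IDs.
class TermFrequencyEncoder implements SparseEncoder {
  encode(text: string): SparseValues {
    const counts = new Map<number, number>();
    for (const token of text.toLowerCase().split(/\s+/).filter(Boolean)) {
      let h = 0;
      for (let i = 0; i < token.length; i += 1) {
        h = (h * 31 + token.charCodeAt(i)) >>> 0;
      }
      counts.set(h, (counts.get(h) ?? 0) + 1);
    }
    const indices = [...counts.keys()].sort((a, b) => a - b);
    return { indices, values: indices.map((i) => counts.get(i)!) };
  }
}
```

The retriever would then accept any `SparseEncoder`, so users could plug in BERT, tiktoken, or BM25-style weighting without changing the class.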
```typescript
  alpha: number
): [number[], SparseValues] {
  if (alpha < 0 || alpha > 1) {
    throw new Error("Alpha must be between 0 and 1");
```
nit: consider validating `this.alpha` in the constructor.

nit: do we need to pass `this.alpha` as a parameter? Why not access it directly?
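For context, the function being reviewed appears to implement the usual hybrid weighting (as in the Python implementation): dense values scaled by `alpha`, sparse values by `1 - alpha`, so `alpha = 1` is pure dense search and `alpha = 0` pure sparse. A self-contained sketch of that convex combination, with the validation up front:

```typescript
interface SparseValues {
  indices: number[];
  values: number[];
}

// Sketch of hybrid score scaling: weight the dense vector by alpha and
// the sparse values by (1 - alpha) before querying.
function hybridScale(
  dense: number[],
  sparse: SparseValues,
  alpha: number
): [number[], SparseValues] {
  if (alpha < 0 || alpha > 1) {
    throw new Error("Alpha must be between 0 and 1");
  }
  return [
    dense.map((v) => v * alpha),
    {
      indices: sparse.indices,
      values: sparse.values.map((v) => v * (1 - alpha)),
    },
  ];
}
```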
```typescript
} from "@pinecone-database/pinecone/dist/pinecone-generated-ts-fetch";
import { BertTokenizer } from "bert-tokenizer";

import { PineconeLibArgs, PineconeMetadata } from "vectorstores/pinecone.js";
```
Avoid absolute imports, as this causes the example not to build.
I tried to emulate the LangChain Python implementation of Pinecone Hybrid Search. I also used the Supabase Hybrid Search implementation as a reference. Thanks @hwchase17 @ericciarla
I went with `BertWordPieceTokenizer` from `tokenizers` as the tokenizer. I also had to update `@pinecone-database/pinecone` to `^0.0.12`, as I could not add `sparseVectors` otherwise. Not 100% sure if everything is in working order yet, but I will work on testing it in my local project when I get the chance. A good starting point nonetheless!