community: add milvus hybrid search retriever #20375

BuxianChen · 2024-04-12T08:15:24Z

Description:
I have added a Milvus hybrid search retriever. The PR message is still incomplete, and I would appreciate some advice on the implementation. The code I have committed runs successfully, but it still needs to be improved, refactored, and linted (I am having some trouble with formatting, linting, etc.).

Here are my plans, where I could use some help:

Possibly refactor the code; I would appreciate advice on this. I also need to do more testing, as I believe the current implementation still has many bugs.
Pass the lint, format, and mypy checks; add docstrings and type hints.
Add more functionality: support hybrid search, single-vector search, and query (i.e., no semantic vector search) in MilvusHybridSearchRetriever.
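The three modes in the last item could share a single entry point that picks a Milvus call based on how many query vectors are supplied. Below is a minimal sketch, assuming the retriever receives a mapping from vector field name to query vector. The function name `choose_search_mode` is hypothetical, and the returned strings only name the pymilvus `Collection` methods (`query`, `search`, `hybrid_search`) that each mode would presumably call.

```python
from typing import Mapping, Sequence, Union

# A dense vector is a list of floats; a sparse vector is {dimension_index: weight}.
Vector = Union[Sequence[float], Mapping[int, float]]


def choose_search_mode(query_vectors: Mapping[str, Vector]) -> str:
    """Pick a search mode from the number of vector fields being queried.

    Hypothetical helper: the returned names match pymilvus ``Collection``
    methods, but the dispatch logic itself is only an illustration.
    """
    n = len(query_vectors)
    if n == 0:
        # No vectors: plain scalar filtering (no semantic search).
        return "query"
    if n == 1:
        # One vector field: ordinary single-vector ANN search.
        return "search"
    # Several vector fields (e.g. dense + sparse): fuse results with a reranker.
    return "hybrid_search"
```

For example, `choose_search_mode({"dense": [0.1, 0.2]})` selects a single-vector search, while passing both a dense and a sparse vector selects a hybrid search.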

Dependencies:

Milvus>=2.4.0
pymilvus>=2.4.0
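Because both requirements are minimum versions, an integration could check the installed pymilvus version before relying on the 2.4 hybrid search API. Below is a minimal sketch using only the standard library. `parse_version`, `meets_minimum`, and `installed_pymilvus_version` are hypothetical helpers; the parser only compares the leading numeric components (so a pre-release such as `2.4.0rc1` is treated as `2.4.0`), and nothing here checks the version of the Milvus server itself.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_version(ver: str) -> Tuple[int, int, int]:
    """Read 'major.minor.patch' from the start of a version string; missing parts are 0."""
    m = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", ver.strip())
    if m is None:
        raise ValueError(f"unrecognized version string: {ver!r}")
    major, minor, patch = (int(g) if g is not None else 0 for g in m.groups())
    return (major, minor, patch)


def meets_minimum(installed: str, minimum: str = "2.4.0") -> bool:
    """Compare numerically, so '2.10.0' correctly counts as newer than '2.4.0'."""
    return parse_version(installed) >= parse_version(minimum)


def installed_pymilvus_version() -> Optional[str]:
    """Return the installed pymilvus version, or None if it is not installed."""
    try:
        return version("pymilvus")
    except PackageNotFoundError:
        return None
```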

Docs

An example notebook showing its use lives in docs/docs/integrations/retrievers/milvus_hybrid_search.ipynb.

Thank you for contributing to LangChain!

PR title: "package: description"
- Where "package" is whichever of langchain, community, core, experimental, etc. is being modified. Use "docs: ..." for purely docs changes, "templates: ..." for template changes, "infra: ..." for CI changes.
- Example: "community: add foobar LLM"
PR message: Delete this entire checklist and replace with
- Description: a description of the change
- Issue: the issue # it fixes, if applicable
- Dependencies: any dependencies required for this change
- Twitter handle: if your PR gets announced, and you'd like a mention, we'll gladly shout you out!
Add tests and docs: If you're adding a new integration, please include
1. a test for the integration, preferably unit tests that do not rely on network access,
2. an example notebook showing its use. It lives in docs/docs/integrations directory.
Lint and test: Run make format, make lint and make test from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:

Make sure optional dependencies are imported within a function.
Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests.
Most PRs should not touch more than one package.
Changes should be backwards compatible.
If you are adding something to community, do not re-import it in langchain.

If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17.

vercel · 2024-04-12T08:15:29Z

The latest updates on your projects. Learn more about Vercel for Git.

Name	Status	Updated (UTC)
langchain	✅ Ready	May 1, 2024 10:24am

ohadeytan · 2024-08-06T23:19:01Z

Hi, that looks very good @BuxianChen!

As part of a group in IBM Research, we have similar needs: building and using a Milvus vector store with hybrid search support. We would like to keep using LangChain rather than calling pymilvus directly.

@efriis, can you share what the intention is for this and similar PRs related to hybrid search?
I saw that the partners directory has a HybridSearchRetriever, but it lacks the collection creation and related settings that exist in this PR and in the regular Milvus vector store.

We are willing to implement and submit PRs with your guidance.

BuxianChen · 2024-08-08T13:27:57Z

@ohadeytan Hi, I think this PR is out of date, and perhaps langchain_community/retrievers/milvus.py is too; maybe everything related to Milvus should be moved to the partner directory.

My PR was completed before their partner PR, but theirs has been merged. As you point out, though, their implementation lacks the collection creation and related settings of the regular Milvus vector store.

I think you could talk with LangChain's core developers and then integrate the missing parts into the partner directory. This work probably needs some refactoring, since I borrowed a lot of code from the Milvus vector store.

Best wishes!

BuxianChen · 2024-08-08T13:42:59Z

By the way, I think the awkward things are:

BaseVectorStore assumes a dense vector embedding.
BaseRetriever has no abstract interface like add_documents.

I'm also unsure how to deal with that.
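One way to picture the mismatch: LangChain's own `Embeddings` interface (`embed_documents` / `embed_query`) returns dense `List[float]` vectors, whereas a sparse encoder returns only its non-zero dimensions, and a retriever that owns its collection also needs a write path. The sketch below is purely illustrative. `DenseEmbeddings`, `SparseEmbeddings`, `HybridIndex`, `ToyDense`, and `ToySparse` are hypothetical names, not LangChain or pymilvus classes, and storage is an in-memory list rather than a Milvus collection.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Protocol


class DenseEmbeddings(Protocol):
    """Hypothetical: one fixed-length float vector per text."""

    def embed(self, text: str) -> List[float]: ...


class SparseEmbeddings(Protocol):
    """Hypothetical: only the non-zero dimensions, as {index: weight}."""

    def embed(self, text: str) -> Dict[int, float]: ...


@dataclass
class HybridIndex:
    """Hypothetical store holding one dense and one sparse vector per document."""

    dense: DenseEmbeddings
    sparse: SparseEmbeddings
    rows: List[Dict[str, Any]] = field(default_factory=list)

    def add_documents(self, texts: List[str]) -> List[int]:
        # The write path that a plain retriever interface does not define.
        ids: List[int] = []
        for text in texts:
            row_id = len(self.rows)
            self.rows.append(
                {
                    "id": row_id,
                    "text": text,
                    "dense": self.dense.embed(text),
                    "sparse": self.sparse.embed(text),
                }
            )
            ids.append(row_id)
        return ids


class ToyDense:
    """Toy dense encoder: [character count, word count]."""

    def embed(self, text: str) -> List[float]:
        return [float(len(text)), float(len(text.split()))]


class ToySparse:
    """Toy sparse encoder: bucket each word by a deterministic checksum."""

    def embed(self, text: str) -> Dict[int, float]:
        vec: Dict[int, float] = {}
        for word in text.lower().split():
            idx = sum(map(ord, word)) % 1000
            vec[idx] = vec.get(idx, 0.0) + 1.0
        return vec


index = HybridIndex(dense=ToyDense(), sparse=ToySparse())
ids = index.add_documents(["milvus hybrid search", "dense only"])
```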

ohadeytan · 2024-08-11T10:16:24Z

@BuxianChen, yeah, it seems they are moving to the partners directory, but the question remains whether they support this kind of change and can provide feedback and guidance.

zc277584121 · 2024-08-26T08:10:50Z

Thank you for your contribution.
"Vector store" may be understood to mean a dense vector store, so perhaps sparse and hybrid search functionality should live under Retriever. There is now an implementation, MilvusCollectionHybridSearchRetriever:
https://python.langchain.com/v0.2/docs/integrations/retrievers/milvus_hybrid_search/
What you see there is:

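from langchain_milvus.retrievers import MilvusCollectionHybridSearchRetriever
from pymilvus import Collection

collection = Collection(
    ...
)
retriever = MilvusCollectionHybridSearchRetriever(
    collection=collection,
    ...
)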
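A hybrid search retriever of this kind issues one ANN request per vector field and merges the ranked result lists with a reranker; pymilvus provides `RRFRanker` and `WeightedRanker` for that merge. The sketch below reimplements reciprocal rank fusion in plain Python to show what the merge step does. It is not Milvus's own code: it assumes ranks start at 1, uses k = 60 (the constant commonly used for RRF), and the document IDs are made up.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def reciprocal_rank_fusion(
    ranked_lists: Sequence[Sequence[str]], k: int = 60, limit: int = 10
) -> List[Tuple[str, float]]:
    """Fuse several ranked ID lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:limit]


dense_hits = ["doc_a", "doc_b", "doc_c"]   # e.g. hits from the dense vector field
sparse_hits = ["doc_c", "doc_a", "doc_d"]  # e.g. hits from the sparse vector field
fused = reciprocal_rank_fusion([dense_hits, sparse_hits], limit=3)
```

Here the fused order is doc_a, doc_c, doc_b: documents found by both fields outrank those found by only one, which is the point of fusing dense and sparse results.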

In the near future, the MilvusClient SDK will support hybrid search.
The ideal implementation would then look like this:

from pymilvus import MilvusClient

client = MilvusClient("milvus_demo.db")

MilvusHybridSearchRetriever(
    client=client,
    ...
)

efriis · 2024-08-26T17:45:19Z

closing because this should be in the partner package! Seems like it might already be there too.

…25284) # Description Milvus (and `pymilvus`) recently added the option to use [sparse vectors](https://milvus.io/docs/sparse_vector.md#Sparse-Vector) with appropriate search methods (e.g., `SPARSE_INVERTED_INDEX`) and embeddings (e.g., `BM25`, `SPLADE`). This PR allows creating a vector store using langchain's `Milvus` class, setting the matching vector field type to `DataType.SPARSE_FLOAT_VECTOR` and the default index type to `SPARSE_INVERTED_INDEX`. It only extends functionality and is backward compatible. ## Note I am also interested in extending the Milvus class further to support multi-vector search (aka hybrid search), and would be happy to discuss that. See [here](#19955), [here](#20375), and [here](#22886) for similar needs. --------- Co-authored-by: Erick Friis <erick@langchain.dev>
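For context on the sparse-vector support described above: a sparse embedding stores only its non-zero dimensions, and pymilvus accepts such a vector as a dict mapping dimension index to weight. The sketch below builds toy sparse vectors from raw term counts (a stand-in for BM25 or SPLADE, not an implementation of either) and scores them with an inner product, the IP metric Milvus 2.4 uses for sparse vectors. `build_vocab`, `to_sparse`, and `sparse_ip` are hypothetical helpers.

```python
from collections import Counter
from typing import Dict, List


def build_vocab(corpus: List[str]) -> Dict[str, int]:
    """Assign each distinct lower-cased token a dimension index."""
    vocab: Dict[str, int] = {}
    for text in corpus:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def to_sparse(text: str, vocab: Dict[str, int]) -> Dict[int, float]:
    """Term counts as {dimension: weight}; tokens outside the vocabulary are dropped."""
    counts = Counter(t for t in text.lower().split() if t in vocab)
    return {vocab[t]: float(c) for t, c in counts.items()}


def sparse_ip(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Inner product that only touches dimensions present in both vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(w * b[i] for i, w in a.items() if i in b)


corpus = ["milvus supports sparse vectors", "dense vectors and sparse vectors"]
vocab = build_vocab(corpus)
docs = [to_sparse(t, vocab) for t in corpus]
query = to_sparse("sparse vectors", vocab)
scores = [sparse_ip(query, d) for d in docs]
```

The second document scores higher because "vectors" appears in it twice; a real BM25 or SPLADE encoder would weight terms differently, but the storage format and scoring shape are the same.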

add milvus hybrid search

d42b404

dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Apr 12, 2024

dosubot bot added Ɑ: retriever Related to retriever module 🔌: milvus Primarily related to Milvus vector store integration labels Apr 12, 2024

BuxianChen changed the title ~~add milvus hybrid search retriever~~ community: add milvus hybrid search retriever Apr 12, 2024

vercel bot deployed to Preview April 12, 2024 08:23

BuxianChen marked this pull request as draft April 12, 2024 08:25

BuxianChen marked this pull request as ready for review April 12, 2024 09:04

dosubot bot added the 🤖:improvement Medium size change to existing code to handle new use-cases label Apr 12, 2024

add more content to notebook, fix some bugs

c909cc7

dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Apr 12, 2024

vercel bot deployed to Preview April 12, 2024 16:31

rename search_params to ann_search_params in _get_relevant_documents

339c653

vercel bot deployed to Preview April 14, 2024 14:09

update zero/one/multi vector search

8f60fe3

vercel bot deployed to Preview April 15, 2024 04:06

Merge branch 'master' into milvus_hybrid_search

469519a

vercel bot deployed to Preview April 27, 2024 01:32

Merge branch 'langchain-ai:master' into milvus_hybrid_search

bf64afa

vercel bot deployed to Preview April 30, 2024 09:55

update bge-m3 example for milvus_hybrid_search

f7d03f1

vercel bot deployed to Preview May 1, 2024 06:39

BuxianChen added 4 commits May 1, 2024 15:15

fix ruff error

c7fa72b

fix mypy error

58c9fad

remove useless info in notebook

5fc53dd

remove some finished todo

09f5d07

vercel bot deployed to Preview May 1, 2024 08:15

Merge branch 'master' into milvus_hybrid_search

fdff690

vercel bot deployed to Preview May 1, 2024 10:24

ccurme added the community Related to langchain-community label Jun 18, 2024

ohadeytan mentioned this pull request Aug 11, 2024

partners/milvus: allow creating a vectorstore with sparse embeddings #25284

Merged

efriis closed this Aug 26, 2024
