docs: update Elasticsearch strategy names (langchain-ai#21530)
Update documentation with the [new names for retrieval
strategies](langchain-ai/langchain-elastic#22)

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
maxjakob and efriis authored May 17, 2024
1 parent cdc8e2d commit e6b7a17
Showing 1 changed file with 73 additions and 52 deletions.
125 changes: 73 additions & 52 deletions docs/docs/integrations/vectorstores/elasticsearch.ipynb
Expand Up @@ -161,7 +161,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 3,
"id": "67ab8afa-f7c6-4fbf-b596-cb512da949da",
"metadata": {
"id": "67ab8afa-f7c6-4fbf-b596-cb512da949da",
Expand Down Expand Up @@ -194,7 +194,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 4,
"id": "aac9563e",
"metadata": {
"id": "aac9563e",
Expand All @@ -208,7 +208,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 5,
"id": "a3c3999a",
"metadata": {
"id": "a3c3999a",
Expand All @@ -229,7 +229,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 6,
"id": "12eb86d8",
"metadata": {
"id": "12eb86d8",
Expand Down Expand Up @@ -271,7 +271,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 7,
"id": "5d076412",
"metadata": {},
"outputs": [
Expand Down Expand Up @@ -313,7 +313,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 8,
"id": "b2a4bd1b",
"metadata": {},
"outputs": [
Expand Down Expand Up @@ -345,7 +345,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 9,
"id": "f3d294ff",
"metadata": {},
"outputs": [
Expand Down Expand Up @@ -375,7 +375,7 @@
},
{
"cell_type": "code",
"execution_count": 59,
"execution_count": 10,
"id": "55b63a61",
"metadata": {},
"outputs": [
Expand Down Expand Up @@ -405,7 +405,7 @@
},
{
"cell_type": "code",
"execution_count": 60,
"execution_count": 11,
"id": "9b831b3d",
"metadata": {},
"outputs": [
Expand Down Expand Up @@ -435,7 +435,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 12,
"id": "fb1482e7",
"metadata": {},
"outputs": [],
Expand Down Expand Up @@ -504,27 +504,29 @@
"metadata": {},
"source": [
"# Retrieval Strategies\n",
"Elasticsearch has big advantages over other vector only databases from its ability to support a wide range of retrieval strategies. In this notebook we will configure `ElasticsearchStore` to support some of the most common retrieval strategies. \n",
"Elasticsearch has big advantages over other vector only databases from its ability to support a wide range of retrieval strategies. In this notebook we will configure `ElasticsearchStore` to support some of the most common retrieval strategies. \n",
"\n",
"By default, `ElasticsearchStore` uses the `ApproxRetrievalStrategy`.\n",
"By default, `ElasticsearchStore` uses the `DenseVectorStrategy` (was called `ApproxRetrievalStrategy` prior to version 0.2.0).\n",
"\n",
"## ApproxRetrievalStrategy\n",
"This will return the top `k` most similar vectors to the query vector. The `k` parameter is set when the `ElasticsearchStore` is initialized. The default value is `10`."
"## DenseVectorStrategy\n",
"This will return the top `k` most similar vectors to the query vector. The `k` parameter is set when the `ElasticsearchStore` is initialized. The default value is `10`."
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 13,
"id": "999b5ef5",
"metadata": {},
"outputs": [],
"source": [
"from langchain_elasticsearch import DenseVectorStrategy\n",
"\n",
"db = ElasticsearchStore.from_documents(\n",
" docs,\n",
" embeddings,\n",
" es_url=\"http://localhost:9200\",\n",
" index_name=\"test\",\n",
" strategy=ElasticsearchStore.ApproxRetrievalStrategy(),\n",
" strategy=DenseVectorStrategy(),\n",
")\n",
"\n",
"docs = db.similarity_search(\n",
Expand All @@ -537,12 +539,12 @@
"id": "9b651be5",
"metadata": {},
"source": [
"### Example: Approx with hybrid\n",
"### Example: Hybrid retrieval with dense vector and keyword search\n",
"This example will show how to configure `ElasticsearchStore` to perform a hybrid retrieval, using a combination of approximate semantic search and keyword based search. \n",
"\n",
"We use RRF to balance the two scores from different retrieval methods.\n",
"\n",
"To enable hybrid retrieval, we need to set `hybrid=True` in `ElasticsearchStore` `ApproxRetrievalStrategy` constructor.\n",
"To enable hybrid retrieval, we need to set `hybrid=True` in the `DenseVectorStrategy` constructor.\n",
"\n",
"```python\n",
"\n",
Expand All @@ -551,9 +553,7 @@
" embeddings, \n",
" es_url=\"http://localhost:9200\", \n",
" index_name=\"test\",\n",
" strategy=ElasticsearchStore.ApproxRetrievalStrategy(\n",
" hybrid=True,\n",
" )\n",
" strategy=DenseVectorStrategy(hybrid=True)\n",
")\n",
"```\n",
"\n",
Expand Down Expand Up @@ -582,35 +582,33 @@
"}\n",
"```\n",
"\n",
"### Example: Approx with Embedding Model in Elasticsearch\n",
"This example will show how to configure `ElasticsearchStore` to use the embedding model deployed in Elasticsearch for approximate retrieval. \n",
"### Example: Dense vector search with Embedding Model in Elasticsearch\n",
"This example will show how to configure `ElasticsearchStore` to use the embedding model deployed in Elasticsearch for dense vector retrieval.\n",
"\n",
"To use this, specify the model_id in `ElasticsearchStore` `ApproxRetrievalStrategy` constructor via the `query_model_id` argument.\n",
"To use this, specify the model_id in `DenseVectorStrategy` constructor via the `query_model_id` argument.\n",
"\n",
"**NOTE** This requires the model to be deployed and running in Elasticsearch ml node. See [notebook example](https://github.com/elastic/elasticsearch-labs/blob/main/notebooks/integrations/hugging-face/loading-model-from-hugging-face.ipynb) on how to deploy the model with eland.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 14,
"id": "0a0c85e7",
"metadata": {},
"outputs": [],
"source": [
"APPROX_SELF_DEPLOYED_INDEX_NAME = \"test-approx-self-deployed\"\n",
"DENSE_SELF_DEPLOYED_INDEX_NAME = \"test-dense-self-deployed\"\n",
"\n",
"# Note: This does not have an embedding function specified\n",
"# Instead, we will use the embedding model deployed in Elasticsearch\n",
"db = ElasticsearchStore(\n",
" es_cloud_id=\"<your cloud id>\",\n",
" es_user=\"elastic\",\n",
" es_password=\"<your password>\",\n",
" index_name=APPROX_SELF_DEPLOYED_INDEX_NAME,\n",
" index_name=DENSE_SELF_DEPLOYED_INDEX_NAME,\n",
" query_field=\"text_field\",\n",
" vector_query_field=\"vector_query_field.predicted_value\",\n",
" strategy=ElasticsearchStore.ApproxRetrievalStrategy(\n",
" query_model_id=\"sentence-transformers__all-minilm-l6-v2\"\n",
" ),\n",
" strategy=DenseVectorStrategy(model_id=\"sentence-transformers__all-minilm-l6-v2\"),\n",
")\n",
"\n",
"# Setup a Ingest Pipeline to perform the embedding\n",
Expand All @@ -631,7 +629,7 @@
"# creating a new index with the pipeline,\n",
"# not relying on langchain to create the index\n",
"db.client.indices.create(\n",
" index=APPROX_SELF_DEPLOYED_INDEX_NAME,\n",
" index=DENSE_SELF_DEPLOYED_INDEX_NAME,\n",
" mappings={\n",
" \"properties\": {\n",
" \"text_field\": {\"type\": \"text\"},\n",
Expand All @@ -655,12 +653,10 @@
" es_cloud_id=\"<cloud id>\",\n",
" es_user=\"elastic\",\n",
" es_password=\"<cloud password>\",\n",
" index_name=APPROX_SELF_DEPLOYED_INDEX_NAME,\n",
" index_name=DENSE_SELF_DEPLOYED_INDEX_NAME,\n",
" query_field=\"text_field\",\n",
" vector_query_field=\"vector_query_field.predicted_value\",\n",
" strategy=ElasticsearchStore.ApproxRetrievalStrategy(\n",
" query_model_id=\"sentence-transformers__all-minilm-l6-v2\"\n",
" ),\n",
" strategy=DenseVectorStrategy(model_id=\"sentence-transformers__all-minilm-l6-v2\"),\n",
")\n",
"\n",
"# Perform search\n",
Expand All @@ -672,12 +668,12 @@
"id": "53959de6",
"metadata": {},
"source": [
"## SparseVectorRetrievalStrategy (ELSER)\n",
"## SparseVectorStrategy (ELSER)\n",
"This strategy uses Elasticsearch's sparse vector retrieval to retrieve the top-k results. We only support our own \"ELSER\" embedding model for now.\n",
"\n",
"**NOTE** This requires the ELSER model to be deployed and running in Elasticsearch ml node. \n",
"\n",
"To use this, specify `SparseVectorRetrievalStrategy` in `ElasticsearchStore` constructor."
"To use this, specify `SparseVectorStrategy` (was called `SparseVectorRetrievalStrategy` prior to version 0.2.0) in the `ElasticsearchStore` constructor. You will need to provide a model ID."
]
},
{
Expand All @@ -695,15 +691,17 @@
}
],
"source": [
"from langchain_elasticsearch import SparseVectorStrategy\n",
"\n",
"# Note that this example doesn't have an embedding function. This is because we infer the tokens at index time and at query time within Elasticsearch.\n",
"# This requires the ELSER model to be loaded and running in Elasticsearch.\n",
"db = ElasticsearchStore.from_documents(\n",
" docs,\n",
" es_cloud_id=\"My_deployment:dXMtY2VudHJhbDEuZ2NwLmNsb3VkLmVzLmlvOjQ0MyQ2OGJhMjhmNDc1M2Y0MWVjYTk2NzI2ZWNkMmE5YzRkNyQ3NWI4ODRjNWQ2OTU0MTYzODFjOTkxNmQ1YzYxMGI1Mw==\",\n",
" es_cloud_id=\"<cloud id>\",\n",
" es_user=\"elastic\",\n",
" es_password=\"GgUPiWKwEzgHIYdHdgPk1Lwi\",\n",
" es_password=\"<cloud password>\",\n",
" index_name=\"test-elser\",\n",
" strategy=ElasticsearchStore.SparseVectorRetrievalStrategy(),\n",
" strategy=SparseVectorStrategy(model_id=\".elser_model_2\"),\n",
")\n",
"\n",
"db.client.indices.refresh(index=\"test-elser\")\n",
Expand All @@ -719,19 +717,42 @@
"id": "edf3a093",
"metadata": {},
"source": [
"## ExactRetrievalStrategy\n",
"This strategy uses Elasticsearch's exact retrieval (also known as brute force) to retrieve the top-k results.\n",
"## DenseVectorScriptScoreStrategy\n",
"This strategy uses Elasticsearch's script score query to perform exact vector retrieval (also known as brute force) to retrieve the top-k results. (This strategy was called `ExactRetrievalStrategy` prior to version 0.2.0.)\n",
"\n",
"To use this, specify `ExactRetrievalStrategy` in `ElasticsearchStore` constructor.\n",
"To use this, specify `DenseVectorScriptScoreStrategy` in `ElasticsearchStore` constructor.\n",
"\n",
"```python\n",
"from langchain_elasticsearch import SparseVectorStrategy\n",
"\n",
"db = ElasticsearchStore.from_documents(\n",
" docs, \n",
" embeddings, \n",
" es_url=\"http://localhost:9200\", \n",
" index_name=\"test\",\n",
" strategy=ElasticsearchStore.ExactRetrievalStrategy()\n",
" strategy=DenseVectorScriptScoreStrategy(),\n",
")\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "11b51c47",
"metadata": {},
"source": [
"## BM25Strategy\n",
"Finally, you can use full-text keyword search.\n",
"\n",
"To use this, specify `BM25Strategy` in `ElasticsearchStore` constructor.\n",
"\n",
"```python\n",
"from langchain_elasticsearch import BM25Strategy\n",
"\n",
"db = ElasticsearchStore.from_documents(\n",
" docs, \n",
" es_url=\"http://localhost:9200\", \n",
" index_name=\"test\",\n",
" strategy=BM25Strategy(),\n",
")\n",
"```"
]
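The `DenseVectorScriptScoreStrategy` section above describes exact (brute force) vector retrieval. To make that term concrete, here is a minimal pure-Python sketch of the idea: score every stored vector against the query and keep the top `k`. It is not the Elasticsearch script score query itself; the helper names, the toy two-dimensional vectors, and the choice of cosine similarity are assumptions for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_top_k(query, vectors, k=2):
    """Score every stored vector against the query and return the best k IDs."""
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in vectors.items()]
    # Highest similarity first; ties are broken by document ID.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical toy embeddings.
vectors = {"doc_a": [1.0, 0.0], "doc_b": [0.0, 1.0], "doc_c": [0.7, 0.7]}
top = exact_top_k([1.0, 0.1], vectors, k=2)
print(top)
```

Every query touches every vector, so the cost grows linearly with the index size. That is the trade-off against the approximate search used by `DenseVectorStrategy`.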
Expand Down Expand Up @@ -924,9 +945,9 @@
"\n",
"## What's new?\n",
"\n",
"The new implementation is now one class called `ElasticsearchStore` which can be used for approx, exact, and ELSER search retrieval, via strategies.\n",
"The new implementation is now one class called `ElasticsearchStore` which can be used for approximate dense vector, exact dense vector, sparse vector (ELSER), BM25 retrieval and hybrid retrieval, via strategies.\n",
"\n",
"## Im using ElasticKNNSearch\n",
"## I am using ElasticKNNSearch\n",
"\n",
"Old implementation:\n",
"\n",
Expand All @@ -946,21 +967,21 @@
"\n",
"```python\n",
"\n",
"from langchain_elasticsearch import ElasticsearchStore\n",
"from langchain_elasticsearch import ElasticsearchStore, DenseVectorStrategy\n",
"\n",
"db = ElasticsearchStore(\n",
" es_url=\"http://localhost:9200\",\n",
" index_name=\"test_index\",\n",
" embedding=embedding,\n",
" # if you use the model_id\n",
" # strategy=ElasticsearchStore.ApproxRetrievalStrategy( query_model_id=\"test_model\" )\n",
" # strategy=DenseVectorStrategy(model_id=\"test_model\")\n",
" # if you use hybrid search\n",
" # strategy=ElasticsearchStore.ApproxRetrievalStrategy( hybrid=True )\n",
" # strategy=DenseVectorStrategy(hybrid=True)\n",
")\n",
"\n",
"```\n",
"\n",
"## Im using ElasticVectorSearch\n",
"## I am using ElasticVectorSearch\n",
"\n",
"Old implementation:\n",
"\n",
Expand All @@ -980,13 +1001,13 @@
"\n",
"```python\n",
"\n",
"from langchain_elasticsearch import ElasticsearchStore\n",
"from langchain_elasticsearch import ElasticsearchStore, DenseVectorScriptScoreStrategy\n",
"\n",
"db = ElasticsearchStore(\n",
" es_url=\"http://localhost:9200\",\n",
" index_name=\"test_index\",\n",
" embedding=embedding,\n",
" strategy=ElasticsearchStore.ExactRetrievalStrategy()\n",
" strategy=DenseVectorScriptScoreStrategy()\n",
")\n",
"\n",
"```"
Expand Down
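The renames in this commit can be summarized as a lookup table. The sketch below applies that table to a line of source text; it is a hypothetical helper for illustration and not part of `langchain_elasticsearch`. The name `rename_strategies` is an assumption. It only rewrites class names: it does not change the `query_model_id` keyword to `model_id`, and it does not add the `from langchain_elasticsearch import ...` line that the new names require.

```python
import re

# Old -> new strategy class names, as listed in this commit.
RENAMED_STRATEGIES = {
    "ApproxRetrievalStrategy": "DenseVectorStrategy",
    "SparseVectorRetrievalStrategy": "SparseVectorStrategy",
    "ExactRetrievalStrategy": "DenseVectorScriptScoreStrategy",
}

# Match an old name, optionally prefixed with "ElasticsearchStore.".
_OLD_NAME = re.compile(
    r"(?:\bElasticsearchStore\.)?\b(" + "|".join(RENAMED_STRATEGIES) + r")\b"
)


def rename_strategies(source: str) -> str:
    """Replace old strategy class names with their new equivalents."""
    return _OLD_NAME.sub(lambda m: RENAMED_STRATEGIES[m.group(1)], source)


old_line = "strategy=ElasticsearchStore.ApproxRetrievalStrategy(hybrid=True)"
new_line = rename_strategies(old_line)
print(new_line)
```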
