
# MS Marco Passage Ranking

This sample application demonstrates three ways of applying Transformer-based ranking models for text ranking in Vespa, and how to represent them efficiently.

Blog posts with more details:

  • Transformers for Ranking
  • ColBERT overview

*Illustration from the ColBERT paper; the figure labels referenced below refer to this illustration.*

This sample application demonstrates:

  • Simple single-stage sparse retrieval accelerated by the WAND dynamic pruning algorithm with BM25 ranking.
  • Dense (vector) search for efficient candidate retrieval using Vespa's support for approximate nearest neighbor search. This method is illustrated in figure a.
  • Re-ranking using the Contextualized Late Interaction over BERT (ColBERT) model. This method is illustrated in figure d.
  • Re-ranking using a cross-encoder with cross attention between the query and document terms. This method is illustrated in figure c.
  • Multiphase retrieval and ranking combining efficient retrieval (WAND or ANN) with re-ranking stages.
  • Using Vespa embedder functionality.
  • Hybrid ranking functionality.

## Retrieval and Ranking

There are several ranking profiles defined in the passage document schema. See the Vespa ranking documentation for an overview of how to represent ranking in Vespa.
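
Any of these profiles can be selected per query with the `ranking` request parameter. As a rough illustration, the Python sketch below issues a query over Vespa's HTTP search API against the local quick-start deployment described below; the `bm25` profile name and the `text` field are assumptions for illustration, not necessarily the exact names used in this application's schema.

```python
import requests

# A minimal sketch, assuming a local deployment on http://localhost:8080 (see the
# quick start below). The "bm25" profile name and the "text" summary field are
# illustrative; any profile defined in the passage schema can be selected with
# the "ranking" request parameter.
response = requests.post(
    "http://localhost:8080/search/",
    json={
        "yql": "select * from passage where userQuery()",
        "query": "what was the manhattan project",
        "ranking": "bm25",
        "hits": 10,
    },
    timeout=30,
)
for hit in response.json().get("root", {}).get("children", []):
    print(hit["relevance"], hit["fields"].get("text", "")[:80])
```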

## Quick start

Make sure to read and agree to the terms and conditions of MS Marco before downloading the dataset. The following is a quick-start recipe using a tiny slice of the MS Marco passage ranking dataset.

Requirements:

  • Docker Desktop installed and running, with at least 6 GB of memory available to Docker. Refer to the Docker memory documentation for details and troubleshooting.
  • Alternatively, deploy using Vespa Cloud.
  • Operating system: Linux, macOS, or Windows 10 Pro (Docker requirement).
  • Architecture: x86_64 or arm64.
  • Homebrew to install the Vespa CLI, or download a vespa-cli release from GitHub releases.
  • Python 3 with the requests, tqdm, and ir_datasets packages.

Validate Docker resource settings, which should be a minimum of 6 GB:

$ docker info | grep "Total Memory"

Install Vespa CLI:

$ brew install vespa-cli

Install Python dependencies for exporting the passage dataset:

$ pip3 install ir_datasets 

For local deployment using the Docker image:

$ vespa config set target local

Pull and start the Vespa container image:

$ docker pull vespaengine/vespa
$ docker run --detach --name vespa --hostname vespa-container \
  --publish 8080:8080 --publish 19071:19071 \
  vespaengine/vespa

Verify that the configuration service (deploy API) is ready:

$ vespa status deploy --wait 300

Download this sample application:

$ vespa clone msmarco-ranking myapp && cd myapp

Export the cross-encoder ranking model to ONNX format using the Optimum library from Hugging Face, or download an exported ONNX version of the model (as done in the example below):

$ mkdir -p models
$ curl -L https://huggingface.co/Xenova/ms-marco-MiniLM-L-6-v2/resolve/main/onnx/model.onnx -o models/model.onnx
$ curl -L https://huggingface.co/Xenova/ms-marco-MiniLM-L-6-v2/raw/main/tokenizer.json -o models/tokenizer.json

Deploy the application:

$ vespa deploy --wait 300 

### Feeding sample data

Feed a small sample of data:

$ vespa feed ext/docs.jsonl

### Query examples

For example, run a query for *what was the Manhattan Project*:

Note that the @query parameter substitution syntax requires Vespa 8.299 or above.

$ vespa query 'query=what was the manhattan project' \
  'yql=select * from passage where {targetHits: 100}nearestNeighbor(e5, q)' \
  'input.query(q)=embed(e5, @query)' \
  'input.query(qt)=embed(colbert, @query)' \
  'ranking=e5-colbert'

$ vespa query 'query=what was the manhattan project' \
  'yql=select * from passage where userQuery() or ({targetHits: 100}nearestNeighbor(e5, q))' \
  'input.query(q)=embed(e5, @query)' \
  'input.query(qt)=embed(colbert, @query)' \
  'input.query(query_token_ids)=embed(tokenizer, @query)' \
  'ranking=e5-colbert-cross-encoder-rrf'

Shut down and remove the Docker container:

$ docker rm -f vespa

## Ranking Evaluation using MS Marco Passage Ranking development queries

With evaluate_passage_run.py we can run retrieval and ranking using the methods demonstrated above.

To do so, we need to index the entire dataset as follows:

$ ir_datasets export msmarco-passage docs --format jsonl | python3 python/to-vespa-feed.py | vespa feed -

Note that the ir_datasets utility will download MS Marco query evaluation data, so the first run will take some time to complete.
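
For reference, the conversion step in the middle of that pipeline can be sketched roughly as below: it reads the ir_datasets JSONL export from stdin and writes Vespa feed operations to stdout. The document id format and field names are assumptions for illustration; python/to-vespa-feed.py in this repository is the authoritative version.

```python
#!/usr/bin/env python3
# Rough sketch of a jsonl-to-Vespa-feed conversion. The "doc_id"/"text" input
# fields follow the ir_datasets jsonl export, while the "id:msmarco:passage::"
# id scheme and the "id"/"text" schema fields are illustrative assumptions.
import json
import sys

for line in sys.stdin:
    doc = json.loads(line)
    feed_operation = {
        "put": f"id:msmarco:passage::{doc['doc_id']}",
        "fields": {
            "id": doc["doc_id"],
            "text": doc["text"],
        },
    }
    print(json.dumps(feed_operation))
```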

### BM25 (WAND) Single-phase sparse retrieval

$ ./python/evaluate_passage_run.py --query_split dev --model bm25 --endpoint \
  http://localhost:8080/search/
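
Conceptually, such a run issues one query per development query against the search endpoint and writes an MS Marco style run file (query id, passage id, rank). Below is a rough Python sketch, where the `bm25` profile name and the `id` summary field are assumptions for illustration and evaluate_passage_run.py remains the authoritative version:

```python
import ir_datasets
import requests

# Rough sketch of a BM25 run over the MS Marco dev (small) queries.
dataset = ir_datasets.load("msmarco-passage/dev/small")
with open("run.dev.txt", "w") as out:
    for query in dataset.queries_iter():
        result = requests.post(
            "http://localhost:8080/search/",
            json={
                "yql": "select * from passage where userQuery()",
                "query": query.text,
                "ranking": "bm25",  # illustrative profile name
                "hits": 10,
            },
            timeout=30,
        ).json()
        hits = result.get("root", {}).get("children", [])
        for rank, hit in enumerate(hits, start=1):
            # MS Marco run format: query_id <tab> passage_id <tab> rank
            out.write(f"{query.query_id}\t{hit['fields']['id']}\t{rank}\n")
```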

To evaluate ranking effectiveness, download the official MS Marco evaluation script:

$ curl -L -o msmarco_eval.py https://raw.githubusercontent.com/spacemanidol/MSMARCO/master/Ranking/Baselines/msmarco_eval.py

Generate the dev qrels (query relevance labels) file using ir_datasets:

$ ./python/dump_passage_dev_qrels.py
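
The qrels dump corresponds roughly to the sketch below, which uses the ir_datasets API to write the judgments in the tab-separated format read by msmarco_eval.py; dump_passage_dev_qrels.py in this repository is the authoritative version.

```python
import ir_datasets

# Write the dev (small) judgments as: query_id <tab> iteration <tab> doc_id <tab> relevance.
dataset = ir_datasets.load("msmarco-passage/dev/small")
with open("qrels.dev.small.tsv", "w") as out:
    for qrel in dataset.qrels_iter():
        out.write(f"{qrel.query_id}\t{qrel.iteration}\t{qrel.doc_id}\t{qrel.relevance}\n")
```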

The above writes a qrels.dev.small.tsv file to the current directory. Now we can evaluate using the run.dev.txt file created by any of the evaluate_passage_run.py runs listed above:

$ python3 msmarco_eval.py qrels.dev.small.tsv run.dev.txt
#####################
MRR @10: 0.xx
QueriesRanked: 6980
#####################