Refactor BEIR regressions (castorini#1834)
+ Add template and pages for multifield regressions (arguana/climate-fever)
+ Tweak YAML files so that runs don't have the same names
  (this prevents regressions from running in parallel because they overwrite the same files)
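
The file collision mentioned above can be illustrated with a minimal sketch. The helper `run_path` is hypothetical (it is not part of Anserini); it only mirrors the naming pattern visible in the run files changed by this commit, `runs/run.<regression name>.<model>.<topics>.txt`. The "before" case assumes, for illustration, that two index variants of the same corpus share the bare corpus name.

```python
def run_path(name: str, model: str, topics: str) -> str:
    """Hypothetical helper: compose a run file path following the pattern
    seen in this commit's docs. Not an Anserini API."""
    return f"runs/run.{name}.{model}.{topics}.txt"


topics = "topics.beir-v1.0.0-arguana.test"

# Before: both variants use the bare corpus name (assumed for illustration),
# so two regressions running in parallel would write to the same file.
old_flat = run_path("beir-v1.0.0-arguana", "bm25", topics)
old_multifield = run_path("beir-v1.0.0-arguana", "bm25", topics)

# After: the regression name carries the index variant, so the paths differ.
new_flat = run_path("beir-v1.0.0-arguana-flat", "bm25", topics)
new_multifield = run_path("beir-v1.0.0-arguana-multifield", "bm25", topics)

print(old_flat == old_multifield)  # the two old paths collide
print(new_flat != new_multifield)  # the two new paths are distinct
```

With distinct run names, each regression owns its output file, which is what allows the regressions to be launched side by side.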
lintool committed Apr 9, 2022
1 parent 4c8bfe8 commit 2ce877e
Showing 72 changed files with 463 additions and 211 deletions.
26 changes: 13 additions & 13 deletions README.md
@@ -101,19 +101,19 @@ See individual pages for details!
+ Regressions for FIRE 2012: [Monolingual Bengali](docs/regressions-fire12-bn.md), [Monolingual Hindi](docs/regressions-fire12-hi.md), [Monolingual English](docs/regressions-fire12-en.md)
+ Regressions for Mr. TyDi (v1.1) baselines : [ar](docs/regressions-mrtydi-v1.1-ar.md), [bn](docs/regressions-mrtydi-v1.1-bn.md), [en](docs/regressions-mrtydi-v1.1-en.md), [fi](docs/regressions-mrtydi-v1.1-fi.md), [id](docs/regressions-mrtydi-v1.1-id.md), [ja](docs/regressions-mrtydi-v1.1-ja.md), [ko](docs/regressions-mrtydi-v1.1-ko.md), [ru](docs/regressions-mrtydi-v1.1-ru.md), [sw](docs/regressions-mrtydi-v1.1-sw.md), [te](docs/regressions-mrtydi-v1.1-te.md), [th](docs/regressions-mrtydi-v1.1-th.md)
+ Regressions for BEIR (v1.0.0):
+ ArguAna: [baseline](docs/regressions-beir-v1.0.0-arguana-flat.md), [SPLADE-distill CoCodenser-medium](docs/regressions-beir-v1.0.0-arguana-splade-distil-cocodenser-medium.md)
+ Climate-FEVER: [baseline](docs/regressions-beir-v1.0.0-climate-fever-flat.md), [SPLADE-distill CoCodenser-medium](docs/regressions-beir-v1.0.0-climate-fever-splade-distil-cocodenser-medium.md)
+ DBPedia: [baseline](docs/regressions-beir-v1.0.0-dbpedia-entity-flat.md), [SPLADE-distill CoCodenser-medium](docs/regressions-beir-v1.0.0-dbpedia-entity-splade-distil-cocodenser-medium.md)
+ FEVER: [baseline](docs/regressions-beir-v1.0.0-fever-flat.md), [SPLADE-distill CoCodenser-medium](docs/regressions-beir-v1.0.0-fever-splade-distil-cocodenser-medium.md)
+ FiQA-2018: [baseline](docs/regressions-beir-v1.0.0-fiqa-flat.md), [SPLADE-distill CoCodenser-medium](docs/regressions-beir-v1.0.0-fiqa-splade-distil-cocodenser-medium.md)
+ HotpotQA: [baseline](docs/regressions-beir-v1.0.0-hotpotqa-flat.md), [SPLADE-distill CoCodenser-medium](docs/regressions-beir-v1.0.0-hotpotqa-splade-distil-cocodenser-medium.md)
+ NFCorpus: [baseline](docs/regressions-beir-v1.0.0-nfcorpus-flat.md), [SPLADE-distill CoCodenser-medium](docs/regressions-beir-v1.0.0-nfcorpus-splade-distil-cocodenser-medium.md)
+ NQ: [baseline](docs/regressions-beir-v1.0.0-nq-flat.md), [SPLADE-distill CoCodenser-medium](docs/regressions-beir-v1.0.0-nq-splade-distil-cocodenser-medium.md)
+ Quora: [baseline](docs/regressions-beir-v1.0.0-quora-flat.md), [SPLADE-distill CoCodenser-medium](docs/regressions-beir-v1.0.0-quora-splade-distil-cocodenser-medium.md)
+ SCIDOCS: [baseline](docs/regressions-beir-v1.0.0-scidocs-flat.md), [SPLADE-distill CoCodenser-medium](docs/regressions-beir-v1.0.0-scidocs-splade-distil-cocodenser-medium.md)
+ SciFact: [baseline](docs/regressions-beir-v1.0.0-scifact-flat.md), [SPLADE-distill CoCodenser-medium](docs/regressions-beir-v1.0.0-scifact-splade-distil-cocodenser-medium.md)
+ TREC-COVID: [baseline](docs/regressions-beir-v1.0.0-trec-covid-flat.md), [SPLADE-distill CoCodenser-medium](docs/regressions-beir-v1.0.0-trec-covid-splade-distil-cocodenser-medium.md)
+ Touche2020: [baseline](docs/regressions-beir-v1.0.0-webis-touche2020-flat.md), [SPLADE-distill CoCodenser-medium](docs/regressions-beir-v1.0.0-webis-touche2020-splade-distil-cocodenser-medium.md)
+ ArguAna: ["flat" baseline](docs/regressions-beir-v1.0.0-arguana-flat.md), ["multifield" baseline](docs/regressions-beir-v1.0.0-arguana-multifield.md), [SPLADE-distill CoCodenser-medium](docs/regressions-beir-v1.0.0-arguana-splade-distil-cocodenser-medium.md)
+ Climate-FEVER: ["flat" baseline](docs/regressions-beir-v1.0.0-climate-fever-flat.md), ["multifield" baseline](docs/regressions-beir-v1.0.0-climate-fever-multifield.md), [SPLADE-distill CoCodenser-medium](docs/regressions-beir-v1.0.0-climate-fever-splade-distil-cocodenser-medium.md)
+ DBPedia: ["flat" baseline](docs/regressions-beir-v1.0.0-dbpedia-entity-flat.md), [SPLADE-distill CoCodenser-medium](docs/regressions-beir-v1.0.0-dbpedia-entity-splade-distil-cocodenser-medium.md)
+ FEVER: ["flat" baseline](docs/regressions-beir-v1.0.0-fever-flat.md), [SPLADE-distill CoCodenser-medium](docs/regressions-beir-v1.0.0-fever-splade-distil-cocodenser-medium.md)
+ FiQA-2018: ["flat" baseline](docs/regressions-beir-v1.0.0-fiqa-flat.md), [SPLADE-distill CoCodenser-medium](docs/regressions-beir-v1.0.0-fiqa-splade-distil-cocodenser-medium.md)
+ HotpotQA: ["flat" baseline](docs/regressions-beir-v1.0.0-hotpotqa-flat.md), [SPLADE-distill CoCodenser-medium](docs/regressions-beir-v1.0.0-hotpotqa-splade-distil-cocodenser-medium.md)
+ NFCorpus: ["flat" baseline](docs/regressions-beir-v1.0.0-nfcorpus-flat.md), [SPLADE-distill CoCodenser-medium](docs/regressions-beir-v1.0.0-nfcorpus-splade-distil-cocodenser-medium.md)
+ NQ: ["flat" baseline](docs/regressions-beir-v1.0.0-nq-flat.md), [SPLADE-distill CoCodenser-medium](docs/regressions-beir-v1.0.0-nq-splade-distil-cocodenser-medium.md)
+ Quora: ["flat" baseline](docs/regressions-beir-v1.0.0-quora-flat.md), [SPLADE-distill CoCodenser-medium](docs/regressions-beir-v1.0.0-quora-splade-distil-cocodenser-medium.md)
+ SCIDOCS: ["flat" baseline](docs/regressions-beir-v1.0.0-scidocs-flat.md), [SPLADE-distill CoCodenser-medium](docs/regressions-beir-v1.0.0-scidocs-splade-distil-cocodenser-medium.md)
+ SciFact: ["flat" baseline](docs/regressions-beir-v1.0.0-scifact-flat.md), [SPLADE-distill CoCodenser-medium](docs/regressions-beir-v1.0.0-scifact-splade-distil-cocodenser-medium.md)
+ TREC-COVID: ["flat" baseline](docs/regressions-beir-v1.0.0-trec-covid-flat.md), [SPLADE-distill CoCodenser-medium](docs/regressions-beir-v1.0.0-trec-covid-splade-distil-cocodenser-medium.md)
+ Touche2020: ["flat" baseline](docs/regressions-beir-v1.0.0-webis-touche2020-flat.md), [SPLADE-distill CoCodenser-medium](docs/regressions-beir-v1.0.0-webis-touche2020-splade-distil-cocodenser-medium.md)

## Additional Documentation

13 changes: 7 additions & 6 deletions docs/regressions-beir-v1.0.0-arguana-flat.md
@@ -1,6 +1,7 @@
# Anserini Regressions: BEIR (v1.0.0) — arguana

This page documents BM25 regression experiments for [BEIR (v1.0.0) — arguana](http://beir.ai/).
These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field.

The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-arguana-flat.yaml).
Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-arguana-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
@@ -18,11 +19,11 @@ Typical indexing command:
```
target/appassembler/bin/IndexCollection \
-collection BeirFlatCollection \
-input /path/to/beir-v1.0.0-arguana \
-input /path/to/beir-v1.0.0-arguana-flat \
-index indexes/lucene-index.beir-v1.0.0-arguana-flat/ \
-generator DefaultLuceneDocumentGenerator \
-threads 1 -storePositions -storeDocvectors -storeRaw \
>& logs/log.beir-v1.0.0-arguana &
>& logs/log.beir-v1.0.0-arguana-flat &
```

For additional details, see explanation of [common indexing options](common-indexing-options.md).
@@ -36,16 +37,16 @@ target/appassembler/bin/SearchCollection \
-index indexes/lucene-index.beir-v1.0.0-arguana-flat/ \
-topics src/main/resources/topics-and-qrels/topics.beir-v1.0.0-arguana.test.tsv.gz \
-topicreader TsvString \
-output runs/run.beir-v1.0.0-arguana.bm25.topics.beir-v1.0.0-arguana.test.txt \
-output runs/run.beir-v1.0.0-arguana-flat.bm25.topics.beir-v1.0.0-arguana.test.txt \
-bm25 -removeQuery -hits 1000 &
```

Evaluation can be performed using `trec_eval`:

```
tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.beir-v1.0.0-arguana.test.txt runs/run.beir-v1.0.0-arguana.bm25.topics.beir-v1.0.0-arguana.test.txt
tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.beir-v1.0.0-arguana.test.txt runs/run.beir-v1.0.0-arguana.bm25.topics.beir-v1.0.0-arguana.test.txt
tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.beir-v1.0.0-arguana.test.txt runs/run.beir-v1.0.0-arguana.bm25.topics.beir-v1.0.0-arguana.test.txt
tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.beir-v1.0.0-arguana.test.txt runs/run.beir-v1.0.0-arguana-flat.bm25.topics.beir-v1.0.0-arguana.test.txt
tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.beir-v1.0.0-arguana.test.txt runs/run.beir-v1.0.0-arguana-flat.bm25.topics.beir-v1.0.0-arguana.test.txt
tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.beir-v1.0.0-arguana.test.txt runs/run.beir-v1.0.0-arguana-flat.bm25.topics.beir-v1.0.0-arguana.test.txt
```

## Effectiveness
69 changes: 69 additions & 0 deletions docs/regressions-beir-v1.0.0-arguana-multifield.md
@@ -0,0 +1,69 @@
# Anserini Regressions: BEIR (v1.0.0) — arguana

This page documents BM25 regression experiments for [BEIR (v1.0.0) — arguana](http://beir.ai/).
These experiments index the "title" and "text" fields in corpus separately.
At retrieval time, a query is issued across both fields (equally weighted).
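
As a rough illustration of the difference between the two layouts, the sketch below converts one BEIR-style record (`_id`, `title`, `text`) into a "flat" document and a "multifield" document. The function names and the single-space separator are assumptions made for this example; the mapping of "text" to a `contents` field in the multifield case is inferred from the `-fields contents=1.0 title=1.0` option used at retrieval time, not from the collection classes themselves.

```python
def to_flat(doc: dict) -> dict:
    # Flat layout: "title" and "text" are concatenated into one "contents" field.
    # The single-space separator is an assumption for illustration.
    contents = f'{doc["title"]} {doc["text"]}'.strip()
    return {"id": doc["_id"], "contents": contents}


def to_multifield(doc: dict) -> dict:
    # Multifield layout: "title" stays a separate field, "text" becomes "contents"
    # (inferred from the field names used in the retrieval command).
    return {"id": doc["_id"], "title": doc["title"], "contents": doc["text"]}


record = {"_id": "d1", "title": "Some title", "text": "Some body text."}

print(to_flat(record))
print(to_multifield(record))
```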

The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-arguana-multifield.yaml).
Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-arguana-multifield.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.

From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:

```
python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-arguana-multifield
```

## Indexing

Typical indexing command:

```
target/appassembler/bin/IndexCollection \
-collection BeirMultifieldCollection \
-input /path/to/beir-v1.0.0-arguana-multifield \
-index indexes/lucene-index.beir-v1.0.0-arguana-multifield/ \
-generator DefaultLuceneDocumentGenerator \
-threads 1 -storePositions -storeDocvectors -storeRaw -fields title \
>& logs/log.beir-v1.0.0-arguana-multifield &
```

For additional details, see explanation of [common indexing options](common-indexing-options.md).

## Retrieval

After indexing has completed, you should be able to perform retrieval as follows:

```
target/appassembler/bin/SearchCollection \
-index indexes/lucene-index.beir-v1.0.0-arguana-multifield/ \
-topics src/main/resources/topics-and-qrels/topics.beir-v1.0.0-arguana.test.tsv.gz \
-topicreader TsvString \
-output runs/run.beir-v1.0.0-arguana-multifield.bm25.topics.beir-v1.0.0-arguana.test.txt \
-bm25 -removeQuery -hits 1000 -fields contents=1.0 title=1.0 &
```
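
To make the `-fields contents=1.0 title=1.0` option more concrete, here is a minimal sketch of how such a specification could be parsed and applied. Both helpers are hypothetical, and the weighted sum of per-field scores is an assumption about how equally weighted fields combine, not a description of Anserini's internals. The per-field scores are made-up numbers.

```python
def parse_field_weights(spec: str) -> dict:
    """Parse a 'field=weight field=weight' string into a dict (hypothetical helper)."""
    weights = {}
    for pair in spec.split():
        field, _, weight = pair.partition("=")
        weights[field] = float(weight)
    return weights


def combine(per_field_scores: dict, weights: dict) -> float:
    # Assumed combination rule: weighted sum of the per-field scores.
    return sum(w * per_field_scores.get(field, 0.0) for field, w in weights.items())


weights = parse_field_weights("contents=1.0 title=1.0")

# Made-up per-field scores for a single document.
total = combine({"contents": 2.5, "title": 0.5}, weights)
print(weights, total)
```

Under this assumption, equal weights mean a match in the title counts exactly as much as a match of the same strength in the body text.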

Evaluation can be performed using `trec_eval`:

```
tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.beir-v1.0.0-arguana.test.txt runs/run.beir-v1.0.0-arguana-multifield.bm25.topics.beir-v1.0.0-arguana.test.txt
tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.beir-v1.0.0-arguana.test.txt runs/run.beir-v1.0.0-arguana-multifield.bm25.topics.beir-v1.0.0-arguana.test.txt
tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.beir-v1.0.0-arguana.test.txt runs/run.beir-v1.0.0-arguana-multifield.bm25.topics.beir-v1.0.0-arguana.test.txt
```

## Effectiveness

With the above commands, you should be able to reproduce the following results:

| nDCG@10 | BM25 |
|:-------------------------------------------------------------------------------------------------------------|-----------|
| BEIR (v1.0.0): arguana | 0.4142 |


| R@100 | BM25 |
|:-------------------------------------------------------------------------------------------------------------|-----------|
| BEIR (v1.0.0): arguana | 0.9431 |


| R@1000 | BM25 |
|:-------------------------------------------------------------------------------------------------------------|-----------|
| BEIR (v1.0.0): arguana | 0.9893 |
@@ -47,14 +47,14 @@ Sample indexing command:
```
target/appassembler/bin/IndexCollection \
-collection JsonVectorCollection \
-input /path/to/beir-v1.0.0-arguana \
-input /path/to/beir-v1.0.0-arguana-splade_distil_cocodenser_medium \
-index indexes/lucene-index.beir-v1.0.0-arguana-splade_distil_cocodenser_medium/ \
-generator DefaultLuceneDocumentGenerator \
-threads 16 -impact -pretokenized \
>& logs/log.beir-v1.0.0-arguana &
>& logs/log.beir-v1.0.0-arguana-splade_distil_cocodenser_medium &
```

The path `/path/to/beir-v1.0.0-arguana/` should point to the corpus downloaded above.
The path `/path/to/beir-v1.0.0-arguana-splade_distil_cocodenser_medium/` should point to the corpus downloaded above.

The important indexing options to note here are `-impact -pretokenized`: the first tells Anserini not to encode BM25 doclengths into Lucene's norms (which is the default) and the second option says not to apply any additional tokenization on the pre-encoded tokens.
Upon completion, we should have an index with 8,674 documents.
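
The role of `-impact -pretokenized` can be sketched as follows. This is a simplified model under stated assumptions, not Anserini's implementation: tokens are taken as-is by splitting on whitespace (no further analysis), each document token carries a precomputed integer weight, query weights are expressed by repeating tokens, and the score is the sum of products over shared tokens with no document-length normalization. All names and numbers here are illustrative.

```python
from collections import Counter


def pretokenized(text: str) -> list:
    # No additional tokenization: split on whitespace only.
    return text.split()


def impact_score(query: str, doc_weights: dict) -> int:
    # Query weight of a token = number of times it is repeated (assumption).
    query_weights = Counter(pretokenized(query))
    # Sum of query weight * stored document weight; no length normalization.
    return sum(q * doc_weights.get(token, 0) for token, q in query_weights.items())


# Made-up pre-encoded document: token -> integer weight.
doc = {"climate": 3, "warm": 2, "ice": 5}

score = impact_score("climate climate warm", doc)
print(score)  # 2*3 + 1*2 = 8
```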
@@ -73,16 +73,16 @@ target/appassembler/bin/SearchCollection \
-index indexes/lucene-index.beir-v1.0.0-arguana-splade_distil_cocodenser_medium/ \
-topics src/main/resources/topics-and-qrels/topics.beir-v1.0.0-arguana.test.splade_distil_cocodenser_medium.tsv.gz \
-topicreader TsvString \
-output runs/run.beir-v1.0.0-arguana.splade_distil_cocodenser_medium.topics.beir-v1.0.0-arguana.test.splade_distil_cocodenser_medium.txt \
-output runs/run.beir-v1.0.0-arguana-splade_distil_cocodenser_medium.splade_distil_cocodenser_medium.topics.beir-v1.0.0-arguana.test.splade_distil_cocodenser_medium.txt \
-impact -pretokenized -removeQuery -hits 1000 &
```

Evaluation can be performed using `trec_eval`:

```
tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.beir-v1.0.0-arguana.test.txt runs/run.beir-v1.0.0-arguana.splade_distil_cocodenser_medium.topics.beir-v1.0.0-arguana.test.splade_distil_cocodenser_medium.txt
tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.beir-v1.0.0-arguana.test.txt runs/run.beir-v1.0.0-arguana.splade_distil_cocodenser_medium.topics.beir-v1.0.0-arguana.test.splade_distil_cocodenser_medium.txt
tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.beir-v1.0.0-arguana.test.txt runs/run.beir-v1.0.0-arguana.splade_distil_cocodenser_medium.topics.beir-v1.0.0-arguana.test.splade_distil_cocodenser_medium.txt
tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.beir-v1.0.0-arguana.test.txt runs/run.beir-v1.0.0-arguana-splade_distil_cocodenser_medium.splade_distil_cocodenser_medium.topics.beir-v1.0.0-arguana.test.splade_distil_cocodenser_medium.txt
tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.beir-v1.0.0-arguana.test.txt runs/run.beir-v1.0.0-arguana-splade_distil_cocodenser_medium.splade_distil_cocodenser_medium.topics.beir-v1.0.0-arguana.test.splade_distil_cocodenser_medium.txt
tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.beir-v1.0.0-arguana.test.txt runs/run.beir-v1.0.0-arguana-splade_distil_cocodenser_medium.splade_distil_cocodenser_medium.topics.beir-v1.0.0-arguana.test.splade_distil_cocodenser_medium.txt
```

## Effectiveness
13 changes: 7 additions & 6 deletions docs/regressions-beir-v1.0.0-climate-fever-flat.md
@@ -1,6 +1,7 @@
# Anserini Regressions: BEIR (v1.0.0) — climate-fever

This page documents BM25 regression experiments for [BEIR (v1.0.0) — climate-fever](http://beir.ai/).
These experiments index the corpus in a "flat" manner, by concatenating the "title" and "text" into the "contents" field.

The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/beir-v1.0.0-climate-fever-flat.yaml).
Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-flat.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
@@ -18,11 +19,11 @@ Typical indexing command:
```
target/appassembler/bin/IndexCollection \
-collection BeirFlatCollection \
-input /path/to/beir-v1.0.0-climate-fever \
-input /path/to/beir-v1.0.0-climate-fever-flat \
-index indexes/lucene-index.beir-v1.0.0-climate-fever-flat/ \
-generator DefaultLuceneDocumentGenerator \
-threads 1 -storePositions -storeDocvectors -storeRaw \
>& logs/log.beir-v1.0.0-climate-fever &
>& logs/log.beir-v1.0.0-climate-fever-flat &
```

For additional details, see explanation of [common indexing options](common-indexing-options.md).
@@ -36,16 +37,16 @@ target/appassembler/bin/SearchCollection \
-index indexes/lucene-index.beir-v1.0.0-climate-fever-flat/ \
-topics src/main/resources/topics-and-qrels/topics.beir-v1.0.0-climate-fever.test.tsv.gz \
-topicreader TsvString \
-output runs/run.beir-v1.0.0-climate-fever.bm25.topics.beir-v1.0.0-climate-fever.test.txt \
-output runs/run.beir-v1.0.0-climate-fever-flat.bm25.topics.beir-v1.0.0-climate-fever.test.txt \
-bm25 -removeQuery -hits 1000 &
```

Evaluation can be performed using `trec_eval`:

```
tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.beir-v1.0.0-climate-fever.test.txt runs/run.beir-v1.0.0-climate-fever.bm25.topics.beir-v1.0.0-climate-fever.test.txt
tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.beir-v1.0.0-climate-fever.test.txt runs/run.beir-v1.0.0-climate-fever.bm25.topics.beir-v1.0.0-climate-fever.test.txt
tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.beir-v1.0.0-climate-fever.test.txt runs/run.beir-v1.0.0-climate-fever.bm25.topics.beir-v1.0.0-climate-fever.test.txt
tools/eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 src/main/resources/topics-and-qrels/qrels.beir-v1.0.0-climate-fever.test.txt runs/run.beir-v1.0.0-climate-fever-flat.bm25.topics.beir-v1.0.0-climate-fever.test.txt
tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 src/main/resources/topics-and-qrels/qrels.beir-v1.0.0-climate-fever.test.txt runs/run.beir-v1.0.0-climate-fever-flat.bm25.topics.beir-v1.0.0-climate-fever.test.txt
tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 src/main/resources/topics-and-qrels/qrels.beir-v1.0.0-climate-fever.test.txt runs/run.beir-v1.0.0-climate-fever-flat.bm25.topics.beir-v1.0.0-climate-fever.test.txt
```

## Effectiveness