This repository contains my work for the "Deep Learning Aplicado à Sistemas de Busca" course at Unicamp, taken in the first semester of 2023 (1s2023).
The folders contain my code for each week's project:
- Week 1: Building a Simple Information Retrieval System using BM25 and GPT-3, evaluated on the CISI collection
- Week 2: Boolean, BoW and TF-IDF Search Systems
- Week 3: Cross-Encoder: Text Classification and Reranking
- Week 4: Zero and Few-Shot Learning
- Week 5: Training a language model
- Week 6: doc2query and docTTTTTquery
- Week 7: Dense Passage Retrieval for Open Domain Question Answering
- Week 8: SPLADE
- Week 9: InPars
- Week 10: Trade-offs between computation & latency

Reading material for each week:
- Week 2: Pretrained Transformers for Text Ranking: BERT and Beyond (Jimmy Lin, Rodrigo Nogueira, Andrew Yates), Chapter 1.
- Week 3: Pretrained Transformers for Text Ranking: BERT and Beyond (Jimmy Lin, Rodrigo Nogueira, Andrew Yates), Chapter 3 up to Section 3.2.2.
- Week 4: Language Models are Few-Shot Learners (OpenAI)
- Week 5: Language Models are Unsupervised Multitask Learners (OpenAI)
- Week 6: Document Expansion by Query Prediction (Nogueira et al 2019) & From doc2query to docTTTTTquery (Nogueira & Lin 2019)
- Week 7: Dense Passage Retrieval for Open-Domain Question Answering & ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT
- Week 8: SPLADE & SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval (Formal et al 2021; Formal et al 2021)
- Week 9: InPars: Data Augmentation for Information Retrieval & InPars v2: Large Language Models as Efficient Dataset Generators for Information Retrieval (Bonifacio et al 2022; Jeronymo et al 2023)
- Week 10: ColBERT v2: Effective and Efficient Retrieval via Lightweight Late Interaction (Santhanam et al 2022)