This project focuses on enhancing a movie recommendation chatbot using the Retrieval-Augmented Generation (RAG) methodology. The chatbot leverages Pinecone, a vector store database, for efficient information retrieval and employs the LangChain framework for the RAG pipeline. The Llama-2-7b model serves as the backbone and is fine-tuned to optimize recommendation performance.
- Utilizes Pinecone as the vector store for fast similarity search and ranking over small to medium-sized datasets (≤100k records).
- Incorporates movie embeddings, generated with the Llama-2-7b model, along with metadata such as actors' names and genres.
- Implements the RAG pipeline with LangChain, which offers low-level configurability.
- Fine-tunes Llama-2-7b in 4-bit precision via QLoRA for memory-efficient adaptation to the movie recommendation task.
- Tailors the RAG pipeline prompt for movie recommendations.
- Emphasizes limitations and goals, instructing the chatbot to focus on user preferences and not to provide information beyond the vector store context.
- Utilizes Average Embedding Cosine Similarity and Relevance Accuracy metrics for comprehensive model assessment.
- Compares base Llama-2-7b-chat with RAG-based Llama-2-7b-chat, fine-tuned versions, and various prompting techniques.
- Provides an interactive Streamlit interface so users can converse with the movie recommendation chatbot seamlessly.
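At a high level, the retrieval and prompt-assembly steps described above can be sketched as follows. This is a minimal, self-contained illustration: the in-memory `index`, the record fields, and the prompt wording are assumptions standing in for the real Pinecone index, LangChain pipeline, and Llama-2-generated embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve_top_k(query_embedding, index, k=4):
    """Return the k records most similar to the query, as the vector store would."""
    ranked = sorted(
        index,
        key=lambda rec: cosine_similarity(query_embedding, rec["embedding"]),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(user_query, retrieved):
    """Assemble the tailored recommendation prompt around the retrieved context."""
    context = "\n".join(
        f"- {rec['title']} ({rec['genre']}; starring {rec['actors']})"
        for rec in retrieved
    )
    return (
        "You are a movie recommendation assistant. Recommend movies based on "
        "the user's preferences, using ONLY the context below.\n\n"
        f"Context:\n{context}\n\nUser: {user_query}\nAssistant:"
    )

# Toy index standing in for the Pinecone vector store (embeddings are illustrative).
index = [
    {"title": "Movie A", "genre": "sci-fi", "actors": "Actor X", "embedding": [0.9, 0.1, 0.0]},
    {"title": "Movie B", "genre": "drama", "actors": "Actor Y", "embedding": [0.1, 0.9, 0.0]},
    {"title": "Movie C", "genre": "sci-fi", "actors": "Actor Z", "embedding": [0.8, 0.2, 0.1]},
]
query = [1.0, 0.0, 0.0]  # would come from the embedding model in practice
prompt = build_prompt("Suggest a sci-fi film.", retrieve_top_k(query, index, k=2))
print(prompt)
```

The assembled prompt (context plus instruction) is what the fine-tuned Llama-2-7b-chat model receives for generation.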
- The RAG-based model outperforms the base model by 4% in mean embedding cosine similarity and recommends 19/20 relevant movies versus 14/20 for the base model.
- RAG-based fine-tuned model shows an 8% improvement in mean embedding cosine similarity compared to the fine-tuned base model, with equal relevance accuracy.
- Performance remains consistent for both RAG-based models across different k values (3, 4, 5), peaking at k=4 for both mean cosine similarity and relevance accuracy.
- Different prompting techniques show varying effects on performance, emphasizing the importance of tailoring prompts for specific contexts.
- Limited computing resources constrained the extent of experimentation and model evaluation.
- Future work involves exploring the impact of different embedding models on context retrieval quality and experimenting with novel AI models beyond Llama-2-7b.
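The two reported metrics can be computed as below. The function names and the per-item data layout are illustrative assumptions; the actual evaluation harness is not shown in this summary.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mean_embedding_cosine_similarity(pred_embeddings, ref_embeddings):
    """Average cosine similarity between each recommendation's embedding and its reference."""
    sims = [cosine_similarity(p, r) for p, r in zip(pred_embeddings, ref_embeddings)]
    return sum(sims) / len(sims)

def relevance_accuracy(relevant_flags):
    """Fraction of recommendations judged relevant, e.g. 19 of 20 -> 0.95."""
    return sum(relevant_flags) / len(relevant_flags)

# Illustrative usage with made-up judgments and embeddings.
flags = [1] * 19 + [0]          # 19 of 20 recommendations judged relevant
print(relevance_accuracy(flags))  # 0.95
```

Under this definition, the reported 19/20 result corresponds to a relevance accuracy of 0.95, against 14/20 (0.70) for the base model.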