Document Processor and RAG System

This project implements a Streamlit-based web application that processes various document types (PDF, TXT, and web pages) and creates a Retrieval-Augmented Generation (RAG) system for question answering.

Features

  • Support for multiple document types:
    • PDF files
    • Text files
    • Web pages (via URL)
  • Flexible embedding model selection:
    • Ollama
    • FastEmbeddings
    • HuggingFace
  • Customizable local LLM model selection
  • Vector store creation for efficient document retrieval
  • RAG-based question answering system (see the sketch below)
  • Streamlit-based user interface for easy interaction
  • Comprehensive logging system
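
To picture how these pieces fit together, here is a minimal sketch of the pipeline, assuming the langchain-community integrations for Ollama and Chroma; the actual wiring lives in app.py, and details such as chunk sizes and model names will differ.

    from langchain_community.embeddings import OllamaEmbeddings
    from langchain_community.llms import Ollama
    from langchain_community.vectorstores import Chroma
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain.chains import RetrievalQA

    raw_text = "..."  # text extracted from a PDF, TXT file, or web page

    # Split the text into overlapping chunks; the sizes here are illustrative.
    chunks = RecursiveCharacterTextSplitter(
        chunk_size=1000, chunk_overlap=100
    ).split_text(raw_text)

    # Embed the chunks and persist them in the local Chroma store.
    store = Chroma.from_texts(
        chunks,
        OllamaEmbeddings(model="mistral"),
        persist_directory="vector_database",
    )

    # Answer questions by retrieving relevant chunks and handing them to the LLM.
    qa = RetrievalQA.from_chain_type(
        llm=Ollama(model="mistral"), retriever=store.as_retriever()
    )
    print(qa.invoke({"query": "What is this document about?"})["result"])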

Requirements

  • Python 3.7+
  • Streamlit
  • LangChain
  • PyPDF2
  • BeautifulSoup4
  • Requests
  • Ollama (for local LLM serving)
  • Other dependencies (see requirements.txt; an illustrative list follows)
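
If you need to recreate the environment by hand, a requirements.txt for this stack typically looks something like the following; treat it as an illustrative sketch, since the pinned versions in the repository's own requirements.txt are authoritative.

    streamlit
    langchain
    langchain-community
    chromadb
    PyPDF2
    beautifulsoup4
    requests
    fastembed
    sentence-transformers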

Setup

  1. Clone this repository:

    git clone https://github.com/siku788/Local-RAG-System.git
    cd Local-RAG-System
    
  2. Create a virtual environment and activate it:

    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
    
  3. Install the required dependencies:

    pip install -r requirements.txt
    
  4. Install Ollama by following the instructions on Ollama's official website (https://ollama.com).

  5. Start the Ollama server:

    ollama serve
    
  6. Pull the necessary models. For example, to use the 'mistral' model:

    ollama pull mistral
    

Usage

  1. Ensure the Ollama server is running (step 5 in Setup).

  2. Start the Streamlit app:

    streamlit run app.py
    
  3. Open your web browser and navigate to the URL provided by Streamlit (usually http://localhost:8501).

  4. Use the sidebar to configure:

    • Log level
    • Embedding model
    • Local LLM model (enter the name of an Ollama-compatible model, e.g., 'mistral')

  5. Upload PDF or TXT files using the file uploader.

  6. (Optional) Enter a website URL to process.

  7. Click "Process Files/URLs and Create Vector Store" to process the documents and create the RAG system.

  8. Once processing is complete, use the "Ask a Question" section to query the system about the processed documents.

Model Constraints

When selecting a local LLM model, keep in mind the following constraints:

  • The model must be compatible with Ollama.
  • Ensure you have sufficient system resources (RAM, GPU) to run the chosen model.
  • Some larger models may require more processing time, affecting the responsiveness of the application.

To use a specific model:

  1. Pull the model using Ollama (e.g., ollama pull mistral); a quick way to verify the pull is sketched below.
  2. Enter the model name in the "Local LLM Model" field in the Streamlit sidebar.
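
Since Requests is already a dependency, you can confirm a model is available by asking the Ollama server directly. This is a sketch against Ollama's local REST API, which listens on http://localhost:11434 by default; the helper name model_is_pulled is our own, not part of the project.

    import requests

    def model_is_pulled(name, host="http://localhost:11434"):
        """Return True if `name` is among the models the Ollama server has locally."""
        tags = requests.get(f"{host}/api/tags", timeout=5).json()
        return any(m["name"].split(":")[0] == name for m in tags.get("models", []))

    print(model_is_pulled("mistral"))  # False until you run `ollama pull mistral`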

Project Structure

  • app.py: Main Streamlit application file
  • uploaded_files/: Directory for temporarily storing uploaded files
  • logs/: Directory for storing application logs
  • vector_database/: Directory for storing the Chroma vector database

Customization

  • To add support for additional document types, modify the process_file function in app.py (see the sketch after this list).
  • To change the embedding models, update the respective sections in the sidebar configuration.
  • To use different LLM models, ensure they are pulled via Ollama and enter their names in the sidebar.
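
As an example of the first point, supporting a new extension can be a single extra branch in the dispatch. The sketch below assumes process_file takes a file path and returns extracted text; check the actual signature in app.py before copying it.

    from PyPDF2 import PdfReader

    def process_file(path):
        """Dispatch on file extension and return the document's text.
        Illustrative only; the real process_file in app.py may differ."""
        if path.endswith(".pdf"):
            reader = PdfReader(path)
            return "\n".join(page.extract_text() or "" for page in reader.pages)
        if path.endswith((".txt", ".md")):  # ".md" shows how a new type slots in
            with open(path, encoding="utf-8") as f:
                return f.read()
        raise ValueError(f"Unsupported file type: {path}")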

Logging

The application generates detailed logs in the logs/ directory. Check these logs for debugging and monitoring purposes.
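
The exact handler setup lives in app.py, but a configuration along these lines reproduces the behavior: timestamped entries written under logs/ at the level chosen in the sidebar. The file name app.log is an assumption, not necessarily what the app uses.

    import logging
    import os

    os.makedirs("logs", exist_ok=True)
    logging.basicConfig(
        filename="logs/app.log",  # illustrative name; check logs/ for the real files
        level=logging.INFO,       # corresponds to the sidebar's log-level setting
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    logging.getLogger(__name__).info("Vector store created")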

Troubleshooting

  • If you encounter issues with Ollama, ensure the server is running (ollama serve); a reachability check is sketched after this list.
  • If a model is not found, make sure you've pulled it using ollama pull [model-name].
  • For performance issues, try using a smaller or more efficient model.
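
The reachability check mentioned above can also be done from Python; this sketch uses Requests against Ollama's default port (11434) to distinguish a stopped server from a missing model.

    import requests

    try:
        r = requests.get("http://localhost:11434", timeout=3)
        print(r.text)  # the server replies "Ollama is running" when it is up
    except requests.exceptions.ConnectionError:
        print("Ollama is not reachable; start it with `ollama serve`")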

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.