Repository for the setup of a local LLM container to support other activities and tools we are developing. This setup runs a local instance of a Large Language Model (LLM) with GPU support and exposes it over HTTPS through a Caddy reverse proxy.
- Docker and Docker Compose installed on your system.
- NVIDIA Docker runtime for GPU support (see the NVIDIA Container Toolkit installation guide).
Clone the repository:

```shell
git clone https://github.com/ClinicianFOCUS/local-llm-container.git
cd local-llm-container
```
Download the LLM model you want to use and place it in the `/models` folder.
The following environment variables can be set to configure the services:
- `MODEL_NAME`: The path to the LLM model file. Default is `/models/gemma-2-2b-it`.
- `LLM_CONTAINER_PORT`: The port on which the LLM container will be accessible. Default is `3334`.
You can set these variables using the CLI:
Windows (PowerShell):

```shell
$env:MODEL_NAME='/models/your_model_folder'
```

Linux:

```shell
export MODEL_NAME=/models/your_model_folder
```
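Alternatively, Docker Compose automatically reads a `.env` file placed next to `docker-compose.yml`, so the same configuration can be kept on disk. A sketch using the documented defaults:

```
MODEL_NAME=/models/gemma-2-2b-it
LLM_CONTAINER_PORT=3334
```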
Use Docker Compose to start the services:

```shell
docker-compose up -d
```
Access the LLM API through the Caddy reverse proxy:
- OpenAI API: `https://localhost:3334/v1/`
- Docs: `https://localhost:3334/docs/`
- OpenAI API Docs: https://platform.openai.com/docs/api-reference/introduction
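As a quick smoke test, the endpoint above can be queried with only the Python standard library. The sketch below builds a request against the `/v1/chat/completions` path of an OpenAI-compatible server; the port and default model name come from this README, while the self-signed-certificate handling (skipping TLS verification) is an assumption about the local Caddy setup and should only be used for local testing.

```python
import json
import ssl
import urllib.request

# Port 3334 is the documented LLM_CONTAINER_PORT default.
BASE_URL = "https://localhost:3334/v1"


def build_chat_request(prompt: str,
                       model: str = "/models/gemma-2-2b-it") -> urllib.request.Request:
    """Build a POST request for the /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request, skipping TLS verification (local self-signed cert only)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(req, context=ctx) as resp:
        return json.loads(resp.read())
```

With the container running, `send(build_chat_request("Hello"))` should return a standard chat-completion response, with the generated text under `choices[0]["message"]["content"]`.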