LLMGateway

This is a front-facing gateway for multiple LLMs. The idea is that this server acts as a middleman between the user and multiple different LLMs that provide OpenAI-compatible APIs. The gateway keeps track of each user's token usage (the current state only tracks completion tokens) and handles access to the LLM servers (i.e. it keeps the secrets that allow using them).
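
Because the gateway exposes OpenAI-compatible APIs, clients can call it like any OpenAI-style endpoint. A minimal sketch with httpx (the gateway address, route, model id and key below are placeholders; the exact route depends on your deployment):

import httpx

GATEWAY_URL = "https://llm-gateway.example.org"   # placeholder gateway address
API_KEY = "key-from-the-self-service-checkout"    # placeholder API key

response = httpx.post(
    f"{GATEWAY_URL}/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "some-model-id",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60.0,
)
print(response.json())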

Features

  • Self-service authentication via SAML
  • Self-service checkout for key generation
  • Admin management via REST API

TODO

Here are a few features which are currently on our TODO list:

  • Admin front-end UI
  • More fine-grained access key control (i.e. controlling which models can be accessed with a key)
  • Proper usage logging (including prompt tokens; currently restricted to completion tokens)

Requirements and dependencies

  • Kubernetes
    • Certbot Letsencrypt plugin
    • Set up secrets for Admin key and LLM API key
    • MongoDB and Redis deployed on the cluster.
  • Security
    • The current authentication scheme for the user API is based on session cookies and SAML authentication.
      • This means you need an existing IdP that has the gateway set up as a service provider.
      • You will need to update the auth saml router endpoints to conform with the kind of access you want to allow.
    • The current assumption is that any key can be used with any model and that there is no use restriction.
      • If you want to implement this kind of restriction, you should add another dependency on the LLM endpoints.
  • Python dependencies:
    • General:
      • fastapi
      • gunicorn
      • uvicorn
      • redis-py
      • pymongo
      • schedule
      • httpx
      • sse-starlette
      • itsdangerous (for session management)
      • python-multipart
      • python-jose
    • For SAML:
      • python3-saml

Architecture

Mongo DB

The Mongo database employed in this gateway stores logging information along with user data.

The apikeys collection:

{
    "user" : str,  # User, this key belongs to
    "active": boolean, # whether the key is active
    "key": str, # the actual key
    "name": str # name given to the key
}

Likely future fields: authorization : [ str ] to indicate which models a key is for.
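
As a rough sketch of how such a document could be created and queried with pymongo (the database name and connection string are assumptions, not the gateway's actual configuration):

import secrets
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["gateway"]  # assumed database name

# Create and store a new key for a user.
key = secrets.token_urlsafe(32)
db.apikeys.insert_one({
    "user": "someuser@example.org",
    "active": True,
    "key": key,
    "name": "my laptop",
})

# Look up an active key, e.g. while authenticating a request.
entry = db.apikeys.find_one({"key": key, "active": True})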

The logs collection:

{
    "tokencount": int,  # This is the completion tokens
    "isprompt": boolean, # Whether this is for prompt or completion
    "model": str, # which model was used for this usage
    "source": str, # key or user who caused this usage
    "sourcetype": str # Whether the source is a "user" or an "apikey"
    "timestamp": datetime,  # Current timestamp in UTC
}
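
For example, total completion-token usage per source could be aggregated roughly like this (a sketch only; the field names follow the schema above, the database name is an assumption):

from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["gateway"]  # assumed database name

# Sum completion tokens per source (key or user), ignoring prompt entries.
pipeline = [
    {"$match": {"isprompt": False}},
    {"$group": {"_id": "$source", "total_tokens": {"$sum": "$tokencount"}}},
]
for row in db.logs.aggregate(pipeline):
    print(row["_id"], row["total_tokens"])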

The user collection:

{
    "username": str,  # The identifier of the user - provided by the IdP
    "keys": [ str ], # The set of keys belonging to this user
}

Likely future fields:

  • "isAdmin" : boolean indicator whether the user is an admin, default, false

Redis

The Redis database is mainly used for fast retrieval of authentication keys and should therefore be kept in sync with the keys stored in MongoDB.
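
A rough sketch of how key lookups could be cached in Redis while MongoDB remains the source of truth (the key naming scheme and connection details below are assumptions, not the gateway's actual layout):

import redis
from pymongo import MongoClient

r = redis.Redis(host="localhost", port=6379)               # assumed Redis location
db = MongoClient("mongodb://localhost:27017")["gateway"]   # assumed MongoDB location

def cache_key(entry: dict):
    # Store the owning user under the key itself for fast auth checks.
    r.set(f"apikey:{entry['key']}", entry["user"])

def check_key(key: str):
    # Return the owning user if the key is cached, otherwise None.
    user = r.get(f"apikey:{key}")
    return user.decode() if user else None

def revoke_key(key: str):
    # Deactivate in MongoDB and drop from the cache to keep both in sync.
    db.apikeys.update_one({"key": key}, {"$set": {"active": False}})
    r.delete(f"apikey:{key}")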

Logging / Usage

The way usage is currently logged and retrieved is potentially rather slow. If rate limits, daily quotas, or similar restrictions become necessary, a more efficient usage-check mechanism than retrieval from MongoDB may be needed, as that database can become quite crowded. For a daily maximum usage, one option could be to track usage in the Redis database. It may also become necessary to assign additional "costs" to each model in the future.
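
One possible sketch of such a daily counter in Redis, assuming one counter per source and UTC day (the key format and expiry policy are assumptions):

from datetime import datetime, timezone
import redis

r = redis.Redis(host="localhost", port=6379)

def add_daily_usage(source: str, tokens: int) -> int:
    # One counter per source and UTC day, e.g. "usage:somekey:2024-01-31".
    day = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    counter = f"usage:{source}:{day}"
    total = r.incrby(counter, tokens)
    # Let stale counters expire on their own after two days.
    r.expire(counter, 2 * 24 * 3600)
    return total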

Run gateway locally

You will need to set the LLM_DEFAULT_URL environment variable (including any port specification) for the container so that it points to the location of your LLM server.

You will need at least one LLM model running on your local machine. This model needs to accept requests on LLM_DEFAULT_URL/<model_id>/v1/..

The API of the model server needs to be compatible with the API provided by llama-cpp-python[server].
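
As an illustration of the request shape the gateway is expected to forward upstream, a call to such a model server could look roughly like this (the model id and port are placeholders):

import os
import httpx

base = os.environ["LLM_DEFAULT_URL"]  # e.g. "http://localhost:8000"
model_id = "mymodel"                  # placeholder model id

resp = httpx.post(
    f"{base}/{model_id}/v1/chat/completions",
    json={
        "model": model_id,
        "messages": [{"role": "user", "content": "ping"}],
    },
    timeout=60.0,
)
print(resp.json())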

In the future, LLM endpoints will also have to provide an additional /extras/tokenize/count endpoint, which calculates prompt tokens based on either a single input string, or a full ChatCompletionRequest.

The docker-compose.yml included in this repo is an example of how to test locally. For this to work, you will need to set up the Keycloak installation and point the gateway's SAML authentication to that Keycloak service.