OpenAssistant Inference

Preliminary implementation of the inference engine for OpenAssistant. This is intended strictly for local development, though you may have limited success using it to self-host OA. There is no guarantee that this will not change in the future; in fact, expect it to change.

Development Variant 1 (docker compose)

The services of the inference stack are prefixed with "inference-" in the unified compose file.
Before building them, make sure Docker's BuildKit backend is enabled. See the FAQ for more info.
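
BuildKit is the default in recent Docker releases; if your setup predates that, it can be enabled per invocation with a standard Docker environment variable (this is generic Docker behavior, not a project-specific mechanism):

DOCKER_BUILDKIT=1 docker compose --profile inference build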

To build the services, run:

docker compose --profile inference build

Spin up the stack:

docker compose --profile inference up -d

Tail the logs:

docker compose logs -f    \
    inference-server      \
    inference-worker

Note: The compose file contains bind mounts that let you develop on the modules of the inference stack, and on the oasst-shared package, without rebuilding.
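
For orientation, such a mount looks roughly like this in docker-compose.yaml; the paths below are illustrative, the actual entries live in the repo's compose file:

inference-server:
  volumes:
    - ./inference/server:/opt/inference/server  # illustrative paths only
    - ./oasst-shared:/opt/oasst-shared          # illustrative paths only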

Note: You can change the model by editing the MODEL_CONFIG_NAME variable in the docker-compose.yaml file. Valid model names can be found in model_configs.py.
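
The relevant entry looks roughly like this (illustrative excerpt; <model-name> is a placeholder for one of the names from model_configs.py):

inference-worker:
  environment:
    - MODEL_CONFIG_NAME=<model-name>  # pick a name from model_configs.py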

Note: You can spin up any number of workers by adjusting the number of replicas of the inference-worker service to your liking.
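
One way to do that without editing the compose file is docker compose's standard --scale flag; for example, to run two workers:

docker compose --profile inference up -d --scale inference-worker=2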

Note: Please wait for the inference-text-generation-server service to output {"message":"Connected"} before starting to chat.
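
You can watch for that message with the same logs command as above:

docker compose logs -f inference-text-generation-server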

Run the text client and start chatting:

cd text-client
pip install -r requirements.txt
python __main__.py
# You'll soon see a `User:` prompt, where you can type your prompts.

Distributed Testing

We run distributed load tests using the locust Python package.

pip install locust
cd tests/locust
locust

Navigate to http://0.0.0.0:8089/ to view the locust UI.
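
If you want to adapt the load test, a locust user is an ordinary Python class. Below is a minimal, self-contained sketch that hits /openapi.json (an endpoint the server is known to expose, see the API Docs section below); the project's real tasks live in tests/locust:

from locust import HttpUser, task

class InferenceUser(HttpUser):
    # Minimal sketch only; the actual load-test tasks are defined
    # in tests/locust, not here.
    @task
    def fetch_openapi(self):
        self.client.get("/openapi.json")

Point it at the inference server with, for example, locust -f locustfile.py --host http://localhost:8000.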

API Docs

To update the API docs, run the command below once the inference server is running; it downloads the inference OpenAPI JSON into the relevant folder under /docs:

wget localhost:8000/openapi.json -O docs/docs/api/inference-openapi.json

Then make a PR to have the updated docs merged.