Skip to content

Containerized solution for AI speech-to-text that can run locally

License

Notifications You must be signed in to change notification settings

ClinicianFOCUS/speech2text-container

Repository files navigation

speech2text-container

Containerized solution for AI speech-to-text that can run locally

Prerequisites

Docker and Docker-compose installed on your system.

Build and run

  1. Install
git clone https://github.com/ClinicianFOCUS/speech2text-container.git
  1. Navigate into the directory
cd speech2text-container
  • Recommended Step: Set a API key in the .env file. this will be the key used to access the API. If this step is skipped the container wil generate a random key for you upon each start.
  1. Build and run the docker images using docker-compose
docker-compose up -d --build

Usage

Send a Request: Send audio data to the exposed endpoint on http://localhost:2224/whisperaudio

The following environment variables can be customized to control the behavior of the speech-container service:

  • WHISPER_MODEL: The Whisper model to use (default: medium).
  • WHISPER_PORT: The port to expose the service (default: 2224).
  • WHISPER_HOST: The host to bind the service (default: 0.0.0.0).
  • UVICORN_WORKERS: Number of Uvicorn workers (default: 1).

Windows Example

$env:WHISPER_MODEL="large"; $env:WHISPER_PORT="2224"; $env:WHISPER_HOST="127.0.0.1"; docker-compose up -d --build

Linux Example:

NOTE: Not tested in a linux environment

WHISPER_MODEL=large WHISPER_PORT=2224 WHISPER_HOST=127.0.0.1 docker-compose up -d --build

This will start the transcription service with the large Whisper model on port 2224, accessible only from 127.0.0.1.

Authentication

On server start up you will receive a api key in the console. This key changes on each start up of the software.

You can set your own API key in the .env. This will be your authentication key when launched in the container.

Example request

   curl -X POST "http://localhost:2224/whisperaudio" \
        -H "Authorization: Bearer <api_key>" \
        -F "file=@/path/to/audiofile.wav"

Example response

{
  "text": "Transcribed text here"
}

License

This project is licensed under the AGPL-3.0 License. See the LICENSE file for details.

About

Containerized solution for AI speech-to-text that can run locally

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published