Containerized solution for AI speech-to-text that can run locally
Docker and Docker-compose installed on your system.
- Install
git clone https://github.com/ClinicianFOCUS/speech2text-container.git
- Navigate into the directory
cd speech2text-container
- Recommended Step: Set a API key in the .env file. this will be the key used to access the API. If this step is skipped the container wil generate a random key for you upon each start.
- Build and run the docker images using docker-compose
docker-compose up -d --build
Send a Request: Send audio data to the exposed endpoint on http://localhost:2224/whisperaudio
The following environment variables can be customized to control the behavior of the speech-container
service:
WHISPER_MODEL
: The Whisper model to use (default:medium
).WHISPER_PORT
: The port to expose the service (default:2224
).WHISPER_HOST
: The host to bind the service (default:0.0.0.0
).UVICORN_WORKERS
: Number of Uvicorn workers (default:1
).
Windows Example
$env:WHISPER_MODEL="large"; $env:WHISPER_PORT="2224"; $env:WHISPER_HOST="127.0.0.1"; docker-compose up -d --build
Linux Example:
NOTE: Not tested in a linux environment
WHISPER_MODEL=large WHISPER_PORT=2224 WHISPER_HOST=127.0.0.1 docker-compose up -d --build
This will start the transcription service with the large
Whisper model on port 2224
, accessible only from 127.0.0.1
.
On server start up you will receive a api key in the console. This key changes on each start up of the software.
You can set your own API key in the .env. This will be your authentication key when launched in the container.
curl -X POST "http://localhost:2224/whisperaudio" \
-H "Authorization: Bearer <api_key>" \
-F "file=@/path/to/audiofile.wav"
{
"text": "Transcribed text here"
}
This project is licensed under the AGPL-3.0 License. See the LICENSE file for details.