A simple, high-quality voice conversion tool focused on ease of use and performance.
Updated Sep 27, 2024 - Python
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
A desktop application that uses AI to translate voice between languages in real time, while preserving the speaker's tone and emotion.
If you've ever wished you could talk to your AI Waifu with high-quality characters and voices, try Soul of Waifu. Don't miss the opportunity to touch your dream!
Code for NeurIPS 2023 paper "DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation".
Code for the INTERSPEECH 2023 paper "Learning When to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models".
Chatter Box is an Android app that supports voice, text, and image-text translation, as well as end-to-end chat translation.
💬 "Realtime" voice transcription and cloning using ElevenLabs's API.
A user-friendly interface for ElevenLabs' API with added audio transcription capability.
Speech-to-text-to-speech using ElevenLabs.
A Flask web page hosting a speech-to-speech translation demo.
Speech-to-Speech translation dataset for German and English (text and speech quadruplets).
This repository contains the code for a speech-to-speech translation system, built from scratch, that translates spoken digits from English to Tamil.
A simple speech-to-speech chatbot to talk with.
A comparison of E2E and Cascading S2ST systems on the CVSS-C Spanish to English dataset (CommonVoice 4.0)
A GPT-powered rubber duck debugger, built as a CS50 2023 final project.
Conversational speech chatbot utilizing OpenAI's GPTs and Microsoft Azure's Speech Services
CtrlSpeak is a voice assistant activated with [Control]+Q, listening and responding only when you want.
End-to-End AI Voice Assistant pipeline with Whisper for Speech-to-Text, Hugging Face LLM for response generation, and Edge-TTS for Text-to-Speech. Features include Voice Activity Detection (VAD), tunable parameters for pitch, gender, and speed, and real-time response with latency optimization.
Audio-to-audio conversion using microsoft/speecht5_vc from Hugging Face.