This is the code for the SpeechTokenizer presented in the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on

Python 380 32 Updated Jun 9, 2024

THU-MIG / yolov10

YOLOv10: Real-Time End-to-End Object Detection

Python 8,552 747 Updated Jul 18, 2024

open-webui / open-webui

User-friendly WebUI for LLMs (Formerly Ollama WebUI)

Svelte 33,079 3,671 Updated Jul 27, 2024

EthicalML / awesome-production-machine-learning

A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning

16,806 2,206 Updated Jul 23, 2024

mobiusml / hqq

Official implementation of Half-Quadratic Quantization (HQQ)

Python 581 55 Updated Jul 21, 2024

OpenNMT / CTranslate2

Fast inference engine for Transformer models

C++ 3,095 274 Updated Jul 26, 2024

KindXiaoming / pykan

Kolmogorov Arnold Networks

Jupyter Notebook 13,921 1,253 Updated Jul 26, 2024

rust-lang / rustlings

🦀 Small exercises to get you used to reading and writing Rust code!

Rust 51,588 9,921 Updated Jul 25, 2024

myshell-ai / MeloTTS

High-quality multilingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean.

Python 4,126 508 Updated Jul 6, 2024

soumik-kanad / diff2lip

Python 265 30 Updated Jul 5, 2024

jaeyeonkim99 / EnCLAP

Official Implementation of EnCLAP (ICASSP 2024)

Python 88 4 Updated Jun 2, 2024

shashikg / WhisperS2T

An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines

Jupyter Notebook 254 21 Updated Jul 9, 2024

snakers4 / silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Python 3,490 362 Updated Jul 21, 2024

python-streamz / streamz

Real-time stream processing for Python

Python 1,230 146 Updated Jun 18, 2024

choomegan / audio-vad-splitter

Split audio based on Pyannote's VAD

Python 3 Updated Apr 3, 2024

jasonppy / VoiceCraft

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Jupyter Notebook 7,270 714 Updated Jun 24, 2024

alessandroragano / nomad

NOMAD is a fully unsupervised non-matching reference audio quality metric

Python 22 1 Updated May 27, 2024

soumimaiti / speechlmscore_tool

Python 25 2 Updated Jan 25, 2023

tiangolo / full-stack-fastapi-template

Full stack, modern web application template. Using FastAPI, React, SQLModel, PostgreSQL, Docker, GitHub Actions, automatic HTTPS and more.

TypeScript 24,890 4,217 Updated Jul 25, 2024

ssine / pptx2md

a pptx to markdown converter

Python 467 73 Updated May 3, 2024

s3prl / s3prl

Self-Supervised Speech Pre-training and Representation Learning Toolkit

Python 2,169 479 Updated Jun 18, 2024

choomegan / asr-conformer-inference

Python 1 Updated Feb 23, 2024

architsharma97 / dpo-rlaif

Jupyter Notebook 82 7 Updated Jun 27, 2024

YUCHEN005 / RobustGER

Code for paper "Large Language Models are Efficient Learners of Noise-Robust Speech Recognition"

Python 113 2 Updated May 8, 2024

Takaaki-Saeki / DiscreteSpeechMetrics

Reference-aware automatic speech evaluation toolkit

Python 80 5 Updated Feb 22, 2024

AIGCDesignGroup / ReplaceAnything

2,320 95 Updated May 17, 2024

espnet / espnet_model_zoo

ESPnet Model Zoo

Python 243 41 Updated Jul 9, 2023
