Python 353 30 Updated Nov 28, 2023

[INTERSPEECH 2022] This dataset is designed for multi-modal speaker diarization and lip-speech synchronization in the wild.

HTML 33 1 Updated Jan 24, 2024

WebRTC audio processing

C++ 373 136 Updated May 10, 2020

Python interface to the WebRTC Voice Activity Detector

C 2,011 404 Updated Jul 4, 2024

An enterprise-grade Voice Activity Detector from modelscope and funasr.

Python 49 5 Updated Apr 26, 2023

This project uses a variety of advanced voiceprint recognition models such as EcapaTdnn, ResNetSE, ERes2Net, and CAM++. More models may be supported in the future. At the …

Python 748 119 Updated Sep 9, 2024

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

Python 1,066 93 Updated Aug 18, 2024

Code for the TASLP paper "PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, and Aggregation".

Python 137 16 Updated Jul 13, 2023

The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"

Python 342 62 Updated Aug 16, 2024

Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training without speech data. Accelerate inference and support Web deplo…

C 805 127 Updated Jul 18, 2024

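A Flask web-based demo system for Chinese automatic speech recognition, including speech recognition, speech synthesis, and voiceprint-based speaker recognition.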

CSS 155 28 Updated Mar 31, 2024

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Jupyter Notebook 5,876 752 Updated Aug 19, 2024

Video to Text Translation + VTT Subtitle Generation + WebService

CSS 8 Updated Feb 11, 2024

This repo includes the official implementations of "Fine-tune the pretrained ATST model for sound event detection".

Jupyter Notebook 80 11 Updated Aug 17, 2024

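哔哩下载姬 (downkyi), a video download tool for the Bilibili website. Supports batch downloads and 8K, HDR, and Dolby Vision, and provides a toolbox (audio/video extraction, watermark removal, etc.).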

C# 20,598 2,267 Updated Aug 14, 2024

A PyTorch implementation of sound classification that supports EcapaTdnn, PANNS, TDNN, Res2Net, ResNetSE, and other models, as well as a variety of preprocessing methods.

Python 380 79 Updated Sep 4, 2024

Python script that slices audio with silence detection

Python 747 265 Updated Jun 8, 2024

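Speech recognition API covering real-time speech and offline recognition of uploaded long audio; supports real-time transcription and simultaneous interpretation for Chinese, English, and the languages of up to 100 countries.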

Java 54 5 Updated Jul 11, 2023

🍭 Wow, such a lovely HTML5 danmaku video player

JavaScript 15,384 2,403 Updated Mar 24, 2024

OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark

Python 4,154 1,218 Updated Aug 14, 2024

Deep learning scaffold

Jupyter Notebook 9 1 Updated Aug 12, 2023

A Unified Semi-Supervised Learning Codebase (NeurIPS'22)

Python 1,306 172 Updated Aug 24, 2024

SpeechIO Leaderboard: a large, robust, comprehensive, benchmarking platform for Automatic Speech Recognition.

Python 427 60 Updated Sep 10, 2024

Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications

Python 5,800 1,190 Updated Mar 31, 2024

ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab

Python 1,967 292 Updated Mar 19, 2024

ChatWeb can crawl web pages, read PDF, DOCX, TXT, and extract the main content, then answer your questions based on the content, or summarize the key points.

Python 874 136 Updated Jun 25, 2024