Lists (13)
Audio-Visual correspondence
Audio-Visual event detection
Audio-Visual generation
Audio-Visual-models
Audio-Visual ZSL
Models for Audio-Visual ZSL classification
Cross modal retrieval
Datasets
Stars
Code for Discriminative Sounding Objects Localization (NeurIPS 2020)
A dataset for Audio-Visual Sound Event Detection in Movies
Implementation for Label Relation Graphs Enhanced Hierarchical Residual Network for Hierarchical Multi-Granularity Classification
CVPR2022 - Deep Hierarchical Semantic Segmentation - A structured, pixel-wise description of visual scenes in terms of the class hierarchy.
Localizing Visual Sounds the Hard Way
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)
[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning
The repo for "Class-aware Sounding Objects Localization", TPAMI 2021.
Codebase for ECCV18 "The Sound of Pixels"
The code repo for ICASSP 2023 Paper "MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning"
This repository contains the code for our CVPR 2022 paper on "Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and Language"
Source code for "Taming Visually Guided Sound Generation" (Oral at the BMVC 2021)
Official PyTorch implementation of the TIP paper "Generating Visually Aligned Sound from Videos" and the corresponding Visually Aligned Sound (VAS) dataset.
Implementation of "Audio Retrieval with Natural Language Queries: A Benchmark Study".
This repository contains the code for our ECCV 2022 paper "Temporal and cross-modal attention for audio-visual zero-shot learning"
A modern yet simple multi-platform video cutter and joiner.
The swiss army knife of lossless video/audio editing
An unofficial implementation of the paper "Objects that Sound" from ECCV 2018.
Audio-Visual Event Localization in Unconstrained Videos, ECCV 2018
Awesome video understanding toolkits based on PaddlePaddle. It supports video data annotation tools, lightweight RGB and skeleton based action recognition model, practical applications for video ta…
Source code for "Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors." (Spotlight at the BMVC 2022)
A hands-on introduction to video technology: image, video, codec (av1, vp9, h265) and more (ffmpeg encoding). Translations: 🇺🇸 🇨🇳 🇯🇵 🇮🇹 🇰🇷 🇷🇺 🇧🇷 🇪🇸