Paper Source Code
[NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators"
[WWW'2024] "RLMRec: Representation Learning with Large Language Models for Recommendation"
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch
EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction
[ICCV 2023] ProPainter: Improving Propagation and Transformer for Video Inpainting
Official Pytorch Implementation for "TokenFlow: Consistent Diffusion Features for Consistent Video Editing" presenting "TokenFlow" (ICLR 2024)
Official codes of DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior
🧑‍🏫 60 Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gan…
This repository contains the official implementation of the research paper, "FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization" ICCV 2023
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
[ICCV 2023] StableVideo: Text-driven Consistency-aware Diffusion Video Editing
The official project website of "KernelWarehouse: Rethinking the Design of Dynamic Convolution" (KW for short, accepted to ICML 2024)
[ICCV 2023] BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
Official code for ICCV 2023 Paper: AlignDet: Aligning Pre-training and Fine-tuning in Object Detection.
Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning
[ICCV 2023] Spectrum-guided Multi-granularity Referring Video Object Segmentation.
[CVPR2024] DisCo: Referring Human Dance Generation in Real World
Official repository of FLatten Transformer (ICCV2023)
RepViT: Revisiting Mobile CNN From ViT Perspective [CVPR 2024] and RepViT-SAM: Towards Real-Time Segmenting Anything
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
[NeurIPS2023] Code release for "Hierarchical Open-vocabulary Universal Image Segmentation"
A Residual Network Design with less than 5 million trainable parameters achieving an accuracy of 96.04% on CIFAR-10.
Official implementation of "3HAN: A Deep Neural Network for Fake News Detection" (ICONIP 2017)
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
Official implementation of InstructZero, the first framework to optimize bad prompts of ChatGPT (API LLMs) and finally obtain good prompts!
QLoRA: Efficient Finetuning of Quantized LLMs
Official PyTorch codes for the paper: "ViCo: Detail-Preserving Visual Condition for Personalized Text-to-Image Generation"