👋
Back to the Future.

Organizations

@Game-Emulators


Official implementation of SPACE

Python 10 Updated May 19, 2024

Multi-Candidate Speculative Decoding

Python 29 5 Updated Apr 22, 2024

Model components of the Llama Stack APIs

Python 2,586 256 Updated Sep 30, 2024

Implementation of Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting

Python 41 5 Updated Jun 26, 2024

Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding

JavaScript 60 4 Updated Sep 28, 2024

Utilities intended for use with Llama models.

Python 4,237 750 Updated Sep 25, 2024

Development repository for the Triton language and compiler

C++ 12,895 1,561 Updated Sep 30, 2024

Open-Sora: Democratizing Efficient Video Production for All

Python 21,744 2,102 Updated Aug 9, 2024

The official code for paper "parallel speculative decoding with adaptive draft length."

Python 17 Updated Aug 23, 2024

Cascade Speculative Drafting

Python 25 2 Updated Apr 2, 2024

Explorations into some recent techniques surrounding speculative decoding

Python 193 15 Updated Oct 9, 2023

📰 Must-read papers and blogs on Speculative Decoding ⚡️

371 14 Updated Sep 26, 2024

Cool Papers - Immersive Paper Discovery

HTML 361 5 Updated Sep 11, 2024

Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)

Python 166 16 Updated May 29, 2024

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

Python 7,699 451 Updated May 3, 2024

[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

Python 209 12 Updated Aug 31, 2024

A monitor of resources

C++ 19,922 617 Updated Sep 24, 2024

An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.

Python 4,670 146 Updated Sep 11, 2024

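A desktop floating-window application that displays the current network speed and CPU and memory usage, with support for taskbar display and changeable skins.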

C++ 34,563 3,245 Updated Mar 16, 2024

scalable and robust tree-based speculative decoding algorithm

Python 304 31 Updated Aug 13, 2024

Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding**

Jupyter Notebook 131 8 Updated May 24, 2024

Summarize existing representative LLMs text datasets.

838 83 Updated Sep 4, 2024

VideoSys: An easy and efficient system for video generation

Python 1,669 112 Updated Sep 30, 2024

A collection of useful .gitignore templates

161,451 83,121 Updated Sep 9, 2024

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) on multi-GPU Clusters

Python 538 46 Updated Sep 28, 2024

Neural Networks: Zero to Hero

Jupyter Notebook 11,607 1,452 Updated Aug 18, 2024
Python 62 2 Updated Aug 30, 2024

Serving multiple LoRA finetuned LLM as one

Python 960 45 Updated May 8, 2024

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

Python 2,126 139 Updated Sep 30, 2024
Jupyter Notebook 450 22 Updated Aug 23, 2024