Stars
Model components of the Llama Stack APIs
Implementation of Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting
Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
Utilities intended for use with Llama models.
Development repository for the Triton language and compiler
Open-Sora: Democratizing Efficient Video Production for All
The official code for the paper "Parallel Speculative Decoding with Adaptive Draft Length."
Explorations into some recent techniques surrounding speculative decoding
📰 Must-read papers and blogs on Speculative Decoding ⚡️
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
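A desktop floating-window application that displays the current network speed and CPU and memory usage; it also supports display in the taskbar and changeable skins.
Scalable and robust tree-based speculative decoding algorithm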
Code associated with the paper "Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding"
A summary of existing representative text datasets for LLMs.
VideoSys: An easy and efficient system for video generation
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) on multi-GPU Clusters
Neural Networks: Zero to Hero
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs