Stars
Fast inference from large language models via speculative decoding
Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of GPT-Fast, a simple, PyTorch-native generation codebase.
This repo contains the source code for: Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs
Deferred Continuous Batching in Resource-Efficient Large Language Model Serving (EuroMLSys 2024)
Transformer Explained Visually: Learn How LLM Transformer Models Work with Interactive Visualization
A benchmark list for evaluation of large language models.
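AISystem mainly refers to AI systems, including full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks.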
Benchmarking the serving capabilities of vLLM
AcadHomepage: A Modern and Responsive Academic Personal Homepage
A deployment, monitoring and autoscaling service towards serverless LLM serving.
UniTime: A Language-Empowered Unified Model for Cross-Domain Time Series Forecasting (WWW 2024)
Deployment scripts & config for Sock Shop
Training and serving large-scale neural networks with auto parallelization.
[NeurIPS 2021] [T-PAMI] Global Filter Networks for Image Classification
Time series forecasting, especially LSTF comparisons, including Informer, Autoformer, Reformer, Pyraformer, FEDformer, Transformer, MTGNN, LSTNet, and Graph WaveNet
Share or Not Share? Towards the Practicability of Deep Models for Unsupervised Anomaly Detection in Modern Online Systems (ISSRE'22)
PyTorch code for Google's Temporal Fusion Transformer
This is a collection of our research on efficient AI, covering hardware-aware NAS and model compression.
This repository consists of useful tools or guides for system software development or anything interesting.
Dataset containing runtimes and estimated costs for various workloads across different cloud providers and configuration settings.
Repo containing data and code for serverless paper
Multi-Agent Resource Optimization (MARO) platform is an instance of Reinforcement Learning as a Service (RaaS) for real-world resource optimization problems.