Fast inference from large language models via speculative decoding

Python 530 51 Updated Aug 22, 2024

Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of GPT-Fast, a simple, PyTorch-native generation codebase.

Python 82 8 Updated Aug 9, 2024

This repo contains the source code for: Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs

29 Updated Aug 14, 2024

Deferred Continuous Batching in Resource-Efficient Large Language Model Serving (EuroMLSys 2024)

Python 11 1 Updated May 28, 2024

Transformer Explained Visually: Learn How LLM Transformer Models Work with Interactive Visualization

JavaScript 2,680 233 Updated Sep 30, 2024

A benchmark list for evaluation of large language models.

60 1 Updated Jul 8, 2024
Python 2 Updated Sep 2, 2024
Python 1 Updated Sep 11, 2024

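AISystem mainly refers to AI systems, including full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks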

Jupyter Notebook 10,643 1,537 Updated Sep 29, 2024

Benchmarking the serving capabilities of vLLM

Python 16 4 Updated Aug 20, 2024

AcadHomepage: A Modern and Responsive Academic Personal Homepage

SCSS 1,346 2,532 Updated Oct 4, 2024

A deployment, monitoring and autoscaling service towards serverless LLM serving.

Python 153 25 Updated Sep 28, 2024

UniTime: A Language-Empowered Unified Model for Cross-Domain Time Series Forecasting (WWW 2024)

Python 66 5 Updated Feb 24, 2024

Deployment scripts & config for Sock Shop

Python 3,629 2,812 Updated Dec 5, 2023

Training and serving large-scale neural networks with auto parallelization.

Python 3,052 353 Updated Dec 9, 2023

[NeurIPS 2021] [T-PAMI] Global Filter Networks for Image Classification

Jupyter Notebook 436 40 Updated Jun 12, 2023

Time series forecasting, especially a comparison of LSTF models, including Informer, Autoformer, Reformer, Pyraformer, FEDformer, Transformer, MTGNN, LSTNet, and Graph WaveNet

Python 91 13 Updated Sep 30, 2022

Share or Not Share? Towards the Practicability of Deep Models for Unsupervised Anomaly Detection in Modern Online Systems (ISSRE'22)

Python 8 Updated Feb 16, 2023

PyTorch code for Google's Temporal Fusion Transformer

Python 79 24 Updated May 2, 2022

Borg cluster traces from Google

TeX 874 187 Updated Jun 26, 2024

Serverless optimizations

Python 50 17 Updated Feb 25, 2024

This is a collection of our research on efficient AI, covering hardware-aware NAS and model compression.

Python 73 7 Updated Sep 17, 2024

This repository consists of useful tools or guides for system software development or anything interesting.

Python 10 2 Updated Sep 20, 2024

Dataset containing runtimes and estimated costs for various workloads across different cloud providers and configuration settings.

Jupyter Notebook 9 4 Updated Jun 10, 2022
JavaScript 13 4 Updated Oct 13, 2021

Repo containing data and code for serverless paper

Jupyter Notebook 8 2 Updated Jun 23, 2023
17 2 Updated Dec 11, 2023

PPIO workload prediction framework code

Python 14 2 Updated Jul 22, 2024

Multi-Agent Resource Optimization (MARO) platform is an instance of Reinforcement Learning as a Service (RaaS) for real-world resource optimization problems.

Python 846 152 Updated Feb 23, 2024