Shanghai Jiao Tong University
Stars
An interference-aware scheduler for fine-grained GPU sharing
SkyPilot: Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution—all with a simple interface.
NumPy aware dynamic Python compiler using LLVM
A fast communication-overlapping library for tensor parallelism on GPUs.
Tutorial for building a custom CUDA function for PyTorch
Material for cuda-mode lectures
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.
Disaggregated serving system for Large Language Models (LLMs).
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
Official repository for the paper DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines
An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyper-parameter tuning.
Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Parameter Efficient Transfer Learning with Diff Pruning
Implementation of paper "Towards a Unified View of Parameter-Efficient Transfer Learning" (ICLR 2022)
Automatic Schedule Exploration and Optimization Framework for Tensor Computations
MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization.