  • Shanghai Jiao Tong University

An interference-aware scheduler for fine-grained GPU sharing

Python 77 11 Updated May 12, 2024

SkyPilot: Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution—all with a simple interface.

Python 6,259 429 Updated Jul 14, 2024

NumPy aware dynamic Python compiler using LLVM

Python 9,641 1,112 Updated Jul 12, 2024

PyTorch bindings for CUTLASS grouped GEMM.

Cuda 32 27 Updated Jul 9, 2024

CUDA Templates for Linear Algebra Subroutines

C++ 4,907 845 Updated Jul 15, 2024

C++ extensions in PyTorch

Python 964 202 Updated Jun 21, 2024

A fast communication-overlapping library for tensor parallelism on GPUs.

C++ 85 8 Updated Jul 15, 2024

Tutorial for building a custom CUDA function for PyTorch

Python 503 54 Updated Jan 25, 2019
Python 20 3 Updated May 11, 2024

Material for cuda-mode lectures

Jupyter Notebook 1,795 174 Updated Jun 13, 2024

An experimental parallel training platform

40 10 Updated Mar 25, 2024

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 2,084 178 Updated Jul 15, 2024

SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.

Python 2,836 182 Updated Jul 15, 2024

Disaggregated serving system for Large Language Models (LLMs).

Jupyter Notebook 180 14 Updated Jun 14, 2024

[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable

Python 64 1 Updated Jun 30, 2024

Official repository for the paper DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines

Python 11 1 Updated Dec 8, 2023

An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.

Python 13,898 1,808 Updated Jul 3, 2024

Efficient AI Inference & Serving

Python 448 25 Updated Jan 8, 2024

Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)

Python 2,554 261 Updated Jun 2, 2024

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Python 2,132 154 Updated Jul 15, 2024

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Python 129,305 25,633 Updated Jul 15, 2024

Parameter Efficient Transfer Learning with Diff Pruning

Python 70 8 Updated Feb 3, 2021

Implementation of paper "Towards a Unified View of Parameter-Efficient Transfer Learning" (ICLR 2022)

Python 498 44 Updated Mar 24, 2022

Automatic Schedule Exploration and Optimization Framework for Tensor Computations

Python 172 29 Updated Apr 25, 2022

MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)

Python 31 2 Updated May 29, 2024

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT

Python 2,426 337 Updated Jul 15, 2024

Compiler for Dynamic Neural Networks

Python 40 2 Updated Nov 13, 2023

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…

Python 1,643 261 Updated Jul 15, 2024