Shanghai Jiao Tong University
Stars
An interference-aware scheduler for fine-grained GPU sharing
SkyPilot: Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution—all with a simple interface.
NumPy aware dynamic Python compiler using LLVM
A fast communication-overlapping library for tensor parallelism on GPUs.
Tutorial for building a custom CUDA function for PyTorch
Material for cuda-mode lectures
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.
Disaggregated serving system for Large Language Models (LLMs).
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
Official repository for the paper DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines
An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyper-parameter tuning.
Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Parameter Efficient Transfer Learning with Diff Pruning
Implementation of paper "Towards a Unified View of Parameter-Efficient Transfer Learning" (ICLR 2022)
Automatic Schedule Exploration and Optimization Framework for Tensor Computations
MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization.