Stars
A modern model graph visualizer and debugger
A framework for few-shot evaluation of language models.
Omnitrace: Application Profiling, Tracing, and Analysis
Advanced Profiling and Analytics for AMD Hardware
The ROCdebug-agent is a library that can be loaded by the ROCm Platform Runtime to provide debugging functionality.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
Next generation BLAS implementation for ROCm platform
AMD lab notes with code examples demonstrating the use of AMD GPUs
This is the Personality Core for GLaDOS, the first steps towards a real-life implementation of the AI from the Portal series by Valve.
Continuous builder and binary build scripts for pytorch
This repository hosts code that supports the testing infrastructure for the PyTorch organization. For example, this repo hosts the logic to track disabled tests and slow tests, as well as our conti…
NVIDIA Linux open GPU with P2P support
⭐ CLI tool for sorting dependent repos by stars
Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors at once, across one or thousands of GPUs.
A Native-PyTorch Library for LLM Fine-tuning
An innovative library for efficient LLM inference via low-bit quantization
Manage scalable open LLM inference endpoints in Slurm clusters
QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference
FlashInfer: Kernel Library for LLM Serving