  • Jilin University


Starred repositories

137 results for source starred repositories

A tool for bandwidth measurements on NVIDIA GPUs.

C++ 298 29 Updated Jun 14, 2024

Inference code for Llama models

Python 55,923 9,517 Updated Aug 18, 2024

The official Meta Llama 3 GitHub site

Python 26,549 3,001 Updated Aug 12, 2024
Python 587 63 Updated Jun 4, 2024

Open deep learning compiler stack for cpu, gpu and specialized accelerators

Python 11,677 3,452 Updated Oct 9, 2024

Simple samples for TensorRT programming

Python 1,488 338 Updated Sep 5, 2024

Fast and memory-efficient exact attention

Python 13,685 1,256 Updated Oct 8, 2024

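📚 A summary of basic knowledge for C/C++ technical interviews, covering languages, libraries, data structures, algorithms, systems, networking, and linking and loading of libraries, plus interview experience, recruiting, and referral information. This repository is a summary of the basic knowledge of recruiting job seekers and beginners in the direction of C/C++ technology, in…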

C++ 34,637 7,953 Updated Mar 19, 2024

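A summary of interview experiences and standard interview questions for C++ backend server development. (It has both depth and breadth, unlike summary articles that only list concepts!)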

1,477 221 Updated Sep 9, 2024

Fast CUDA matrix multiplication from scratch

Cuda 442 61 Updated Dec 28, 2023

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda 813 128 Updated Jul 29, 2023

🎉 Modern CUDA Learn Notes with PyTorch: fp32, fp16, bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.

Cuda 1,247 133 Updated Oct 8, 2024

Repository for HPCGame 1st Problems.

Go 52 6 Updated Feb 6, 2024

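Use ChatGPT to summarize arXiv papers. Speeds up the whole research workflow by using ChatGPT for full-paper summarization, professional translation, polishing, peer review, and review responses.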

Python 18,341 1,920 Updated Apr 4, 2024

Grok open release

Python 49,468 8,323 Updated Aug 30, 2024

Wiki for HPC

Python 80 8 Updated Dec 30, 2023

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

C++ 10,621 2,114 Updated Oct 9, 2024

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

C++ 5,098 615 Updated Oct 8, 2024

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 8,365 940 Updated Oct 9, 2024

The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++

CSS 42,605 5,432 Updated Oct 4, 2024

Multi-GPU dynamic scheduler using PGAS style cross-GPU communication

Cuda 27 3 Updated Jul 23, 2023

Official Implementation of "Accel-GNN: High-Performance GPU Accelerator Design for Graph Neural Networks"

Cuda 67 15 Updated Sep 8, 2023

Paddle Graph Learning (PGL) is an efficient and flexible graph learning framework based on PaddlePaddle

Python 1,570 309 Updated Dec 11, 2023

Enterprise graph machine learning framework for billion-scale graphs for ML scientists and data scientists.

Python 363 59 Updated Oct 5, 2024

Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals

Python 11,645 389 Updated Oct 4, 2024
C++ 473 85 Updated Oct 9, 2024

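AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks.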

Jupyter Notebook 10,706 1,545 Updated Oct 9, 2024

Artifact for OSDI'23: MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Multi-GPU Platforms.

Cuda 34 4 Updated Mar 17, 2024

stdgpu: Efficient STL-like Data Structures on the GPU

C++ 1,150 81 Updated Sep 19, 2024

🎃 GPU load-balancing library for regular and irregular computations.

C++ 57 4 Updated Jun 14, 2024