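📚 A summary of fundamental knowledge for C/C++ technical interviews, covering the language, libraries, data structures, algorithms, systems, networking, and linking, loading, and libraries, plus interview experience, recruiting, and referral information. This repository is a summary of the basic knowledge of recruiting job seekers and beginners in the direction of C/C++ technology, in…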

C++ 34,637 7,953 Updated Mar 19, 2024

guaguaupup / cpp_interview

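A summary of C++ backend server development interview experiences and stock interview questions! (It has both depth and breadth, unlike summary articles that only cover concepts!)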

1,477 221 Updated Sep 9, 2024

siboehm / SGEMM_CUDA

Fast CUDA matrix multiplication from scratch

Cuda 442 61 Updated Dec 28, 2023

Liu-xiandong / How_to_optimize_in_GPU

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda 813 128 Updated Jul 29, 2023

DefTruth / CUDA-Learn-Notes

🎉 Modern CUDA Learn Notes with PyTorch: fp32, fp16, bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.

Cuda 1,247 133 Updated Oct 8, 2024

lcpu-club / hpcgame_1st_problems

Repository for HPCGame 1st Problems.

Go 52 6 Updated Feb 6, 2024

kaixindelele / ChatPaper

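Use ChatGPT to summarize arXiv papers. It speeds up the whole research workflow by using ChatGPT for full-paper summarization, professional translation, polishing, reviewing, and responding to reviews.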

Python 18,341 1,920 Updated Apr 4, 2024

xai-org / grok-1

Grok open release

Python 49,468 8,323 Updated Aug 30, 2024

lcpu-club / hpc-wiki

Wiki for HPC

Python 80 8 Updated Dec 30, 2023

NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

C++ 10,621 2,114 Updated Oct 9, 2024

NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

C++ 5,098 615 Updated Oct 8, 2024

NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 8,365 940 Updated Oct 9, 2024

isocpp / CppCoreGuidelines

The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++

CSS 42,605 5,432 Updated Oct 4, 2024

owensgroup / ATOS

Multi-GPU dynamic scheduler using PGAS style cross-GPU communication

Cuda 27 3 Updated Jul 23, 2023

xiexi51 / ICCAD-Accel-GCN

Official Implementation of "Accel-GNN: High-Performance GPU Accelerator Design for Graph Neural Networks"

Cuda 67 15 Updated Sep 8, 2023

PaddlePaddle / PGL

Paddle Graph Learning (PGL) is an efficient and flexible graph learning framework based on PaddlePaddle

Python 1,570 309 Updated Dec 11, 2023

awslabs / graphstorm

Enterprise graph machine learning framework for billion-scale graphs for ML scientists and data scientists.

Python 363 59 Updated Oct 5, 2024

plasma-umass / scalene

Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals

Python 11,645 389 Updated Oct 4, 2024

NVIDIA / cuCollections

C++ 473 85 Updated Oct 9, 2024

chenzomi12 / AISystem

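AISystem mainly refers to AI systems, covering the full stack of low-level AI technologies, including AI chips, AI compilers, and AI inference and training frameworks.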

Jupyter Notebook 10,706 1,545 Updated Oct 9, 2024

YukeWang96 / MGG_OSDI23

Artifact for OSDI'23: MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Multi-GPU Platforms.

Cuda 34 4 Updated Mar 17, 2024

stotko / stdgpu

stdgpu: Efficient STL-like Data Structures on the GPU

C++ 1,150 81 Updated Sep 19, 2024

gunrock / loops

🎃 GPU load-balancing library for regular and irregular computations.

C++ 57 4 Updated Jun 14, 2024

Weikai Tang yofufufufu


Lists (3)

GNN

Learning

八股 (stock interview questions)

Starred repositories

Deep learning

Data visualization

Java

Python

Linux

Data structures

C++

Algorithm

C