gemm
Here are 65 public repositories matching this topic...
Some common CUDA kernel implementations (Not the fastest).
Updated Jun 28, 2024 - Cuda
A lightweight matrix computation software library aimed at MCUs or embedded systems
Updated Feb 24, 2022 - C
Development of deep learning inference code using OpenCL kernel functions.
Updated Jun 1, 2022 - C++
Course Programming on new Architecture-1 (GPU), autumn 2021
Updated Dec 5, 2021 - C++
Fast SGEMM emulation on Tensor Cores
Updated Nov 20, 2023 - Cuda
Low Precision Arithmetic for Convolutional Neural Network Inference
Updated Oct 29, 2017 - C++
XNOR-Net with binary conv2d kernels using an XNOR GEMM op; supports both CPU and GPU.
Updated Oct 25, 2022 - C
My attempt at making a GEMM kernel...
Updated Jun 16, 2023 - Cuda
Manually optimize the GEMM (GEneral Matrix Multiply) operation. There is a long way to go.
Updated Aug 22, 2021 - C++
The fastest Tropical number matrix multiplication on GPU
Updated Feb 25, 2024 - Julia