gemm

Here are 65 public repositories matching this topic...

PhuNH / hpc-aa

High Performance Computing - Algorithms and Applications Course in WS18-19 at TUM

cuda fermi gemm

Updated Feb 3, 2019
C++

KarhouTam / cuda-kernels

Some common CUDA kernel implementations (Not the fastest).

cuda-kernels gemm softmax relu cuda-programming layernorm cuda-learning

Updated Jun 28, 2024
Cuda

jhson989 / fast-conv

Fast Convolution Implementation via CUDA

cuda convolution gemm

Updated Apr 26, 2022
Cuda

rollingbug / LinMatrix

A lightweight matrix computation software library aimed at MCUs and embedded systems

microcontroller linear-algebra embedded-systems matrix-multiplication eigenvectors numerical-methods mcu eigenvalues gemm language-c lu-decomposition matrix-library qr-decomposition matrix-computations axpy

Updated Feb 24, 2022
C

junyoung1992 / OpenCL-GEMM

GEMM Optimization

acceleration opencl parallelism gpgpu gemm

Updated Jan 12, 2022
C

yester31 / OpenCL_EX

Development of deep learning inference code using OpenCL kernel functions.

opencl parallel-computing convolution deeplearning gemm

Updated Jun 1, 2022
C++

a-sidorova / gpu_opencl_cource

Course Programming on new Architecture-1 (GPU), autumn 2021

gpu opencl jacobi gemm heterogeneous-computing

Updated Dec 5, 2021
C++

enp1s0 / cuMpSGEMM

Fast SGEMM emulation on Tensor Cores

gpu cuda gemm half-precision mixed-precision tensorcore tensorcores fp32

Updated Nov 20, 2023
Cuda

dev0x13 / gemm-benchmark-2023

Benchmarks for some modern (2023) high-performance floating-point GEMM implementations compared to the Mojo language

benchmark mojo gemm

Updated Jun 18, 2024
Mojo

KaiserKlayton / lpa_cnn

Low Precision Arithmetic for Convolutional Neural Network Inference

benchmarking caffe deep-learning image-recognition convolutional-neural-networks 8-bit gemm

Updated Oct 29, 2017
C++

andreytkachenko / yarblas

Yet another rust BLAS

rust machine-learning math rust-lang blas gemm

Updated Feb 13, 2020
Rust

pminhtam / xnor_conv_pytorch_extension

XNOR-Net with binary conv2d kernels using an XNOR GEMM op; supports both CPU and GPU.

cpp cuda pytorch xnor-net gemm binary-convolutions xnor-convolutions binary-neural-networks binary-op pytorch-extension

Updated Oct 25, 2022
C

digital-nomad-cheng / matmul_cuda_kernel_tvm

Generate an optimized MatMul CUDA kernel automatically using the TVM auto-scheduler.

hpc gpu cuda gemm tvm gemm-optimization matmul

Updated Feb 25, 2023
Jupyter Notebook

ZhangGe6 / how-to-optimize-playground

High-performance computing (HPC) demos since I was a freshman.

cuda x86 gemm

Updated Jun 15, 2022
C

JoeruCodes / CUDA-GEMM-kernel

My attempt at making a GEMM kernel...

parallel-computing cuda cuda-kernels gemm gemm-optimization cuda-programming gemms

Updated Jun 16, 2023
Cuda

zixuanweeei / gemm-opt

Manually optimize the GEMM (GEneral Matrix Multiply) operation. There is a long way to go.

cpu cpp gemm gemm-optimization

Updated Aug 22, 2021
C++

cyrusmsk / gemm_apple

GEMM on Apple Silicon

benchmark deep-learning gemm applesilicon m1-mac

Updated Dec 25, 2023
Python

DongqiShen / iLLM

Implementing LLM from scratch. (Developing...)

arm64 gemm llm-inference

Updated Nov 15, 2023
C

TensorBFS / CuTropicalGEMM.jl

The fastest Tropical number matrix multiplication on GPU

cuda gemm tropical-algebra

Updated Feb 25, 2024
Julia

BenQuickDeNN / CUDA-GEMM

CUDA version GEMM

cpp cuda gemm

Updated Mar 5, 2020
C++
