Commit

Merge branch 'master' of github.com:siboehm/SGEMM_CUDA
Simon Boehm committed Jan 28, 2023
2 parents f6d3513 + 35cd4e6 commit 33322ce
Showing 2 changed files with 3 additions and 2 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -2,6 +2,7 @@

Step-by-step optimization of matrix multiplication, implemented in CUDA.
For an explanation of each kernel, see [siboehm.com/CUDA-MMM](https://siboehm.com/articles/22/CUDA-MMM).
This repo is inspired by [wangzyon/NVIDIA_SGEMM_PRACTICE](https://github.com/wangzyon/NVIDIA_SGEMM_PRACTICE).

## Overview

@@ -33,4 +34,4 @@ GFLOPs at matrix size 4092x4092:
1. `mkdir build && cd build && cmake .. -GNinja && ninja`
1. `./sgemm <kernel number>`

-For profiling, download [NVIDIA Nsight Compute](https://developer.nvidia.com/nsight-compute).
+For profiling, download [NVIDIA Nsight Compute](https://developer.nvidia.com/nsight-compute).
2 changes: 1 addition & 1 deletion src/kernels.cuh
@@ -6,4 +6,4 @@
#include "kernels/4_kernel_1D_warptiling.cuh"
#include "kernels/5_kernel_2D_warptiling.cuh"
#include "kernels/6_kernel_vectorize.cuh"
-#include "kernels/7_kernel_tensorcores.cuh"
+#include "kernels/7_kernel_resolve_bank_conflicts.cuh"
