Merge branch 'master' of github.com:siboehm/SGEMM_CUDA

vkarihal · Jan 28, 2023 · 33322ce
2 parents f6d3513 + 35cd4e6
commit 33322ce

Showing 2 changed files with 3 additions and 2 deletions.
diff --git a/README.md b/README.md
@@ -2,6 +2,7 @@
 
 Step-by-step optimization of matrix multiplication, implemented in CUDA.
 For an explanation of each kernel, see [siboehm.com/CUDA-MMM](https://siboehm.com/articles/22/CUDA-MMM).
+This repo is inspired by [wangzyon/NVIDIA_SGEMM_PRACTICE](https://github.com/wangzyon/NVIDIA_SGEMM_PRACTICE).
 
 ## Overview
 
@@ -33,4 +34,4 @@ GFLOPs at matrix size 4092x4092:
 1. `mkdir build && cd build && cmake .. -GNinja && ninja`
 1. `./sgemm <kernel number>`
 
-For profiling, download [NVIDIA Nsight Compute](https://developer.nvidia.com/nsight-compute).
+For profiling, download [NVIDIA Nsight Compute](https://developer.nvidia.com/nsight-compute).
diff --git a/src/kernels.cuh b/src/kernels.cuh
@@ -6,4 +6,4 @@
 #include "kernels/4_kernel_1D_warptiling.cuh"
 #include "kernels/5_kernel_2D_warptiling.cuh"
 #include "kernels/6_kernel_vectorize.cuh"
-#include "kernels/7_kernel_tensorcores.cuh"
+#include "kernels/7_kernel_resolve_bank_conflicts.cuh"