-
flash-attention-minimal Public
Forked from tspeterkim/flash-attention-minimal
Flash Attention in ~100 lines of CUDA (forward pass only)
Cuda Apache License 2.0 Updated Apr 7, 2024 -
ompi Public
Forked from open-mpi/ompi
Open MPI main development repository
C Other Updated Feb 29, 2024 -
Paddle Public
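Forked from PaddlePaddle/Paddle
PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)
C++ Apache License 2.0 Updated Apr 21, 2023 -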
Halide Public
Forked from halide/Halide
a language for fast, portable data-parallel computation
C++ Other Updated Apr 18, 2023 -
tvm Public
Forked from apache/tvm
Open deep learning compiler stack for cpu, gpu and specialized accelerators
Python Apache License 2.0 Updated Feb 20, 2023 -
PaddleFleetX Public
Forked from PaddlePaddle/PaddleFleetX
Paddle Distributed Training Examples. Resnet Bert GPT MOE DataParallel ModelParallel PipelineParallel HybridParallel AutoParallel Zero Sharding Recompute GradientMerge Offload AMP DGC Loc…
Python Apache License 2.0 Updated Nov 30, 2022 -
Paddle-Inference-Demo Public
Forked from PaddlePaddle/Paddle-Inference-Demo
C++ Apache License 2.0 Updated Nov 23, 2022 -
GEMM_WMMA Public
Forked from gty111/GEMM_WMMA
GEMM by WMMA (tensor core)
Cuda Apache License 2.0 Updated Jul 31, 2022 -
CUDA_gemm Public
Forked from Cjkkkk/CUDA_gemm
A simple high performance CUDA GEMM implementation.
Cuda Updated Jun 16, 2022 -
ppl.nn Public
Forked from OpenPPL/ppl.nn
A primitive library for neural network
C++ Apache License 2.0 Updated May 10, 2022 -
How_to_optimize_in_GPU Public
Forked from Liu-xiandong/How_to_optimize_in_GPU
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…
Cuda Apache License 2.0 Updated May 4, 2022 -
PaddleUtils Public
Forked from jiangjiajun/PaddleUtils
Some tools to operate PaddlePaddle model
Python Apache License 2.0 Updated Apr 4, 2022 -
models Public
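Forked from PaddlePaddle/models
Pre-trained and Reproduced Deep Learning Models (official PaddlePaddle model library, containing a range of deep learning models from the academic frontier and validated in industrial scenarios)
Python Apache License 2.0 Updated Mar 18, 2022 -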
oneflow Public
Forked from Oneflow-Inc/oneflow
OneFlow is a performance-centered and open-source deep learning framework.
C++ Apache License 2.0 Updated Feb 7, 2022 -
kernel_memory_management Public
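Forked from 0voice/kernel_memory_management
A curated collection of material on Linux kernel memory management, including papers, articles, and videos, as well as material on application memory leaks and memory pools
Updated Dec 29, 2021 -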
HelloGitHub Public
Forked from 521xueweihan/HelloGitHub
Share interesting, entry-level open source projects on GitHub.
Python Updated Nov 26, 2021 -
PaddleOCR Public
Forked from PaddlePaddle/PaddleOCR
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and…
Python Apache License 2.0 Updated Aug 4, 2021 -
hellogithub.com Public
Forked from 521xueweihan/hellogithub.com
Source code for the HelloGitHub.com website
Python GNU Affero General Public License v3.0 Updated Jun 2, 2021 -
CodeSamples Public
Forked from CUDA-Tutorial/CodeSamples
Code samples for the CUDA tutorial "CUDA and Applications to Task-based Programming"
Cuda Updated May 23, 2021 -
CuAssembler Public
Forked from cloudcores/CuAssembler
An unofficial cuda assembler, for all generations of SASS, hopefully :)
Python MIT License Updated May 17, 2021 -
docs Public
Forked from PaddlePaddle/docs
Documentations for PaddlePaddle
Shell Apache License 2.0 Updated May 17, 2021 -
shanghai_house_knowledge Public
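Forked from ayuer/shanghai_house_knowledge
Notes from the research done while buying a home in Shanghai in November 2020, shared with everyone. Tech people helping tech people; hopefully it is useful.
MIT License Updated Dec 13, 2020 -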
500lines Public
Forked from aosabook/500lines
500 Lines or Less
JavaScript Other Updated Sep 4, 2020 -