feng_shuai fengxiaoshuai

🤣

GPU, CPU, Python, C++, CV, NLP

7 followers · 3 following

Baidu
Shanghai, China

flash-attention-minimal Public
Forked from tspeterkim/flash-attention-minimal

Flash Attention in ~100 lines of CUDA (forward pass only)

Cuda Apache License 2.0 Updated Apr 7, 2024
ompi Public
Forked from open-mpi/ompi

Open MPI main development repository

C Other Updated Feb 29, 2024
Paddle Public
Forked from PaddlePaddle/Paddle

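PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)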

C++ Apache License 2.0 Updated Apr 21, 2023
Halide Public
Forked from halide/Halide

a language for fast, portable data-parallel computation

C++ Other Updated Apr 18, 2023
CUDA_decoder_OP Public

Cuda Updated Mar 24, 2023
ppl.kernel.cuda Public
Forked from OpenPPL/ppl.kernel.cuda

C Updated Mar 2, 2023
tvm Public
Forked from apache/tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators

Python Apache License 2.0 Updated Feb 20, 2023
PaddleFleetX Public
Forked from PaddlePaddle/PaddleFleetX

Paddle Distributed Training Examples. Resnet Bert GPT MOE DataParallel ModelParallel PipelineParallel HybridParallel AutoParallel Zero Sharding Recompute GradientMerge Offload AMP DGC Loc…

Python Apache License 2.0 Updated Nov 30, 2022
Paddle-Inference-Demo Public
Forked from PaddlePaddle/Paddle-Inference-Demo

C++ Apache License 2.0 Updated Nov 23, 2022
tvm_mlir_learn Public
Forked from BBuf/tvm_mlir_learn

tvm learn

Python Updated Sep 7, 2022
GEMM_WMMA Public
Forked from gty111/GEMM_WMMA

GEMM by WMMA (tensor core)

Cuda Apache License 2.0 Updated Jul 31, 2022
CUDA_gemm Public
Forked from Cjkkkk/CUDA_gemm

A simple high performance CUDA GEMM implementation.

Cuda Updated Jun 16, 2022
vit_attention Public

C++ Apache License 2.0 Updated May 16, 2022
ppl.nn Public
Forked from OpenPPL/ppl.nn

A primitive library for neural network

C++ Apache License 2.0 Updated May 10, 2022
How_to_optimize_in_GPU Public
Forked from Liu-xiandong/How_to_optimize_in_GPU

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda Apache License 2.0 Updated May 4, 2022
PaddleUtils Public
Forked from jiangjiajun/PaddleUtils

Some tools to operate PaddlePaddle model

Python Apache License 2.0 Updated Apr 4, 2022
inference_benchmark Public
Forked from wangye707/inference_benchmark

Python Updated Mar 23, 2022
models Public
Forked from PaddlePaddle/models

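Pre-trained and Reproduced Deep Learning Models (the official PaddlePaddle model library, containing deep learning models drawn from academic research and validated in industrial scenarios)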

Python Apache License 2.0 Updated Mar 18, 2022
oneflow Public
Forked from Oneflow-Inc/oneflow

OneFlow is a performance-centered and open-source deep learning framework.

C++ Apache License 2.0 Updated Feb 7, 2022
kernel_memory_management Public
Forked from 0voice/kernel_memory_management

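A curated collection of materials on Linux kernel memory management, including papers, articles, and videos, plus resources on application memory leaks and memory pools.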

Updated Dec 29, 2021
HelloGitHub Public
Forked from 521xueweihan/HelloGitHub

Share interesting, entry-level open source projects on GitHub.

Python Updated Nov 26, 2021
Messy_Test Public
Forked from fengbingchun/Messy_Test

C++/C++11's usage

C++ Updated Oct 27, 2021
PaddleOCR Public
Forked from PaddlePaddle/PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and…

Python Apache License 2.0 Updated Aug 4, 2021
hellogithub.com Public
Forked from 521xueweihan/hellogithub.com

Source code for the HelloGitHub.com website

Python GNU Affero General Public License v3.0 Updated Jun 2, 2021
CodeSamples Public
Forked from CUDA-Tutorial/CodeSamples

Code samples for the CUDA tutorial "CUDA and Applications to Task-based Programming"

Cuda Updated May 23, 2021
CuAssembler Public
Forked from cloudcores/CuAssembler

An unofficial cuda assembler, for all generations of SASS, hopefully :)

Python MIT License Updated May 17, 2021
docs Public
Forked from PaddlePaddle/docs

Documentations for PaddlePaddle

Shell Apache License 2.0 Updated May 17, 2021
shanghai_house_knowledge Public
Forked from ayuer/shanghai_house_knowledge

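Notes from the homework I did while buying a home in Shanghai in November 2020, shared with everyone: tech people helping tech people. I hope it is useful to you.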

MIT License Updated Dec 13, 2020
500lines Public
Forked from aosabook/500lines

500 Lines or Less

JavaScript Other Updated Sep 4, 2020
CUDA_encoder Public

Cuda Updated Jul 18, 2020
