wolf1981

2 followers · 40 following

fp6_llm Public
Forked from usyd-fsalab/fp6_llm

An efficient GPU support for LLM inference with 6-bit quantization (FP6).

Cuda Apache License 2.0 Updated Mar 5, 2024
exllamav2 Public
Forked from turboderp/exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs

Python MIT License Updated Sep 15, 2023
GPTQ-for-LLaMa Public
Forked from qwopqwop200/GPTQ-for-LLaMa

4 bits quantization of LLaMA using GPTQ

Python Apache License 2.0 Updated Jun 23, 2023
RPTQ4LLM Public
Forked from hahnyuan/RPTQ4LLM

Reorder-based post-training quantization for large language model

Python MIT License Updated May 17, 2023
llama.onnx Public
Forked from tpoisonooo/llama.onnx

llama/alpaca onnx models, quantization and testcase

Python 1 GNU General Public License v3.0 Updated Apr 19, 2023
self-instruct Public
Forked from yizhongw/self-instruct

Aligning pretrained language models with instruction data generated by themselves.

Python Apache License 2.0 Updated Mar 27, 2023
LargeScale Public
Forked from wangguojim/LargeScale

Python Other Updated Feb 11, 2023
FasterTransformer Public
Forked from THUDM/FasterTransformer

Transformer related optimization, including BERT, GPT

C++ Apache License 2.0 Updated Feb 10, 2023
iree Public
Forked from iree-org/iree

👻

C++ Apache License 2.0 Updated Mar 1, 2022
I-BERT Public
Forked from kssteven418/I-BERT

[ICML'21] I-BERT: Integer-only BERT Quantization

Python MIT License Updated May 8, 2021
onnxruntime Public
Forked from microsoft/onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

C++ MIT License Updated Sep 3, 2020
TNN Public
Forked from Tencent/TNN

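TNN: a high-performance, lightweight inference framework for mobile devices, developed by Tencent Youtu Lab. Its strengths include cross-platform support, high performance, model compression, and code pruning. Building on the earlier Rapidnet and ncnn frameworks, TNN further strengthens mobile-device support and performance optimization, and also draws on the high performance and good extensibility of mainstream open-source frameworks. TNN is already deployed in applications such as Mobile QQ, Weishi, and Pitu. Contributions are welcome to help improve the TNN inference framework further.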

C++ Other Updated Jun 10, 2020
folly Public
Forked from facebook/folly

An open-source C++ library developed and used at Facebook.

C++ Apache License 2.0 Updated Jun 5, 2020
MegEngine Public
Forked from MegEngine/MegEngine

MegEngine is a fast, scalable, easy-to-use numerical computing framework with support for automatic differentiation.

C++ Other Updated Mar 24, 2020
turingas Public
Forked from daadaada/turingas

Assembler for NVIDIA Volta and Turing GPUs

Python MIT License Updated Jan 16, 2020
cuGemmProf Public
Forked from jeng1220/cuGemmProf

A simple tool to profile performance of multiple combinations of GEMM of cuBLAS

C++ MIT License Updated Jan 14, 2020
TensorRT Public
Forked from NVIDIA/TensorRT

TensorRT is a C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators.

C++ Apache License 2.0 Updated Dec 3, 2019
flexible-gemm Public
Forked from XiuYuLi/flexible-gemm

flexible-gemm conv of deepcore

C Updated Dec 2, 2019
glow Public
Forked from pytorch/glow

Compiler for Neural Network hardware accelerators

C++ Apache License 2.0 Updated Nov 22, 2019
cpufp Public
Forked from pigirons/cpufp

A CPU tool for benchmarking the peak of floating points

C GNU General Public License v3.0 Updated Oct 8, 2019
plaidml Public
Forked from plaidml/plaidml

PlaidML is a framework for making deep learning work everywhere.

C++ Apache License 2.0 Updated Sep 25, 2019
netron Public
Forked from lutzroeder/netron

Visualizer for neural network, deep learning and machine learning models

JavaScript MIT License Updated Sep 8, 2019
mlir Public
Forked from tensorflow/mlir

"Multi-Level Intermediate Representation" Compiler Infrastructure

C++ Apache License 2.0 Updated Jul 30, 2019
sling Public
Forked from google/sling

SLING - A natural language frame semantics parser

C++ Apache License 2.0 Updated Jul 17, 2019
kill-the-bits Public
Forked from facebookresearch/kill-the-bits

Code for: "And the bit goes down: Revisiting the quantization of neural networks"

Python Other Updated Jul 15, 2019
MNN Public
Forked from alibaba/MNN

MNN is a lightweight deep neural network inference engine.

C++ Updated May 6, 2019
morph-net Public
Forked from google-research/morph-net

Fast & Simple Resource-Constrained Learning of Deep Network Structure

Python Apache License 2.0 Updated Apr 19, 2019
LPCNet Public
Forked from xiph/LPCNet

Efficient neural speech synthesis

C BSD 3-Clause "New" or "Revised" License Updated Apr 14, 2019
catamount Public
Forked from baidu-research/catamount

Catamount is a compute graph analysis tool to load, construct, and modify deep learning models and to symbolically analyze their compute requirements

Python Apache License 2.0 Updated Apr 10, 2019
QNNPACK Public
Forked from pytorch/QNNPACK

Quantized Neural Network PACKage - mobile-optimized implementation of quantized neural network operators

C Other Updated Mar 26, 2019
