fp6_llm Public
Forked from usyd-fsalab/fp6_llm
An efficient GPU support for LLM inference with 6-bit quantization (FP6).
Cuda Apache License 2.0 Updated Mar 5, 2024
exllamav2 Public
Forked from turboderp/exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs
Python MIT License Updated Sep 15, 2023
GPTQ-for-LLaMa Public
Forked from qwopqwop200/GPTQ-for-LLaMa
4 bits quantization of LLaMA using GPTQ
Python Apache License 2.0 Updated Jun 23, 2023
RPTQ4LLM Public
Forked from hahnyuan/RPTQ4LLM
Reorder-based post-training quantization for large language model
Python MIT License Updated May 17, 2023
llama.onnx Public
Forked from tpoisonooo/llama.onnx
llama/alpaca onnx models, quantization and testcase
self-instruct Public
Forked from yizhongw/self-instruct
Aligning pretrained language models with instruction data generated by themselves.
Python Apache License 2.0 Updated Mar 27, 2023
FasterTransformer Public
Forked from THUDM/FasterTransformer
Transformer related optimization, including BERT, GPT
C++ Apache License 2.0 Updated Feb 10, 2023
I-BERT Public
Forked from kssteven418/I-BERT
[ICML'21] I-BERT: Integer-only BERT Quantization
Python MIT License Updated May 8, 2021
onnxruntime Public
Forked from microsoft/onnxruntime
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
C++ MIT License Updated Sep 3, 2020
TNN Public
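Forked from Tencent/TNN
TNN: a high-performance, lightweight inference framework for mobile devices, built by Tencent Youtu Lab. Its strengths include cross-platform support, high performance, model compression, and code pruning. Building on the earlier Rapidnet and ncnn frameworks, TNN further strengthens support and performance optimization for mobile devices, and also draws on the high performance and good extensibility of mainstream open-source frameworks. TNN is already deployed in applications such as Mobile QQ, Weishi, and Pitu. Contributions are welcome to help improve the TNN inference framework.
C++ Other Updated Jun 10, 2020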
folly Public
Forked from facebook/folly
An open-source C++ library developed and used at Facebook.
C++ Apache License 2.0 Updated Jun 5, 2020
MegEngine Public
Forked from MegEngine/MegEngine
MegEngine is a fast, scalable, easy-to-use numerical computing framework with support for automatic differentiation.
C++ Other Updated Mar 24, 2020
turingas Public
Forked from daadaada/turingas
Assembler for NVIDIA Volta and Turing GPUs
Python MIT License Updated Jan 16, 2020
cuGemmProf Public
Forked from jeng1220/cuGemmProf
A simple tool to profile performance of multiple combinations of GEMM of cuBLAS
C++ MIT License Updated Jan 14, 2020
TensorRT Public
Forked from NVIDIA/TensorRT
TensorRT is a C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators.
C++ Apache License 2.0 Updated Dec 3, 2019
flexible-gemm Public
Forked from XiuYuLi/flexible-gemm
flexible-gemm conv of deepcore
C Updated Dec 2, 2019
glow Public
Forked from pytorch/glow
Compiler for Neural Network hardware accelerators
C++ Apache License 2.0 Updated Nov 22, 2019
cpufp Public
Forked from pigirons/cpufp
A CPU tool for benchmarking the peak of floating points
C GNU General Public License v3.0 Updated Oct 8, 2019
plaidml Public
Forked from plaidml/plaidml
PlaidML is a framework for making deep learning work everywhere.
C++ Apache License 2.0 Updated Sep 25, 2019
netron Public
Forked from lutzroeder/netron
Visualizer for neural network, deep learning and machine learning models
JavaScript MIT License Updated Sep 8, 2019
mlir Public
Forked from tensorflow/mlir
"Multi-Level Intermediate Representation" Compiler Infrastructure
C++ Apache License 2.0 Updated Jul 30, 2019
sling Public
Forked from google/sling
SLING - A natural language frame semantics parser
C++ Apache License 2.0 Updated Jul 17, 2019
kill-the-bits Public
Forked from facebookresearch/kill-the-bits
Code for: "And the bit goes down: Revisiting the quantization of neural networks"
Python Other Updated Jul 15, 2019
MNN Public
Forked from alibaba/MNN
MNN is a lightweight deep neural network inference engine.
C++ Updated May 6, 2019
morph-net Public
Forked from google-research/morph-net
Fast & Simple Resource-Constrained Learning of Deep Network Structure
Python Apache License 2.0 Updated Apr 19, 2019
LPCNet Public
Forked from xiph/LPCNet
Efficient neural speech synthesis
C BSD 3-Clause "New" or "Revised" License Updated Apr 14, 2019
catamount Public
Forked from baidu-research/catamount
Catamount is a compute graph analysis tool to load, construct, and modify deep learning models and to symbolically analyze their compute requirements
Python Apache License 2.0 Updated Apr 10, 2019
QNNPACK Public
Forked from pytorch/QNNPACK
Quantized Neural Network PACKage - mobile-optimized implementation of quantized neural network operators
C Other Updated Mar 26, 2019