wolf1981
  • fp6_llm Public

    Forked from usyd-fsalab/fp6_llm

    Efficient GPU support for LLM inference with 6-bit quantization (FP6).

    Cuda Apache License 2.0 Updated Mar 5, 2024
  • exllamav2 Public

    Forked from turboderp/exllamav2

    A fast inference library for running LLMs locally on modern consumer-class GPUs

    Python MIT License Updated Sep 15, 2023
  • 4 bits quantization of LLaMA using GPTQ

    Python Apache License 2.0 Updated Jun 23, 2023
  • RPTQ4LLM Public

    Forked from hahnyuan/RPTQ4LLM

    Reorder-based post-training quantization for large language models

    Python MIT License Updated May 17, 2023
  • llama/alpaca onnx models, quantization and testcase

    Python 1 GNU General Public License v3.0 Updated Apr 19, 2023
  • Aligning pretrained language models with instruction data generated by themselves.

    Python Apache License 2.0 Updated Mar 27, 2023
  • Python Other Updated Feb 11, 2023
  • Transformer related optimization, including BERT, GPT

    C++ Apache License 2.0 Updated Feb 10, 2023
  • iree Public

    Forked from iree-org/iree

    👻

    C++ Apache License 2.0 Updated Mar 1, 2022
  • I-BERT Public

    Forked from kssteven418/I-BERT

    [ICML'21] I-BERT: Integer-only BERT Quantization

    Python MIT License Updated May 8, 2021
  • ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

    C++ MIT License Updated Sep 3, 2020
  • TNN Public

    Forked from Tencent/TNN

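    TNN: a high-performance, lightweight inference framework for mobile devices, built by Tencent Youtu Lab. It offers cross-platform support, high performance, model compression, and code pruning, among other strengths. Building on the earlier Rapidnet and ncnn frameworks, TNN further strengthens mobile device support and performance optimization, and it also draws on the high performance and good extensibility of mainstream open-source frameworks in the industry. TNN is already deployed in applications such as Mobile QQ, Weishi, and Pitu. Contributions are welcome to help improve the TNN inference framework.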

    C++ Other Updated Jun 10, 2020
  • folly Public

    Forked from facebook/folly

    An open-source C++ library developed and used at Facebook.

    C++ Apache License 2.0 Updated Jun 5, 2020
  • MegEngine Public

    Forked from MegEngine/MegEngine

    MegEngine is a fast, scalable, easy-to-use numerical computation framework with support for automatic differentiation.

    C++ Other Updated Mar 24, 2020
  • turingas Public

    Forked from daadaada/turingas

    Assembler for NVIDIA Volta and Turing GPUs

    Python MIT License Updated Jan 16, 2020
  • cuGemmProf Public

    Forked from jeng1220/cuGemmProf

    A simple tool to profile performance of multiple combinations of GEMM of cuBLAS

    C++ MIT License Updated Jan 14, 2020
  • TensorRT Public

    Forked from NVIDIA/TensorRT

    TensorRT is a C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators.

    C++ Apache License 2.0 Updated Dec 3, 2019
  • flexible-gemm conv of deepcore

    C Updated Dec 2, 2019
  • glow Public

    Forked from pytorch/glow

    Compiler for Neural Network hardware accelerators

    C++ Apache License 2.0 Updated Nov 22, 2019
  • cpufp Public

    Forked from pigirons/cpufp

    A CPU tool for benchmarking peak floating-point performance

    C GNU General Public License v3.0 Updated Oct 8, 2019
  • plaidml Public

    Forked from plaidml/plaidml

    PlaidML is a framework for making deep learning work everywhere.

    C++ Apache License 2.0 Updated Sep 25, 2019
  • netron Public

    Forked from lutzroeder/netron

    Visualizer for neural network, deep learning and machine learning models

    JavaScript MIT License Updated Sep 8, 2019
  • mlir Public

    Forked from tensorflow/mlir

    "Multi-Level Intermediate Representation" Compiler Infrastructure

    C++ Apache License 2.0 Updated Jul 30, 2019
  • sling Public

    Forked from google/sling

    SLING - A natural language frame semantics parser

    C++ Apache License 2.0 Updated Jul 17, 2019
  • Code for: "And the bit goes down: Revisiting the quantization of neural networks"

    Python Other Updated Jul 15, 2019
  • MNN Public

    Forked from alibaba/MNN

    MNN is a lightweight deep neural network inference engine.

    C++ Updated May 6, 2019
  • Fast & Simple Resource-Constrained Learning of Deep Network Structure

    Python Apache License 2.0 Updated Apr 19, 2019
  • LPCNet Public

    Forked from xiph/LPCNet

    Efficient neural speech synthesis

    C BSD 3-Clause "New" or "Revised" License Updated Apr 14, 2019
  • Catamount is a compute graph analysis tool to load, construct, and modify deep learning models and to symbolically analyze their compute requirements

    Python Apache License 2.0 Updated Apr 10, 2019
  • QNNPACK Public

    Forked from pytorch/QNNPACK

    Quantized Neural Network PACKage - mobile-optimized implementation of quantized neural network operators

    C Other Updated Mar 26, 2019