🍀
  • Nanjing University
  • Nanjing, China
  • 07:57 (UTC +02:00)

Highlights

  • Pro

Organizations

@KULeuven-MICAS

Starred repositories

Official implementation for Yuan & Liu & Zhong et al., KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches. EMNLP Findings 2024

Python 36 2 Updated Oct 5, 2024

A unified simulation platform that combines hardware and software, enabling pre-silicon, full-stack, closed-loop evaluation of your robotic system.

Python 34 4 Updated Sep 27, 2024

A heterogeneous accelerator-centric compute cluster

SystemVerilog 9 9 Updated Oct 5, 2024

Fast and accurate DRAM power and energy estimation tool

C++ 122 47 Updated Oct 1, 2024

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM

Python 137 11 Updated Jul 12, 2024

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Python 217 21 Updated Aug 27, 2024

A Python package that uses task-based neurons to build neural networks.

Python 133 3 Updated Aug 22, 2024

[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

Python 688 53 Updated Jul 24, 2024
C 9 1 Updated Aug 13, 2024

Awesome-LLM: a curated list of Large Language Model resources

18,037 1,456 Updated Oct 2, 2024

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

C++ 7,901 407 Updated Sep 6, 2024

Deep learning accelerator architectures requiring half the multipliers

Python 260 15 Updated Mar 28, 2024

[ICCV 2023] Q-Diffusion: Quantizing Diffusion Models.

Python 315 21 Updated Mar 21, 2024

The official GitHub page for the survey paper "A Survey of Large Language Models".

Python 10,129 798 Updated Aug 20, 2024

Open-source artifacts and code for our MICRO'23 paper titled “Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads”.

Python 32 Updated Sep 18, 2023

Implementation of "NITI: Training Integer Neural Networks Using Integer-only Arithmetic" on arXiv

C++ 75 14 Updated Jul 26, 2022

Comparison of "pruning at initialization prior to training" methods (Synflow/SNIP/GraSP) in PyTorch

Python 14 1 Updated May 12, 2024
Jupyter Notebook 113 8 Updated Apr 30, 2024

Universal LLM Deployment Engine with ML Compilation

Python 18,814 1,535 Updated Oct 5, 2024

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

Python 1,892 151 Updated Mar 27, 2024

Open-Source Posit RISC-V Core with Quire Capability

C++ 41 11 Updated Sep 14, 2023

A framework for fast exploration of the depth-first scheduling space for DNN accelerators

Python 30 8 Updated Feb 8, 2023

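A plugin that improves ChatGPT's data security and efficiency. It also shares many innovative features for free, such as auto-refresh, keep-alive, data security, audit cancellation, conversation cloning, unrestricted replies, page cleanup, large-screen display, tracking interception, continuous updates, and fine-grained insight. It aims to make the AI experience secure, smooth, efficient, and simple.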

JavaScript 14,462 724 Updated Sep 28, 2024

HW Architecture-Mapping Design Space Exploration Framework for Deep Learning Accelerators

C++ 104 36 Updated Oct 3, 2024

A collection of research papers on efficient training of DNNs

68 7 Updated Jul 6, 2022

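VSCode extension: automatically generates and updates VSCode file header comments, and automatically generates function comments with support for extracting function parameters. Supports all mainstream languages, with complete documentation, simple usage, flexible configuration, and years of continuous maintenance.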

JavaScript 5,589 266 Updated Apr 19, 2023

[NeurIPS 2022] A Fast Post-Training Pruning Framework for Transformers

Python 160 25 Updated Feb 28, 2023
Cuda 4 1 Updated Jun 3, 2021