Stars
This repository quantizes and converts the Llama3-8B-Instruct model weights and deploys them on Android for on-device inference.
Proxy that allows you to use Ollama as a copilot, like GitHub Copilot
A developer reference project for creating Retrieval Augmented Generation (RAG) chatbots on Windows using TensorRT-LLM
Universal LLM Deployment Engine with ML Compilation
LostRuins / koboldcpp
Forked from ggerganov/llama.cpp
Run GGUF models easily with a KoboldAI UI. One File. Zero Install.