Stars
ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering
Model components of the Llama Stack APIs
A visual and transparent alternative to open-source ChatGPT O1
A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability.
🕵️‍♂️ TUI for sniffing network traffic using eBPF on Linux
MLLM for On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
📄 A curated list of awesome .cursorrules files
A 4-hour coding workshop to understand how LLMs are implemented and used
High Performance ServiceMesh Data Plane Based on Programmable Kernel
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and…
Unified management of projects with large model APIs, unified conversion to OpenAI format, calling multiple backend services, OpenAI, Anthropic, Gemini, Vertex, Cloudflare, DeepBricks, OpenRouter, …
🤱🏻 Turn any webpage into a desktop app with Rust. 🤱🏻 Easily build lightweight multi-platform desktop apps with Rust
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
AL-Ref-SAM 2: Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation
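Face detection and face recognition, including face detection (retinaface, yolov5face, yolov7face, yolov8face), face detection tracking (ByteTracker), face angle calculation (Face_Angle), face alignment (Face_Aligner), face recognition (Arcface), mask detection (MaskRecognitiion), age and gender detection (Gender_age), 静…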
[EMNLP 2024] RWKV-CLIP: A Robust Vision-Language Representation Learner
This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024)
Minimal code and examples for running inference with Sapiens foundation human models in PyTorch
High-resolution models for human tasks.
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
An open-source RAG-based tool for chatting with your documents.
Lightning-fast serving engine for AI models. Flexible. Easy. Enterprise-scale.