Stars
A Triton implementation of FlashAttention-2 with support for custom attention masks.
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
Schedule-Free Optimization in PyTorch
PyTorch code and models for V-JEPA self-supervised learning from video.
A concise but complete full-attention transformer with a set of promising experimental features from various papers.
Official codebase for I-JEPA, the Image-based Joint-Embedding Predictive Architecture, first outlined in the CVPR paper "Self-supervised learning from images with a joint-embedding predictive architecture."
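
For the schedule-free optimization entry above, a minimal usage sketch, assuming the entry refers to the schedulefree PyTorch package and its AdamWScheduleFree optimizer; the toy model, loss, and hyperparameters are placeholders:

```python
import torch
import schedulefree

# Toy model; the optimizer is a drop-in replacement for AdamW,
# with no learning-rate schedule required.
model = torch.nn.Linear(10, 1)
optimizer = schedulefree.AdamWScheduleFree(model.parameters(), lr=1e-3)

# Schedule-free optimizers maintain an averaged iterate, so both the
# model and the optimizer must be switched between train and eval modes.
model.train()
optimizer.train()
for _ in range(100):
    optimizer.zero_grad()
    inputs = torch.randn(32, 10)
    loss = model(inputs).pow(2).mean()
    loss.backward()
    optimizer.step()

model.eval()
optimizer.eval()
```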
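
And for the full-attention transformer entry, a minimal sketch of building a decoder-only model, assuming the entry refers to the x-transformers package and its TransformerWrapper/Decoder interface; the vocabulary size and layer settings are arbitrary:

```python
import torch
from x_transformers import TransformerWrapper, Decoder

# A decoder-only language model: token and position embeddings, the
# attention stack, and the output projection are wrapped together.
model = TransformerWrapper(
    num_tokens=20000,
    max_seq_len=1024,
    attn_layers=Decoder(
        dim=512,
        depth=6,
        heads=8,
    ),
)

tokens = torch.randint(0, 20000, (1, 1024))
logits = model(tokens)  # shape: (1, 1024, 20000)
```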