Stars
A high-performance, zero-overhead, extensible Python compiler using LLVM
🗺️ Data Cleaning and Textual Data Visualization 🗺️
Code to download and tokenize Wikipedia data.
Tesseract Open Source OCR Engine (main repository)
An R implementation of Reinforced Poisson Process (RPP) model
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
High accuracy RAG for answering questions from scientific documents with citations
Python library for ngram collection and frequency smoothing
EGG: Emergence of lanGuage in Games
A module for getting data into Python from large data sources
S2ORC: The Semantic Scholar Open Research Corpus: https://www.aclweb.org/anthology/2020.acl-main.447/
Scripts for creating a clean copy of the compressed tagged files of the COHA corpus.
The simplest, fastest repository for training/finetuning medium-sized GPTs.
rahmacha / EGG
Forked from facebookresearch/EGG
EGG: Emergence of lanGuage in Games
UDPipe: Trainable pipeline for tokenizing, tagging, lemmatizing and parsing Universal Treebanks and other CoNLL-U files
LYT Mode is for "Linking Your Thinking". It invokes sensemaking and lateral thinking.
You like pytorch? You like micrograd? You love tinygrad! ❤️
An autoregressive character-level language model for making more things
A curated list of awesome ggplot2 tutorials, packages etc.
Studying phonotactics and how it relates to other language features
Multilingual Generative Pretrained Model
RipsNet: a general architecture for fast and robust estimation of the persistent homology of point clouds
Calculating difference between expected and real homophony.