Stars
A one-stop, open-source, high-quality data extraction tool, supports PDF/webpage/e-book extraction.一站式开源高质量数据提取工具,支持PDF/网页/多格式电子书提取。
MNBVC(Massive Never-ending BT Vast Chinese corpus)超大规模中文语料集。对标chatGPT训练的40T数据。MNBVC数据集不但包括主流文化,也包括各个小众文化甚至火星文的数据。MNBVC数据集包括新闻、作文、小说、书籍、杂志、论文、台词、帖子、wiki、古诗、歌词、商品介绍、笑话、糗事、聊天记录等一切形式的纯文本中文数据。
🔥 🔥 🔥 Open Source JIRA, Linear, Monday, and Asana Alternative. Plane helps you track your issues, epics, and product roadmaps in the simplest way possible.
A generative speech model for daily dialogue.
XIAOJUSURVEY is an enterprises form builder and analytics platform that allows users to create questionnaires, exams, polls, quizzes, and analyze data online.
为英语学习者量身打造的视频播放器,助你通过观看视频、沉浸真实语境,轻松提升英语水平。#美剧 #播放器 #听力
Turns Data and AI algorithms into production-ready web applications in no time.
开源易用的中文离线OCR,识别率媲美大厂,并且提供了易用的web页面及web的接口,方便人类日常工作使用或者其他程序来调用~
freeCodeCamp.org's open-source codebase and curriculum. Learn to code for free.
The latest SQLite version of the China Biographical Database
Some useful Chinese corpus datasets 中文语料小数据
《动手学深度学习》:面向中文读者、能运行、可讨论。中英文版被70多个国家的500多所大学用于教学。
A large-scale 7B pretraining language model developed by BaiChuan-Inc.
A 13B large language model developed by Baichuan Intelligent Technology
Easily migrate your codebase from one framework or language to another.
Repo for BenTsao [original name: HuaTuo (华驼)], Instruction-tuning Large Language Models with Chinese Medical Knowledge. 本草(原名:华驼)模型仓库,基于中文医学知识的大语言模型指令微调
GUI for ChatGPT API and many LLMs. Supports agents, file-based QA, GPT finetuning and query with web search. All with a neat UI.
🗃️ A javascript debundler. Takes a Browserify or Webpack bundle and recreates the initial, pre-bundled source.
Virtual whiteboard for sketching hand-drawn like diagrams
Apache ECharts is a powerful, interactive charting and data visualization library for browser
Make a cascading timeline from markdown-like text. Supports simple American/European date styles, ISO8601, images, links, locations, and more.
📦 Zero-configuration bundler for tiny modules.