LLaMA Datasets
Alpaca dataset from Stanford, cleaned and curated
Llama LoRA finetuned for instructions using ChatGPT responses!
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and can retrieve information dynamically to do so.
A Multilingual Dataset for Parsing Realistic Task-Oriented Dialogs
A dataset featuring diverse dialogues between two ChatGPT (gpt-3.5-turbo) instances with system messages written by GPT-4. Covering various contexts and tasks (task-oriented dialogue systems, abstr…
A collection of modular datasets generated by GPT-4: General-Instruct, Roleplay-Instruct, Code-Instruct, and Toolformer
An experiment to see if ChatGPT can improve the output of the Stanford Alpaca dataset
Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform
We unified the interfaces of instruction-tuning data (e.g., CoT data), multiple LLMs, and parameter-efficient methods (e.g., LoRA, P-tuning) for easy use. We welcome open-source enthusiasts…
Open-source code for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)