Stars
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors
Official repo of paper "Linguistic Minimal Pairs Elicit Linguistic Similarity in Large Language Models"
A simple tool to update bib entries with their official information (e.g., DBLP or the ACL anthology).
⚡️HivisionIDPhotos: a lightweight and efficient AI ID photo tool.
"A Practical Guide to Open-Source LLMs": quickly deploy open-source large language models in a Linux environment — a deployment tutorial tailored for users in China.
Anthropic's educational courses
A model fine-tuned from Qwen2-1.5B-Instruct, capable of handling sensitive topics such as violence, explicit content, and illegal activities.
MNBVC (Massive Never-ending BT Vast Chinese corpus): an ultra-large-scale Chinese corpus, benchmarked against the 40 TB of data used to train ChatGPT. MNBVC covers not only mainstream culture but also niche subcultures and even "Martian script" (stylized internet text). It includes news, essays, novels, books, magazines, academic papers, scripts, forum posts, wikis, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and every other form of plain-text Chinese data.
Does Refusal Training in LLMs Generalize to the Past Tense? [arXiv, July 2024]
Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers".
The implementation of "Large Language Models are Parallel Multilingual Learners".
[EMNLP 2024] The official GitHub repo for the paper "Course-Correction: Safety Alignment Using Synthetic Preferences"
A collection of automated evaluators for assessing jailbreak attempts.
This repository contains the code for the paper "Tricking LLMs into Disobedience: Formalizing, Analyzing, and Detecting Jailbreaks" by Abhinav Rao, Sachin Vashishta*, Atharva Naik*, Somak Aditya, a…
[COLING'24] Humanizing Machine-Generated Content: Evading AI-Text Detection through Adversarial Attack
[ACL'24] MC^2: A Multilingual Corpus of Minority Languages in China (Tibetan, Uyghur, Kazakh, and Mongolian)
The most popular AI tools, listed by category (2024).
Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks
ReFT: Representation Finetuning for Language Models
Instruction-tuning data generation using an LLM in a specific scenario.
Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs
Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models"