SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors

Jupyter Notebook 31 Updated Jun 27, 2024

Official repo of paper "Linguistic Minimal Pairs Elicit Linguistic Similarity in Large Language Models"

4 Updated Sep 20, 2024
1 Updated Sep 19, 2024

A simple tool to update bib entries with their official information (e.g., DBLP or the ACL anthology).

Python 2,597 157 Updated Aug 18, 2024

⚡️HivisionIDPhotos: a lightweight and efficient AI tool for producing ID photos.

Python 10,594 1,036 Updated Sep 28, 2024

Mainly records knowledge and interview questions relevant to large language model (LLM) algorithm (application) engineers

HTML 2,929 339 Updated Aug 19, 2024

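"A Guide to Using Open-Source Large Models": a tutorial for quickly deploying open-source large models in a Linux environment, tailored to beginners in China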

Jupyter Notebook 8,237 986 Updated Sep 29, 2024

JailBench: a Chinese dataset for evaluating the jailbreak attack risk of large language models

12 1 Updated Jul 13, 2024

Anthropic's educational courses

Jupyter Notebook 5,480 420 Updated Sep 18, 2024

A model fine-tuned from Qwen2-1.5B-Instruct, capable of handling sensitive topics such as violence, explicit content, and illegal activity.

Python 124 30 Updated Aug 22, 2024

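MNBVC (Massive Never-ending BT Vast Chinese corpus): an ultra-large-scale Chinese corpus that aims to match the 40T of data used to train chatGPT. The MNBVC dataset covers not only mainstream culture but also various niche cultures, and even text written in "Martian script" (火星文). It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.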

3,416 233 Updated Oct 3, 2024

Does Refusal Training in LLMs Generalize to the Past Tense? [arXiv, July 2024]

Python 50 6 Updated Oct 3, 2024

Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers".

Jupyter Notebook 52 12 Updated Mar 11, 2024

The implementation of "Large Language Models are Parallel Multilingual Learners".

Roff 10 Updated Mar 18, 2024

[EMNLP 2024] The official GitHub repo for the paper "Course-Correction: Safety Alignment Using Synthetic Preferences"

Python 19 1 Updated Oct 2, 2024

A collection of automated evaluators for assessing jailbreak attempts.

Python 59 8 Updated Jun 26, 2024

This repository contains the code for the paper "Tricking LLMs into Disobedience: Formalizing, Analyzing, and Detecting Jailbreaks" by Abhinav Rao, Sachin Vashishta*, Atharva Naik*, Somak Aditya, a…

Jupyter Notebook 5 2 Updated May 22, 2024

COLING'24 Humanizing Machine-Generated Content: Evading AI-Text Detection through Adversarial Attack

Python 26 2 Updated Apr 10, 2024
Jupyter Notebook 29 2 Updated Jun 13, 2024

[ACL'24] MC^2: A Multilingual Corpus of Minority Languages in China (Tibetan, Uyghur, Kazakh, and Mongolian)

Python 16 1 Updated Jun 15, 2024

A list of the most popular AI tools of 2024, sorted by category

165 23 Updated Aug 5, 2024

Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks

Python 20 1 Updated Jul 9, 2024

ReFT: Representation Finetuning for Language Models

Python 1,118 95 Updated Sep 29, 2024

Instruction-tuning data generation using an LLM for a specific scenario.

Python 14 5 Updated May 2, 2024
HTML 8 1 Updated May 14, 2024

Dataset of statements by China's Ministry of Foreign Affairs

Python 5 1 Updated Mar 17, 2020

Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs

Jupyter Notebook 166 23 Updated Jun 7, 2024
Jupyter Notebook 144 14 Updated Nov 26, 2023

Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models"

Python 65 6 Updated Sep 5, 2024