Commit fc57dea

release latest models

yangapku committed Sep 25, 2023
1 parent fb3180d
Showing 13 changed files with 941 additions and 238 deletions.
18 changes: 8 additions & 10 deletions FAQ.md
@@ -4,7 +4,7 @@

#### Failure in installing flash attention

Flash attention is an option for accelerating training and inference. Only NVIDIA GPUs of Turing, Ampere, Ada, and Hopper architecture, e.g., H100, A100, RTX 3090, T4, RTX 2080, can support flash attention. You can use our models without installing it.
Flash attention is an option for accelerating training and inference. Only NVIDIA GPUs of Turing, Ampere, Ada, and Hopper architecture, e.g., H100, A100, RTX 3090, T4, RTX 2080, can support flash attention. **You can use our models without installing it.**

#### Which version of transformers should I use?

@@ -20,7 +20,7 @@
This is the merge file of the tokenizer. You have to download it. Note that if y…

#### transformers_stream_generator/tiktoken/accelerate not found

Run the command `pip install -r requirements.txt`. You can find the file at [https://github.com/QwenLM/Qwen-7B/blob/main/requirements.txt](https://github.com/QwenLM/Qwen-7B/blob/main/requirements.txt).
Run the command `pip install -r requirements.txt`. You can find the file at [https://github.com/QwenLM/Qwen/blob/main/requirements.txt](https://github.com/QwenLM/Qwen/blob/main/requirements.txt).
<br><br>


@@ -32,7 +32,6 @@
Run the command `pip install -r requirements.txt`. You can find the file at [htt…
Yes, see `web_demo.py` for the web demo and `cli_demo.py` for the CLI demo. See the README for more information.



#### Can I use CPU only?

Yes, running `python cli_demo.py --cpu-only` will load the model and run inference on CPU only.
@@ -47,19 +46,16 @@
This is because tokens represent bytes and a single token may be a meaningless s…

#### It seems that the generation is not related to the instruction...

Please check if you are loading Qwen-7B-Chat instead of Qwen-7B. Qwen-7B is the base model without alignment, which behaves differently from the SFT/Chat model.
Please check if you are loading Qwen-Chat instead of Qwen. Qwen is the base model without alignment, which behaves differently from the SFT/Chat model.
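
A minimal sketch of loading the Chat model through `transformers`, assuming the Hub id `Qwen/Qwen-7B-Chat` and the `chat()` helper shipped via the repo's remote code:

```python
# Minimal sketch: load the aligned Chat checkpoint, not the base model.
# The Hub id "Qwen/Qwen-7B-Chat" and the chat() helper come from the
# repo's remote code; adjust the id to the checkpoint you actually use.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True
).eval()

response, history = model.chat(tokenizer, "What does flash attention do?", history=None)
print(response)
```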

#### Is quantization supported?

Yes, quantization is supported by `bitsandbytes`. We are working on an improved version and will release the quantized model checkpoints.

#### Errors in running quantized models: `importlib.metadata.PackageNotFoundError: No package metadata was found for bitsandbytes`
Yes, quantization is supported by AutoGPTQ.

For Linux users, running `pip install bitsandbytes` directly can solve the problem. For Windows users, you can run `python -m pip install bitsandbytes --prefer-binary --extra-index-url=https://jllllll.github.io/bitsandbytes-windows-webui`.
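
For the AutoGPTQ path, a hedged sketch, assuming a published Int4 checkpoint (the Hub id `Qwen/Qwen-7B-Chat-Int4` is an assumption) and the `auto-gptq` and `optimum` packages installed alongside `transformers`:

```python
# Hedged sketch: run a GPTQ-quantized checkpoint through transformers.
# The Hub id "Qwen/Qwen-7B-Chat-Int4" is an assumption; requires the
# auto-gptq and optimum packages in addition to transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat-Int4", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat-Int4", device_map="auto", trust_remote_code=True
).eval()

response, _ = model.chat(tokenizer, "Hi", history=None)
print(response)
```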

#### Slow when processing long sequences

We solved this problem. Updating the code to the latest version can help.
Updating the code to the latest version can help.

#### Unsatisfactory performance in processing long sequences

@@ -72,7 +68,9 @@
Please ensure that NTK is applied. `use_dynamic_ntk` and `use_logn_attn` in `config.json`…
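
A sketch of what applying NTK presumably looks like in code; the flag names are taken from the FAQ, and setting them to `True` before loading is an assumption about the intended fix:

```python
# Hedged sketch: enable dynamic NTK scaling and LogN attention scaling
# on the Qwen config before loading. Assumes these flags exist on the
# remote-code config, as the FAQ's mention of config.json suggests.
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
config.use_dynamic_ntk = True   # NTK-aware RoPE scaling for long inputs (assumed intent)
config.use_logn_attn = True     # LogN attention scaling for long inputs (assumed intent)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat", config=config, device_map="auto", trust_remote_code=True
).eval()
```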

#### Can Qwen support SFT or even RLHF?

We do not provide finetuning or RLHF code for now. However, some projects have supported finetuning; see [FastChat](https://github.com/lm-sys/FastChat), [Firefly](https://github.com/yangjianxin1/Firefly), [LLaMA Efficient Tuning](https://github.com/hiyouga/LLaMA-Efficient-Tuning), etc. We will soon update the relevant code.
Yes, we now support SFT, including full-parameter finetuning, LoRA, and Q-LoRA. You can also check other projects like [FastChat](https://github.com/lm-sys/FastChat), [Firefly](https://github.com/yangjianxin1/Firefly), [LLaMA Efficient Tuning](https://github.com/hiyouga/LLaMA-Efficient-Tuning), etc.

However, we do not support RLHF for now. We will provide the code in the near future.
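
As an illustration of the LoRA option mentioned above, a generic sketch built on the `peft` library rather than the repo's own finetuning script; the rank, alpha, dropout, and target module names are illustrative assumptions:

```python
# Generic LoRA sketch using Hugging Face peft; hyperparameters and target
# modules are illustrative assumptions, not the repo's official recipe.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B", trust_remote_code=True, device_map="auto"
)
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["c_attn", "c_proj", "w1", "w2"],  # assumed Qwen attention/MLP projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```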
<br><br>


13 changes: 5 additions & 8 deletions FAQ_ja.md
@@ -20,7 +20,7 @@
Flash attention is an option for accelerating training and inference…

#### transformers_stream_generator/tiktoken/accelerate not found

Run the command `pip install -r requirements.txt`. You can find the file at [https://github.com/QwenLM/Qwen-7B/blob/main/requirements.txt](https://github.com/QwenLM/Qwen-7B/blob/main/requirements.txt).
Run the command `pip install -r requirements.txt`. You can find the file at [https://github.com/QwenLM/Qwen/blob/main/requirements.txt](https://github.com/QwenLM/Qwen/blob/main/requirements.txt).
<br><br>


@@ -47,19 +47,16 @@

#### It seems that the generation is not related to the instruction...

Please check if you are loading Qwen-7B-Chat instead of Qwen-7B. Qwen-7B is the base model without alignment, which behaves differently from the SFT/Chat model.
Please check if you are loading Qwen-Chat instead of Qwen. Qwen is the base model without alignment, which behaves differently from the SFT/Chat model.

#### Is quantization supported?

Yes, quantization is supported by `bitsandbytes`. We are working on an improved version and plan to release quantized model checkpoints.
Yes, quantization is supported by AutoGPTQ.

#### Errors when running quantized models: `importlib.metadata.PackageNotFoundError: No package metadata was found for bitsandbytes`

For Linux users, running `pip install bitsandbytes` directly can solve the problem. For Windows users, you can run `python -m pip install bitsandbytes --prefer-binary --extra-index-url=https://jllllll.github.io/bitsandbytes-windows-webui`.

#### Slow when processing long sequences

We solved this problem; updating the code to the latest version can help.
Updating the code to the latest version can help.

#### Unsatisfactory performance in processing long sequences

@@ -72,7 +69,7 @@
Please ensure that NTK is applied. In `config.json`, …

#### Can Qwen support SFT or even RLHF?

We do not provide finetuning or RLHF code for now. However, several projects, such as [FastChat](https://github.com/lm-sys/FastChat), [Firefly](https://github.com/yangjianxin1/Firefly), and [LLaMA Efficient Tuning](https://github.com/hiyouga/LLaMA-Efficient-Tuning), support finetuning. We plan to update the relevant code in the near future.
We provide code for SFT. Several projects, such as [FastChat](https://github.com/lm-sys/FastChat), [Firefly](https://github.com/yangjianxin1/Firefly), and [LLaMA Efficient Tuning](https://github.com/hiyouga/LLaMA-Efficient-Tuning), also support finetuning. We plan to update the relevant code in the near future.
<br><br>


16 changes: 7 additions & 9 deletions FAQ_zh.md
@@ -20,7 +20,7 @@
Flash attention is an optional feature for accelerating model training and inference, and it is only suitable for…

#### What should I do if the transformers_stream_generator/tiktoken/accelerate libraries cannot be found?

Run the following command: `pip install -r requirements.txt`. The required dependencies can be found at [https://github.com/QwenLM/Qwen-7B/blob/main/requirements.txt](https://github.com/QwenLM/Qwen-7B/blob/main/requirements.txt).
Run the following command: `pip install -r requirements.txt`. The required dependencies can be found at [https://github.com/QwenLM/Qwen/blob/main/requirements.txt](https://github.com/QwenLM/Qwen/blob/main/requirements.txt).
<br><br>


@@ -44,19 +44,15 @@
Qwen currently supports streaming inference; see the `chat_stream` function in `modeling_qwen.py`…
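
A hedged sketch of streaming generation through that interface, assuming the `chat_stream` generator yields partial responses as in the repo's remote code:

```python
# Hedged sketch: stream partial responses from the chat_stream generator
# defined in modeling_qwen.py (loaded via trust_remote_code). The Hub id
# and the exact yield format are assumptions; adjust to your checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True
).eval()

for partial in model.chat_stream(tokenizer, "Tell me about large language models.", history=None):
    print(partial)  # each yield is the response generated so far
```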

#### The model output seems unrelated to the input / does not follow instructions / acts dumb

Please check that you are loading the Qwen-7B-Chat model for inference; the Qwen-7B model is the unaligned pretrained base model and is not expected to be able to respond to user instructions. In the latest model version we have added checks inside the `chat` and `chat_stream` interfaces to keep you from mistakenly using a pretrained model as an SFT/Chat model.
Please check that you are loading the Qwen-Chat model for inference; the Qwen model is the unaligned pretrained base model and is not expected to be able to respond to user instructions. In the latest model version we have added checks inside the `chat` and `chat_stream` interfaces to keep you from mistakenly using a pretrained model as an SFT/Chat model.

#### Is there a quantized version of the model?

Qwen currently supports 8-bit and 4-bit quantized inference based on `bitsandbytes`. We will provide a more efficient quantized inference implementation in future updates, along with the corresponding quantized models.

#### Error when running quantized inference: `importlib.metadata.PackageNotFoundError: No package metadata was found for bitsandbytes`

For Linux users, simply run `pip install bitsandbytes`. For Windows users, you can run `python -m pip install bitsandbytes --prefer-binary --extra-index-url=https://jllllll.github.io/bitsandbytes-windows-webui`.
Qwen currently supports 4-bit quantized inference based on AutoGPTQ.

#### Generation slows down significantly as the sequence grows longer

This issue has been fixed in the latest version; please update to the latest code.
Please update to the latest code.

#### Quality issues when processing long sequences

@@ -68,7 +64,9 @@

#### Are SFT and RLHF currently supported?

We do not provide SFT or RLHF code for now. Several external projects already offer support, such as [FastChat](https://github.com/lm-sys/FastChat), [Firefly](https://github.com/yangjianxin1/Firefly), and [LLaMA Efficient Tuning](https://github.com/hiyouga/LLaMA-Efficient-Tuning). We will update the relevant code and documentation as soon as possible.
We now provide SFT code, supporting full-parameter finetuning, LoRA, and Q-LoRA. In addition, several external projects already offer support, such as [FastChat](https://github.com/lm-sys/FastChat), [Firefly](https://github.com/yangjianxin1/Firefly), and [LLaMA Efficient Tuning](https://github.com/hiyouga/LLaMA-Efficient-Tuning). We will update the relevant code and documentation as soon as possible.

We do not yet support RLHF training; stay tuned.
<br><br>


2 changes: 1 addition & 1 deletion LICENSE
@@ -9,7 +9,7 @@
By clicking to agree or by using or distributing any portion or element of the T…
b. "We"(or "Us") shall mean Alibaba Cloud.
c. "You" (or "Your") shall mean a natural person or legal entity exercising the rights granted by this Agreement and/or using the Materials for any purpose and in any field of use.
d. "Third Parties" shall mean individuals or legal entities that are not under common control with Us or You.
e. "Tongyi Qianwen" shall mean the large language models (including Qwen-7B model and Qwen-7B-Chat model), and software and algorithms, consisting of trained model weights, parameters (including optimizer states), machine-learning model code, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Us.
e. "Tongyi Qianwen" shall mean the large language models (including Qwen model and Qwen-Chat model), and software and algorithms, consisting of trained model weights, parameters (including optimizer states), machine-learning model code, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Us.
f. "Materials" shall mean, collectively, Alibaba Cloud's proprietary Tongyi Qianwen and Documentation (and any portion thereof) made available under this Agreement.
g. "Source" form shall mean the preferred form for making modifications, including but not limited to model source code, documentation source, and configuration files.
h. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation,
