Merge branch 'qg-example' of github.com:westfish/PaddleNLP into qg-example
LiuChiachi · Oct 20, 2022 · 31dc47a
2 parents 361d826 + 289d821
commit 31dc47a
Showing 26 changed files with 912 additions and 126 deletions.
diff --git a/README_cn.md b/README_cn.md
@@ -37,19 +37,6 @@
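   * 🍭 AIGC content generation: added the SOTA code generation model [**CodeGen**](./examples/code_generation/codegen), which supports code generation in multiple programming languages; integrated the [**popular text-to-image models**](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/docs/model_zoo/taskflow.md#%E6%96%87%E5%9B%BE%E7%94%9F%E6%88%90) DALL·E Mini, Disco Diffusion, and Stable Diffusion, with more fun models to try; added the [**Chinese text summarization application**](./applications/text_summarization), the first release of a Chinese summarization model trained on a large-scale corpus, with support for one-line Taskflow calls and custom training;
   * 💪 Framework upgrade: released the [**automatic model compression API**](./docs/compression.md), which prunes and quantizes models automatically and greatly lowers the barrier to using model compression; released the [**few-shot Prompt**](./applications/text_classification/multi_class/few-shot) capability, integrating classic algorithms such as PET, P-Tuning, and RGL.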
 
-
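-* 👀 **2022.9.6 PaddlePaddle Smart Finance industry livestream course series**
-
-  * Focusing on industrial practice and development trends of deep learning in the financial industry, industry experts are invited to share their practical experience and discuss the future of fintech;
-
-  * Industrial practice examples released with the accompanying PaddleNLP course: UIE-based financial document information extraction; Pipelines-based FAQ question answering system;
-
-  * **Livestreams every Tuesday and Thursday at 19:00 starting September 6**; scan the QR code to join the WeChat group for free, get the livestream link, and talk with industry experts: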
-
-    <div align="center">
-    <img src="https://user-images.githubusercontent.com/11793384/188596360-264415d4-5462-43ad-8517-5b7e690061ce.jpg" width="150" height="150" />
-    </div>
-
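 * 🔥 **2022.5.16  Released [PaddleNLP v2.3](https://github.com/PaddlePaddle/PaddleNLP/releases/tag/v2.3.0)**
   * 💎 Released the universal information extraction technology [**UIE**](./model_zoo/uie): a single model supports multiple open-domain information extraction tasks such as entity recognition, relation and event extraction, and sentiment analysis, with no restriction on domain or extraction target, and supports **zero-shot extraction** as well as efficient end-to-end **few-shot** customization;
   * 😊 Released the lightweight models of the ERNIE large model [**ERNIE 3.0**](./model_zoo/ernie-3.0), which achieve the best results among models of the same size and structure on [CLUE ](https://www.cluebenchmarks.com/), and provide **🗜️lossless compression** and **⚙️all-scenario deployment** solutions;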
@@ -58,7 +45,7 @@
 
 ## Community
 
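-- Scan the QR code with WeChat and fill in the questionnaire to join the discussion group and claim the benefits
+- Scan the QR code with WeChat, fill in the questionnaire, and reply to the assistant with the keyword (NLP) to join the discussion group and claim the benefits
   - Talk in depth with many community developers and the official team.
   - A 10 GB NLP learning package!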
 
@@ -83,6 +70,14 @@ Taskflow provides a rich set of **📦out-of-the-box** industrial-grade preset NLP models, covering
 
 ![taskflow1](https://user-images.githubusercontent.com/11793384/159693816-fda35221-9751-43bb-b05c-7fc77571dd76.gif)
 
+Taskflow now integrates a fun text-to-image application; try **Stable Diffusion** in three lines of code
+```python
+from paddlenlp import Taskflow
+text_to_image = Taskflow("text_to_image", model="CompVis/stable-diffusion-v1-4")
+image_list = text_to_image('"In the morning light,Chinese ancient buildings in the mountains,Magnificent and fantastic John Howe landscape,lake,clouds,farm,Fairy tale,light effect,Dream,Greg Rutkowski,James Gurney,artstation"')
+```
+<img width="300" alt="image" src="https://user-images.githubusercontent.com/16698950/194882669-f7cc7c98-d63a-45f4-99c1-0514c6712368.png">
+
 For more usage, see the [Taskflow documentation](./docs/model_zoo/taskflow.md).
 
 ### A rich and complete Chinese model library

diff --git a/applications/text_classification/hierarchical/deploy/paddle_serving/README.md b/applications/text_classification/hierarchical/deploy/paddle_serving/README.md
@@ -153,20 +153,37 @@ I0727 06:50:34.993671 43126 naive_executor.cc:102] ---  skip [linear_75.tmp_1],
 [OP Object] init success
 ```
 
-#### Start the client test
+#### Start the rpc client test
 Note: disable any proxy before sending client requests, and set server_url to the address of the machine running the service
 ```shell
 python rpc_client.py
 ```
 The output is printed as follows:
 ```
-text:  请问木竭胶囊能同高血压药、氨糖同时服吗？
-label:  3,37
+text:  消失的“外企光环”，5月份在华裁员900余人，香饽饽变“臭”了
+label: 组织关系,组织关系##裁员
 --------------------
-text:  低压100*高压140*头涨，想吃点降压药。谢谢！
-label:  0
+text:  卡车超载致使跨桥侧翻，没那么简单
+label: 灾害/意外,灾害/意外##坍/垮塌
 --------------------
-text:  脑穿通畸形易发人群有哪些
-label:  0,9
+text:  金属卡扣安装不到位，上海乐扣乐扣贸易有限公司将召回捣碎器1162件
+label: 产品行为,产品行为##召回
+--------------------
+```
+#### Start the http client test
+Note: disable any proxy before sending client requests, and set server_url to the address of the machine running the service
+```shell
+python http_client.py
+```
+The output is printed as follows:
+```
+text:  消失的“外企光环”，5月份在华裁员900余人，香饽饽变“臭”了
+label: 组织关系,组织关系##裁员
+--------------------
+text:  卡车超载致使跨桥侧翻，没那么简单
+label: 灾害/意外,灾害/意外##坍/垮塌
+--------------------
+text:  金属卡扣安装不到位，上海乐扣乐扣贸易有限公司将召回捣碎器1162件
+label: 产品行为,产品行为##召回
 --------------------
 ```
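
The sample output above prints hierarchical labels as `parent##child` strings. As a minimal sketch of how a client could regroup such strings by their top-level category, the helper below (`group_by_parent` is a hypothetical name, not part of this commit) assumes `##` is the only level separator, as in the label list shown in this diff.

```python
from typing import Dict, List


def group_by_parent(labels: List[str], sep: str = "##") -> Dict[str, List[str]]:
    """Group 'parent##child' labels under their top-level parent.

    Hypothetical helper for illustration; the separator is assumed from
    the labels printed by the clients in this commit.
    """
    tree: Dict[str, List[str]] = {}
    for label in labels:
        parent, _, child = label.partition(sep)
        children = tree.setdefault(parent, [])
        if child:
            children.append(child)
    return tree


predicted = ["组织关系", "组织关系##裁员"]
grouped = group_by_parent(predicted)
print(grouped)
```

A label that has no `##` part is kept as a parent with no children, which matches how the first-level labels appear alone in the output.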
diff --git a/applications/text_classification/hierarchical/deploy/paddle_serving/config.yml b/applications/text_classification/hierarchical/deploy/paddle_serving/config.yml
@@ -1,8 +1,8 @@
 #rpc port; rpc_port and http_port must not both be empty. When rpc_port is empty and http_port is not, rpc_port is automatically set to http_port+1
-rpc_port: 7688
+rpc_port: 18090
 
 #http port; rpc_port and http_port must not both be empty. When rpc_port is available and http_port is empty, http_port is not generated automatically
-http_port: 9998
+http_port: 9878
 
 #worker_num, the maximum concurrency.
 #When build_dag_each_worker=True, the framework creates worker_num processes and builds a gRPC server and a DAG inside each process
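
The port rules described in the config comments can be sketched as a small function. This is only an illustration of the stated rule, not the serving framework's actual implementation; `resolve_ports` is a hypothetical name, and `None` stands in for an empty value in `config.yml`.

```python
from typing import Optional, Tuple


def resolve_ports(rpc_port: Optional[int],
                  http_port: Optional[int]) -> Tuple[int, Optional[int]]:
    """Apply the rule from the config comments (hypothetical helper).

    - Both empty: rejected.
    - rpc_port empty, http_port set: rpc_port becomes http_port + 1.
    - rpc_port set, http_port empty: http_port stays empty.
    """
    if rpc_port is None and http_port is None:
        raise ValueError("rpc_port and http_port must not both be empty")
    if rpc_port is None:
        rpc_port = http_port + 1
    return rpc_port, http_port


print(resolve_ports(18090, 9878))  # both values set, as in this commit
print(resolve_ports(None, 9878))   # rpc_port derived from http_port
```

With the values in this commit both ports are set explicitly, so no port is derived.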

diff --git a/applications/text_classification/hierarchical/deploy/paddle_serving/http_client.py b/applications/text_classification/hierarchical/deploy/paddle_serving/http_client.py
@@ -0,0 +1,67 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import numpy as np
+from numpy import array
+import requests
+import json
+import sys
+
+
+class Runner(object):
+
+    def __init__(
+        self,
+        server_url: str,
+    ):
+        self.server_url = server_url
+
+    def Run(self, text, label_list):
+        sentence = np.array([t.encode('utf-8') for t in text], dtype=np.object_)
+        sentence = sentence.__repr__()
+        data = {"key": ["sentence"], "value": [sentence]}
+        data = json.dumps(data)
+
+        ret = requests.post(url=self.server_url, data=data)
+        ret = ret.json()
+        for t, l in zip(text, eval(ret['value'][0])):
+            print("text: ", t)
+            label = ','.join([label_list[int(ll)] for ll in l.split(',')])
+            print("label: ", label)
+            print("--------------------")
+        return
+
+
+if __name__ == "__main__":
+    server_url = "http://127.0.0.1:9878/seq_cls/prediction"
+    runner = Runner(server_url)
+    text = [
+        "消失的“外企光环”，5月份在华裁员900余人，香饽饽变“臭”了？", "卡车超载致使跨桥侧翻，没那么简单",
+        "金属卡扣安装不到位，上海乐扣乐扣贸易有限公司将召回捣碎器1162件"
+    ]
+    label_list = [
+        '交往', '交往##会见', '交往##感谢', '交往##探班', '交往##点赞', '交往##道歉', '产品行为',
+        '产品行为##上映', '产品行为##下架', '产品行为##发布', '产品行为##召回', '产品行为##获奖', '人生',
+        '人生##产子/女', '人生##出轨', '人生##分手', '人生##失联', '人生##婚礼', '人生##庆生', '人生##怀孕',
+        '人生##死亡', '人生##求婚', '人生##离婚', '人生##结婚', '人生##订婚', '司法行为', '司法行为##举报',
+        '司法行为##入狱', '司法行为##开庭', '司法行为##拘捕', '司法行为##立案', '司法行为##约谈', '司法行为##罚款',
+        '司法行为##起诉', '灾害/意外', '灾害/意外##地震', '灾害/意外##坍/垮塌', '灾害/意外##坠机',
+        '灾害/意外##洪灾', '灾害/意外##爆炸', '灾害/意外##袭击', '灾害/意外##起火', '灾害/意外##车祸', '竞赛行为',
+        '竞赛行为##夺冠', '竞赛行为##晋级', '竞赛行为##禁赛', '竞赛行为##胜负', '竞赛行为##退役', '竞赛行为##退赛',
+        '组织关系', '组织关系##停职', '组织关系##加盟', '组织关系##裁员', '组织关系##解散', '组织关系##解约',
+        '组织关系##解雇', '组织关系##辞/离职', '组织关系##退出', '组织行为', '组织行为##开幕', '组织行为##游行',
+        '组织行为##罢工', '组织行为##闭幕', '财经/交易', '财经/交易##上市', '财经/交易##出售/收购',
+        '财经/交易##加息', '财经/交易##涨价', '财经/交易##涨停', '财经/交易##融资', '财经/交易##跌停',
+        '财经/交易##降价', '财经/交易##降息'
+    ]
+    runner.Run(text, label_list)
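
The label lookup in `Run` maps a comma-separated string of indices to label names. A minimal standalone sketch of that step follows; `decode_labels` and `demo_labels` are hypothetical names, and the handling of an empty prediction string is an assumption added here, since the original expression would raise `ValueError` on `int('')`.

```python
from typing import List


def decode_labels(raw: str, label_list: List[str]) -> str:
    """Turn a string such as '0,2' into 'name0,name2'.

    Hypothetical helper; empty fragments are skipped so that an empty
    prediction yields an empty string instead of an exception.
    """
    indices = [part for part in raw.split(",") if part.strip()]
    return ",".join(label_list[int(i)] for i in indices)


# Shortened label list for illustration only.
demo_labels = ["交往", "交往##会见", "交往##感谢"]
print(decode_labels("0,2", demo_labels))
```

Whether the server can actually return an empty string for a sample is not shown in this diff, so the empty-input branch is a defensive choice rather than a documented case.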
diff --git a/applications/text_classification/hierarchical/deploy/paddle_serving/rpc_client.py b/applications/text_classification/hierarchical/deploy/paddle_serving/rpc_client.py
@@ -26,21 +26,37 @@ def __init__(
         self.client = PipelineClient()
         self.client.connect([server_url])
 
-    def Run(self, data):
+    def Run(self, data, label_list):
         data = np.array([x.encode('utf-8') for x in data], dtype=np.object_)
         ret = self.client.predict(feed_dict={"sentence": data})
         for d, l, in zip(data, eval(ret.value[0])):
             print("text: ", d)
-            print("label: ", l)
+            label = ','.join([label_list[int(ll)] for ll in l.split(',')])
+            print("label: ", label)
             print("--------------------")
         return
 
 
 if __name__ == "__main__":
-    server_url = "127.0.0.1:7688"
+    server_url = "127.0.0.1:18090"
     runner = Runner(server_url)
-    texts = [
+    text = [
         "消失的“外企光环”，5月份在华裁员900余人，香饽饽变“臭”了？", "卡车超载致使跨桥侧翻，没那么简单",
         "金属卡扣安装不到位，上海乐扣乐扣贸易有限公司将召回捣碎器1162件"
     ]
-    runner.Run(texts)
+    label_list = [
+        '交往', '交往##会见', '交往##感谢', '交往##探班', '交往##点赞', '交往##道歉', '产品行为',
+        '产品行为##上映', '产品行为##下架', '产品行为##发布', '产品行为##召回', '产品行为##获奖', '人生',
+        '人生##产子/女', '人生##出轨', '人生##分手', '人生##失联', '人生##婚礼', '人生##庆生', '人生##怀孕',
+        '人生##死亡', '人生##求婚', '人生##离婚', '人生##结婚', '人生##订婚', '司法行为', '司法行为##举报',
+        '司法行为##入狱', '司法行为##开庭', '司法行为##拘捕', '司法行为##立案', '司法行为##约谈', '司法行为##罚款',
+        '司法行为##起诉', '灾害/意外', '灾害/意外##地震', '灾害/意外##坍/垮塌', '灾害/意外##坠机',
+        '灾害/意外##洪灾', '灾害/意外##爆炸', '灾害/意外##袭击', '灾害/意外##起火', '灾害/意外##车祸', '竞赛行为',
+        '竞赛行为##夺冠', '竞赛行为##晋级', '竞赛行为##禁赛', '竞赛行为##胜负', '竞赛行为##退役', '竞赛行为##退赛',
+        '组织关系', '组织关系##停职', '组织关系##加盟', '组织关系##裁员', '组织关系##解散', '组织关系##解约',
+        '组织关系##解雇', '组织关系##辞/离职', '组织关系##退出', '组织行为', '组织行为##开幕', '组织行为##游行',
+        '组织行为##罢工', '组织行为##闭幕', '财经/交易', '财经/交易##上市', '财经/交易##出售/收购',
+        '财经/交易##加息', '财经/交易##涨价', '财经/交易##涨停', '财经/交易##融资', '财经/交易##跌停',
+        '财经/交易##降价', '财经/交易##降息'
+    ]
+    runner.Run(text, label_list)
diff --git a/applications/text_classification/hierarchical/few-shot/infer.py b/applications/text_classification/hierarchical/few-shot/infer.py
@@ -178,9 +178,12 @@ def preprocess(self, input_data: list):
         text = [InputExample(text_a=x) for x in input_data]
         inputs = [self._template.wrap_one_example(x) for x in text]
         inputs = {
-            "input_ids": np.array([x["input_ids"] for x in inputs]),
-            "mask_ids": np.array([x["mask_ids"] for x in inputs]),
-            "soft_token_ids": np.array([x["soft_token_ids"] for x in inputs])
+            "input_ids":
+            np.array([x["input_ids"] for x in inputs], dtype="int64"),
+            "mask_ids":
+            np.array([x["mask_ids"] for x in inputs], dtype="int64"),
+            "soft_token_ids":
+            np.array([x["soft_token_ids"] for x in inputs], dtype="int64")
         }
         return inputs
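
The hunk above adds an explicit `dtype="int64"` to each array. The commit does not state the reason; one plausible motivation is that NumPy's default integer type depends on the platform (for example, it is 32-bit on Windows with NumPy versions before 2.0), while an exported model usually expects one fixed integer width. A minimal sketch with made-up token ids:

```python
import numpy as np

# Made-up examples standing in for the wrapped template outputs.
examples = [
    {"input_ids": [1, 5, 9, 2]},
    {"input_ids": [1, 7, 3, 2]},
]

# Without dtype, the integer width follows the platform default.
default_ids = np.array([x["input_ids"] for x in examples])

# With dtype, the result is int64 on every platform.
explicit_ids = np.array([x["input_ids"] for x in examples], dtype="int64")

print(default_ids.dtype, explicit_ids.dtype)
```

Passing the dtype at construction time avoids a separate `astype` copy later.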
 

diff --git a/applications/text_classification/multi_class/deploy/paddle_serving/README.md b/applications/text_classification/multi_class/deploy/paddle_serving/README.md
@@ -149,7 +149,7 @@ I0628 09:12:30.787542 74305 naive_executor.cc:102] ---  skip [linear_147.tmp_1],
 
 ```
 
-#### Start the client test
+#### Start the rpc client test
 Note: disable any proxy before sending client requests, and set server_url to the address of the machine running the service
 ```shell
 python rpc_client.py
@@ -173,3 +173,28 @@ label:  病因分析
 --------------------
 
 ```
+
+#### Start the http client test
+Note: disable any proxy before sending client requests, and set server_url to the address of the machine running the service
+```shell
+python http_client.py
+```
+The output is printed as follows:
+```
+data:  黑苦荞茶的功效与作用及食用方法
+label:  功效作用
+--------------------
+data:  交界痣会凸起吗
+label:  疾病表述
+--------------------
+data:  检查是否能怀孕挂什么科
+label:  就医建议
+--------------------
+data:  鱼油怎么吃咬破吃还是直接咽下去
+label:  其他
+--------------------
+data:  幼儿挑食的生理原因是
+label:  病因分析
+--------------------
+
+```
diff --git a/applications/text_classification/multi_class/deploy/paddle_serving/http_client.py b/applications/text_classification/multi_class/deploy/paddle_serving/http_client.py
@@ -0,0 +1,55 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import numpy as np
+from numpy import array
+import requests
+import json
+import sys
+
+
+class Runner(object):
+
+    def __init__(
+        self,
+        server_url: str,
+    ):
+        self.server_url = server_url
+
+    def Run(self, text, label_list):
+        sentence = np.array([t.encode('utf-8') for t in text], dtype=np.object_)
+        sentence = sentence.__repr__()
+        data = {"key": ["sentence"], "value": [sentence]}
+        data = json.dumps(data)
+
+        ret = requests.post(url=self.server_url, data=data)
+        ret = ret.json()
+        for t, l in zip(text, eval(ret['value'][0])):
+            print("text: ", t)
+            print("label: ", label_list[l])
+            print("--------------------")
+        return
+
+
+if __name__ == "__main__":
+    server_url = "http://127.0.0.1:9878/seq_cls/prediction"
+    runner = Runner(server_url)
+    text = [
+        "黑苦荞茶的功效与作用及食用方法", "交界痣会凸起吗", "检查是否能怀孕挂什么科", "鱼油怎么吃咬破吃还是直接咽下去",
+        "幼儿挑食的生理原因是"
+    ]
+    label_list = [
+        '病情诊断', '治疗方案', '病因分析', '指标解读', '就医建议', '疾病表述', '后果表述', '注意事项', '功效作用',
+        '医疗费用', '其他'
+    ]
+    runner.Run(text, label_list)
diff --git a/applications/text_classification/multi_class/few-shot/README.md b/applications/text_classification/multi_class/few-shot/README.md
@@ -212,6 +212,7 @@ python train.py \
 --max_steps 1000 \
 --eval_steps 10 \
 --logging_steps 5 \
+--load_best_model_at_end True \
 --per_device_eval_batch_size 32 \
 --per_device_train_batch_size 8 \
 --do_predict \
@@ -235,6 +236,7 @@ python -u -m paddle.distributed.launch --gpus 0,1,2,3 train.py \
 --max_steps 1000 \
 --eval_steps 10 \
 --logging_steps 5 \
+--load_best_model_at_end True \
 --per_device_eval_batch_size 32 \
 --per_device_train_batch_size 8 \
 --do_predict \

diff --git a/applications/text_classification/multi_class/few-shot/infer.py b/applications/text_classification/multi_class/few-shot/infer.py
@@ -178,9 +178,12 @@ def preprocess(self, input_data: list):
         text = [InputExample(text_a=x) for x in input_data]
         inputs = [self._template.wrap_one_example(x) for x in text]
         inputs = {
-            "input_ids": np.array([x["input_ids"] for x in inputs]),
-            "mask_ids": np.array([x["mask_ids"] for x in inputs]),
-            "soft_token_ids": np.array([x["soft_token_ids"] for x in inputs])
+            "input_ids":
+            np.array([x["input_ids"] for x in inputs], dtype="int64"),
+            "mask_ids":
+            np.array([x["mask_ids"] for x in inputs], dtype="int64"),
+            "soft_token_ids":
+            np.array([x["soft_token_ids"] for x in inputs], dtype="int64")
         }
         return inputs
 

diff --git a/applications/text_classification/multi_label/deploy/paddle_serving/README.md b/applications/text_classification/multi_label/deploy/paddle_serving/README.md
@@ -150,22 +150,41 @@ W0625 16:45:40.312942 40218 gpu_context.cc:278] Please NOTE: device: 3, GPU Comp
 W0625 16:45:40.316538 40218 gpu_context.cc:306] device: 3, cuDNN Version: 8.1.
 ```
 
-#### Start the client test
+#### Start the rpc client test
 Note: disable any proxy before sending client requests, and set server_url to the address of the machine running the service
 ```shell
 python rpc_client.py
 ```
 The output is printed as follows:
 ```
 data:  五松新村房屋是被告婚前购买的；
-label:  10
+label:  婚前个人财产
 --------------------
 data:  被告于2016年3月将车牌号为皖B×××××出售了2.7万元，被告通过原告偿还了齐荷花人民币2.6万元，原、被告尚欠齐荷花2万元。
-label:  2,9
+label:  有夫妻共同财产,有夫妻共同债务
 --------------------
 data:  2、判令被告返还借婚姻索取的现金33万元，婚前个人存款10万元；
-label:  10
+label:  婚前个人财产
 --------------------
 data:  一、判决原告于某某与被告杨某某离婚；
-label:  8,11
+label:  准予离婚,法定离婚
+```
+#### Start the http client test
+Note: disable any proxy before sending client requests, and set server_url to the address of the machine running the service
+```shell
+python http_client.py
+```
+The output is printed as follows:
+```
+data:  五松新村房屋是被告婚前购买的；
+label:  婚前个人财产
+--------------------
+data:  被告于2016年3月将车牌号为皖B×××××出售了2.7万元，被告通过原告偿还了齐荷花人民币2.6万元，原、被告尚欠齐荷花2万元。
+label:  有夫妻共同财产,有夫妻共同债务
+--------------------
+data:  2、判令被告返还借婚姻索取的现金33万元，婚前个人存款10万元；
+label:  婚前个人财产
+--------------------
+data:  一、判决原告于某某与被告杨某某离婚；
+label:  准予离婚,法定离婚
 ```