Merge branch 'PaddlePaddle:develop' into develop

PaddlePaddle · Jun 20, 2024 · 26fa8bf
2 parents f75436d + 5619cc3
Showing 3,329 changed files with 44,233 additions and 215,423 deletions.
diff --git a/.github/workflows/fast_tokenizer.yml b/.github/workflows/fast_tokenizer.yml
diff --git a/.github/workflows/pipelines.yml b/.github/workflows/pipelines.yml
@@ -3,10 +3,10 @@ name: Pipelines-Test
 on:
   push:
     paths:
-      - 'pipelines/*'
+      - 'legacy/pipelines/*'
   pull_request:
     paths:
-      - 'pipelines/*'
+      - 'legacy/pipelines/*'
 
 
 jobs:
@@ -20,11 +20,11 @@ jobs:
           python-version: '3.10'
           cache: 'pip' # caching pip dependencies
       - name: Install dependencies
-        working-directory: ./pipelines
+        working-directory: ./legacy/pipelines
         run: |
           python -m pip install --upgrade pip
           make install
           pip install -r tests/requirements.txt
       - name: run the command
-        working-directory: ./pipelines
+        working-directory: ./legacy/pipelines
         run: make test
diff --git a/Makefile b/Makefile
@@ -37,6 +37,7 @@ test: unit-test
 unit-test:
 	PYTHONPATH=$(shell pwd) pytest -v \
 		-n auto \
+		--retries 1 --retry-delay 1 \
 		--durations 20 \
 		--cov paddlenlp \
 		--cov-report xml:coverage.xml

diff --git a/README.md b/README.md
@@ -279,18 +279,6 @@ PaddleNLP针对信息抽取、语义检索、智能问答、情感分析等高
 
 ### 高性能分布式训练与推理
 
-#### ⚡ FastTokenizer：高性能文本处理库
-
-<div align="center">
-    <img src="https://user-images.githubusercontent.com/11793384/168407921-b4395b1d-44bd-41a0-8c58-923ba2b703ef.png" width="400">
-</div>
-
-```python
-AutoTokenizer.from_pretrained("ernie-3.0-medium-zh", use_fast=True)
-```
-
-为了实现更极致的模型部署性能，安装FastTokenizer后只需在`AutoTokenizer` API上打开 `use_fast=True`选项，即可调用C++实现的高性能分词算子，轻松获得超Python百余倍的文本处理加速，更多使用说明可参考[FastTokenizer文档](./fast_tokenizer)。
-
 #### ⚡️ FastGeneration：高性能生成加速库
 
 <div align="center">

diff --git a/README_en.md b/README_en.md
@@ -224,18 +224,6 @@ For more details please refer to [Speech Command Analysis](./applications/speech
 
 ### High Performance Distributed Training and Inference
 
-#### ⚡ FastTokenizer: High Performance Text Preprocessing Library
-
-<div align="center">
-    <img src="https://user-images.githubusercontent.com/11793384/168407921-b4395b1d-44bd-41a0-8c58-923ba2b703ef.png" width="400">
-</div>
-
-```python
-AutoTokenizer.from_pretrained("ernie-3.0-medium-zh", use_fast=True)
-```
-
-Set `use_fast=True` to use C++ Tokenizer kernel to achieve 100x faster on text pre-processing. For more usage please refer to [FastTokenizer](./fast_tokenizer).
-
 #### ⚡ FastGeneration: High Performance Generation Library
 
 <div align="center">

diff --git a/applications/document_intelligence/README.md b/applications/document_intelligence/README.md
diff --git a/applications/document_intelligence/doc_vqa/.gitignore b/applications/document_intelligence/doc_vqa/.gitignore
diff --git a/applications/document_intelligence/doc_vqa/Extraction/change_to_mrc.py b/applications/document_intelligence/doc_vqa/Extraction/change_to_mrc.py