update the translation of NLP-pretrain part (d2l-ai#1007)

* Update bounding-box.md: minor change about the translation
* Update anchor.md: minor update about translation
* Update nadaraya-waston.md: add the missing Chinese character "现" at the very beginning
* Update bert.md: translation fine-tuning
* Update subword-embedding.md: minor fine-tuning of the translation
* Update bert.md: 可学习的片段嵌入 (learnable segment embeddings) → 根据输入序列学到的片段嵌入 (segment embeddings learned from the input sequence)
* Update bert-dataset.md: fix an error in the bert-dataset section

Co-authored-by: goldmermaid <goldpiggy@berkeley.edu>
bluewelkin · Nov 11, 2021 · fca3901
1 parent 1599f39

Showing 3 changed files with 4 additions and 4 deletions.
diff --git a/chapter_natural-language-processing-pretraining/bert-dataset.md b/chapter_natural-language-processing-pretraining/bert-dataset.md
@@ -64,7 +64,7 @@ def _get_next_sentence(sentence, next_sentence, paragraphs):
     return sentence, next_sentence, is_next
 ```
 
-下面的函数通过调用`paragraph`函数从输入`_get_next_sentence`生成用于下一句预测的训练样本。这里`paragraph`是句子列表，其中每个句子都是词元列表。自变量`max_len`指定预训练期间的BERT输入序列的最大长度。
+下面的函数通过调用`_get_next_sentence`函数从输入`paragraph`生成用于下一句预测的训练样本。这里`paragraph`是句子列表，其中每个句子都是词元列表。自变量`max_len`指定预训练期间的BERT输入序列的最大长度。
 
 ```{.python .input}
 #@tab all

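The `bert-dataset.md` hunk above describes generating next-sentence-prediction examples from a `paragraph` by calling `_get_next_sentence`. Below is a minimal Python sketch of that flow under stated assumptions: the function names drop the leading underscore and are hypothetical stand-ins, the signatures are simplified (no vocabulary argument, an explicit `random.Random` instance for reproducibility), and `get_tokens_and_segments` is re-implemented inline so the block runs on its own. It is an illustration, not the book's code.

```python
import random

def get_tokens_and_segments(tokens_a, tokens_b=None):
    # '<cls>' + A + '<sep>' get segment 0; optional B + '<sep>' get segment 1
    tokens = ['<cls>'] + tokens_a + ['<sep>']
    segments = [0] * (len(tokens_a) + 2)
    if tokens_b is not None:
        tokens += tokens_b + ['<sep>']
        segments += [1] * (len(tokens_b) + 1)
    return tokens, segments

def get_next_sentence(sentence, next_sentence, paragraphs, rng):
    # Keep the true next sentence half of the time (label True); otherwise
    # substitute a random sentence from a random paragraph (label False).
    if rng.random() < 0.5:
        is_next = True
    else:
        next_sentence = rng.choice(rng.choice(paragraphs))
        is_next = False
    return sentence, next_sentence, is_next

def get_nsp_data_from_paragraph(paragraph, paragraphs, max_len, rng):
    # Hypothetical simplified version: one NSP example per adjacent sentence pair
    nsp_data = []
    for i in range(len(paragraph) - 1):
        tokens_a, tokens_b, is_next = get_next_sentence(
            paragraph[i], paragraph[i + 1], paragraphs, rng)
        # Reserve 3 positions for one '<cls>' and two '<sep>' tokens
        if len(tokens_a) + len(tokens_b) + 3 > max_len:
            continue
        tokens, segments = get_tokens_and_segments(tokens_a, tokens_b)
        nsp_data.append((tokens, segments, is_next))
    return nsp_data

# Toy corpus: each paragraph is a list of sentences, each sentence a list of tokens
paragraphs = [
    [['the', 'cat', 'sat'], ['it', 'was', 'happy'], ['then', 'it', 'slept']],
    [['dogs', 'bark'], ['birds', 'sing']],
]
examples = get_nsp_data_from_paragraph(paragraphs[0], paragraphs, max_len=64,
                                       rng=random.Random(0))
for tokens, segments, is_next in examples:
    print(is_next, tokens)
```

In this sketch, pairs whose combined length plus the three special tokens exceeds `max_len` are skipped rather than truncated, which is one simple way to respect the length limit.
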
diff --git a/chapter_natural-language-processing-pretraining/bert.md b/chapter_natural-language-processing-pretraining/bert.md
@@ -48,7 +48,7 @@ from torch import nn
 
 在自然语言处理中，有些任务（如情感分析）以单个文本作为输入，而有些任务（如自然语言推断）以一对文本序列作为输入。BERT输入序列明确地表示单个文本和文本对。当输入为单个文本时，BERT输入序列是特殊类别词元“&lt;cls&gt;”、文本序列的标记、以及特殊分隔词元“&lt;sep&gt;”的连结。当输入为文本对时，BERT输入序列是“&lt;cls&gt;”、第一个文本序列的标记、“&lt;sep&gt;”、第二个文本序列标记、以及“&lt;sep&gt;”的连结。我们将始终如一地将术语“BERT输入序列”与其他类型的“序列”区分开来。例如，一个*BERT输入序列*可以包括一个*文本序列*或两个*文本序列*。
 
-为了区分文本对，可学习的片段嵌入$\mathbf{e}_A$和$\mathbf{e}_B$分别被添加到第一序列和第二序列的词元嵌入中。对于单文本输入，仅使用$\mathbf{e}_A$。
+为了区分文本对，根据输入序列学到的片段嵌入$\mathbf{e}_A$和$\mathbf{e}_B$分别被添加到第一序列和第二序列的词元嵌入中。对于单文本输入，仅使用$\mathbf{e}_A$。
 
 下面的`get_tokens_and_segments`将一个句子或两个句子作为输入，然后返回BERT输入序列的标记及其相应的片段索引。
 
@@ -255,7 +255,7 @@ mlm_Y_hat = mlm(encoded_X, mlm_positions)
 mlm_Y_hat.shape
 ```
 
-通过掩码下的预测词元`mlm_Y`的真实值`mlm_Y_hat`，我们可以计算在BERT预训练中的掩蔽语言模型任务的交叉熵损失。
+通过掩码下的预测词元`mlm_Y`的真实标签`mlm_Y_hat`，我们可以计算在BERT预训练中的遮蔽语言模型任务的交叉熵损失。
 
 ```{.python .input}
 mlm_Y = np.array([[7, 8, 9], [10, 20, 30]])
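
For the first `bert.md` hunk, where segment embeddings `e_A` and `e_B` are added to the token embeddings of the first and second sequence, here is a minimal NumPy sketch. It assumes a toy vocabulary and random matrices standing in for learned embeddings, re-implements a helper named after `get_tokens_and_segments` from the hunk's context, and omits BERT's positional embeddings; it illustrates the addition only, not the book's implementation.

```python
import numpy as np

def get_tokens_and_segments(tokens_a, tokens_b=None):
    """Concatenate one or two token lists into a BERT input sequence.

    Segment index 0 covers '<cls>', the first text and its '<sep>';
    segment index 1 covers the second text and its trailing '<sep>'.
    """
    tokens = ['<cls>'] + tokens_a + ['<sep>']
    segments = [0] * (len(tokens_a) + 2)
    if tokens_b is not None:
        tokens += tokens_b + ['<sep>']
        segments += [1] * (len(tokens_b) + 1)
    return tokens, segments

def embed_with_segments(tokens, segments, vocab, token_emb, segment_emb):
    """Add segment embeddings (row 0 = e_A, row 1 = e_B) to token embeddings.

    Positional embeddings, which BERT also adds, are omitted in this sketch.
    """
    ids = np.array([vocab[t] for t in tokens])
    return token_emb[ids] + segment_emb[np.array(segments)]

rng = np.random.default_rng(0)
vocab = {t: i for i, t in enumerate(
    ['<cls>', '<sep>', 'this', 'movie', 'is', 'great', 'i', 'like', 'it'])}
d = 8
token_emb = rng.normal(size=(len(vocab), d))  # stand-in for learned token embeddings
segment_emb = rng.normal(size=(2, d))         # stand-ins for learned e_A and e_B
tokens, segments = get_tokens_and_segments(['this', 'movie', 'is', 'great'],
                                           ['i', 'like', 'it'])
X = embed_with_segments(tokens, segments, vocab, token_emb, segment_emb)
print(X.shape)  # (10, 8)
```

For a single-text input every segment index is 0, so only `e_A` is added, matching the hunk's statement.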

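In the second `bert.md` hunk's context, `mlm_Y_hat` is the output of `mlm(encoded_X, mlm_positions)` (predicted scores over the vocabulary at each masked position) and `mlm_Y` holds the ground-truth token ids. The sketch below computes the per-position cross-entropy between them in plain NumPy; the function name `mlm_cross_entropy`, the random logits, and the vocabulary size of 10000 are assumptions for illustration, not the book's API.

```python
import numpy as np

def mlm_cross_entropy(mlm_Y_hat, mlm_Y):
    """Per-position cross-entropy for masked-LM predictions (hypothetical helper).

    mlm_Y_hat: logits, shape (batch, num_pred_positions, vocab_size)
    mlm_Y: integer label ids, shape (batch, num_pred_positions)
    Returns a flat array of length batch * num_pred_positions.
    """
    vocab_size = mlm_Y_hat.shape[-1]
    logits = mlm_Y_hat.reshape(-1, vocab_size)
    labels = mlm_Y.reshape(-1)
    # Numerically stable log-softmax over the vocabulary axis
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Negative log-likelihood of the true token at each masked position
    return -log_probs[np.arange(labels.size), labels]

vocab_size = 10000  # assumed vocabulary size
rng = np.random.default_rng(0)
mlm_Y_hat = rng.normal(size=(2, 3, vocab_size))  # stand-in for mlm(encoded_X, mlm_positions)
mlm_Y = np.array([[7, 8, 9], [10, 20, 30]])      # ground-truth token ids
mlm_l = mlm_cross_entropy(mlm_Y_hat, mlm_Y)
print(mlm_l.shape)  # (6,)
```

With all-zero logits every position gets loss `log(vocab_size)`, a quick sanity check for the reshaping and indexing.
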
diff --git a/chapter_natural-language-processing-pretraining/subword-embedding.md b/chapter_natural-language-processing-pretraining/subword-embedding.md
@@ -7,7 +7,7 @@
 
 回想一下词在word2vec中是如何表示的。在跳元模型和连续词袋模型中，同一词的不同变形形式直接由不同的向量表示，不需要共享参数。为了使用形态信息，*fastText*模型提出了一种*子词嵌入*方法，其中子词是一个字符$n$-gram :cite:`Bojanowski.Grave.Joulin.ea.2017`。fastText可以被认为是子词级跳元模型，而非学习词级向量表示，其中每个*中心词*由其子词级向量之和表示。
 
-让我们来说明如何使用单“where”获得fastText中每个中心词的子词。首先，在词的开头和末尾添加特殊字符“&lt;”和“&gt;”，以将前缀和后缀与其他子词区分开来。然后，从词中提取字符$n$-gram。例如，值$n=3$时，我们将获得长度为3的所有子词：“&lt;wh”、“whe”、“her”、“ere”、“re&gt;”和特殊子词“&lt;where&gt;”。
+让我们来说明如何以单词“where”为例获得fastText中每个中心词的子词。首先，在词的开头和末尾添加特殊字符“&lt;”和“&gt;”，以将前缀和后缀与其他子词区分开来。然后，从词中提取字符$n$-gram。例如，值$n=3$时，我们将获得长度为3的所有子词：“&lt;wh”、“whe”、“her”、“ere”、“re&gt;”和特殊子词“&lt;where&gt;”。
 
 在fastText中，对于任意词$w$，用$\mathcal{G}_w$表示其长度在3和6之间的所有子词与其特殊子词的并集。词表是所有词的子词的集合。假设$\mathbf{z}_g$是词典中的子词$g$的向量，则跳元模型中作为中心词的词$w$的向量$\mathbf{v}_w$是其子词向量的和：
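
The `subword-embedding.md` hunk describes how fastText builds the subword set of a word such as "where", and the context defines the center-word vector as the sum of its subword vectors. Here is a minimal Python sketch of both steps, assuming the hypothetical helper names `subwords` and `center_word_vector` and a plain dict of randomly initialized subword vectors; the released fastText tool maps n-grams into a fixed number of hash buckets rather than keeping one entry per subword.

```python
import numpy as np

def subwords(word, n_min=3, n_max=6):
    """Return G_w: all character n-grams (n_min <= n <= n_max) of '<word>'
    plus the special subword '<word>' itself."""
    padded = '<' + word + '>'  # boundary markers distinguish prefixes and suffixes
    grams = set()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.add(padded[i:i + n])
    grams.add(padded)
    return grams

def center_word_vector(word, z):
    """v_w = sum of z_g over g in G_w; z maps each subword to its vector."""
    return np.sum([z[g] for g in subwords(word)], axis=0)

print(sorted(subwords('where', 3, 3)))
# ['<wh', '<where>', 'ere', 'her', 're>', 'whe']

rng = np.random.default_rng(0)
# Hypothetical randomly initialized subword vectors z_g (learned in practice)
z = {g: rng.normal(size=4) for g in subwords('where')}
v_where = center_word_vector('where', z)
```

Calling `subwords('where', 3, 3)` reproduces the six subwords listed in the hunk; the defaults `n_min=3, n_max=6` correspond to the length range stated in the context line.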