
Add nvidia megatron models #10911

Merged Apr 8, 2021

Commits (43)
9da3853
Add support for NVIDIA Megatron models
jdemouth-nvidia Mar 25, 2021
f943ed0
Add support for NVIDIA Megatron GPT2 and BERT
jdemouth-nvidia Mar 25, 2021
685479d
Update src/transformers/models/megatron_bert/configuration_megatron_b…
jdemouth Mar 29, 2021
e347036
Update src/transformers/models/megatron_bert/configuration_megatron_b…
jdemouth Mar 29, 2021
0af4168
Update src/transformers/models/megatron_bert/configuration_megatron_b…
jdemouth Mar 29, 2021
435c33e
Remove model.half in tests + add "# Copied ..."
jdemouth-nvidia Mar 29, 2021
343f68d
Fix issues
jdemouth-nvidia Mar 31, 2021
6b551fa
Fix Flax/TF tests
jdemouth-nvidia Mar 31, 2021
4236f00
Fix copyright
jdemouth-nvidia Apr 1, 2021
d2c48de
Update src/transformers/models/megatron_bert/configuration_megatron_b…
jdemouth Apr 1, 2021
2f80114
Update src/transformers/models/megatron_bert/configuration_megatron_b…
jdemouth Apr 1, 2021
691466c
Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
jdemouth Apr 1, 2021
35c91b8
Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
jdemouth Apr 1, 2021
b159513
Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
jdemouth Apr 1, 2021
75dbd92
Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
jdemouth Apr 1, 2021
ba47704
Update docs/source/model_doc/megatron_bert.rst
jdemouth Apr 1, 2021
7c69cca
Update docs/source/model_doc/megatron_gpt2.rst
jdemouth Apr 1, 2021
ef5a4dd
Update src/transformers/models/megatron_bert/__init__.py
jdemouth Apr 1, 2021
934bc8d
Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
jdemouth Apr 1, 2021
e3b4c2b
Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_ch…
jdemouth Apr 1, 2021
f1efe7a
Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_ch…
jdemouth Apr 1, 2021
30164e9
Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_ch…
jdemouth Apr 1, 2021
4b4eb7c
Update src/transformers/models/megatron_bert/convert_megatron_bert_ch…
jdemouth Apr 1, 2021
d20e628
Update src/transformers/models/megatron_bert/convert_megatron_bert_ch…
jdemouth Apr 1, 2021
1b02b4e
Update src/transformers/models/megatron_bert/convert_megatron_bert_ch…
jdemouth Apr 1, 2021
5a2b555
Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
jdemouth Apr 1, 2021
19206aa
Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
jdemouth Apr 1, 2021
8c7f61b
Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
jdemouth Apr 1, 2021
1bf4b51
Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
jdemouth Apr 1, 2021
92d461d
Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
jdemouth Apr 1, 2021
93096e7
Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
jdemouth Apr 1, 2021
11072bc
Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
jdemouth Apr 1, 2021
acd1ee8
Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
jdemouth Apr 1, 2021
5f616b7
Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
jdemouth Apr 1, 2021
5e24d73
Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
jdemouth Apr 1, 2021
57ea6d3
Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
jdemouth Apr 1, 2021
74a8205
Resolve most of 'sgugger' comments
jdemouth-nvidia Apr 1, 2021
f155bc4
Fix conversion issue + Run make fix-copies/quality/docs
jdemouth-nvidia Apr 1, 2021
487c5a0
Apply suggestions from code review
LysandreJik Apr 7, 2021
f1d2538
Merge branch 'master' into add-nvidia-megatron-models
LysandreJik Apr 7, 2021
6a4367e
Causal LM & merge
LysandreJik Apr 7, 2021
bae4340
Fix init
LysandreJik Apr 7, 2021
8f7a942
Add CausalLM to last auto class
LysandreJik Apr 8, 2021
Commit 9da38535321a1629c499e8a4f6b17b4a3cea5d54
Add support for NVIDIA Megatron models
jdemouth-nvidia committed Apr 1, 2021
129 changes: 129 additions & 0 deletions docs/source/model_doc/megatron_bert.rst
@@ -0,0 +1,129 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.

Copyright 2021 NVIDIA Corporation

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

MegatronBERT
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The MegatronBERT model was proposed in `Megatron-LM: Training Multi-Billion
Parameter Language Models Using Model Parallelism
<https://arxiv.org/abs/1909.08053>`__ by Mohammad Shoeybi, Mostofa Patwary,
Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.

The abstract from the paper is the following:

*Recent work in language modeling demonstrates that training large transformer
models advances the state of the art in Natural Language Processing
applications. However, very large models can be quite difficult to train due to
memory constraints. In this work, we present our techniques for training very
large transformer models and implement a simple, efficient intra-layer model
parallel approach that enables training transformer models with billions of
parameters. Our approach does not require a new compiler or library changes, is
orthogonal and complimentary to pipeline model parallelism, and can be fully
implemented with the insertion of a few communication operations in native
PyTorch. We illustrate this approach by converging transformer based models up
to 8.3 billion parameters using 512 GPUs. We sustain 15.1 PetaFLOPs across the
entire application with 76% scaling efficiency when compared to a strong single
GPU baseline that sustains 39 TeraFLOPs, which is 30% of peak FLOPs. To
demonstrate that large language models can further advance the state of the art
(SOTA), we train an 8.3 billion parameter transformer language model similar to
GPT-2 and a 3.9 billion parameter model similar to BERT. We show that careful
attention to the placement of layer normalization in BERT-like models is
critical to achieving increased performance as the model size grows. Using the
GPT-2 model we achieve SOTA results on the WikiText103 (10.8 compared to SOTA
perplexity of 15.8) and LAMBADA (66.5% compared to SOTA accuracy of 63.2%)
datasets. Our BERT model achieves SOTA results on the RACE dataset (90.9%
compared to SOTA accuracy of 89.4%).*

Tips:

We have provided pretrained `BERT-345M
<https://ngc.nvidia.com/catalog/models/nvidia:megatron_bert_345m>`__ checkpoints
for use in evaluating or fine-tuning on downstream tasks.

To access these checkpoints, first `sign up <https://ngc.nvidia.com/signup>`__
for and set up the NVIDIA GPU Cloud (NGC) Registry CLI. Further documentation
for downloading models can be found in the `NGC documentation
<https://docs.nvidia.com/dgx/ngc-registry-cli-user-guide/index.html#topic_6_4_1>`__.

Alternatively, you can directly download the checkpoints using:

BERT-345M-uncased::

wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/megatron_bert_345m/versions/v0.1_uncased/zip -O megatron_bert_345m_v0_1_uncased.zip

BERT-345M-cased::

wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/megatron_bert_345m/versions/v0.1_cased/zip -O megatron_bert_345m_v0_1_cased.zip

Once you have obtained the checkpoints from NVIDIA GPU Cloud (NGC), you have to
convert them to a format that can easily be loaded by Hugging Face Transformers
and our port of the BERT code.

The following commands allow you to do the conversion. We assume that the
folder ``models/megatron_bert`` contains ``megatron_bert_345m_v0_1_{cased,uncased}.zip``
and that the commands are run from inside that folder::

python3 $PATH_TO_TRANSFORMERS/models/megatron_bert/convert_megatron_bert_checkpoint.py megatron_bert_345m_v0_1_uncased.zip
python3 $PATH_TO_TRANSFORMERS/models/megatron_bert/convert_megatron_bert_checkpoint.py megatron_bert_345m_v0_1_cased.zip
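
One of the transformations a converter like this typically performs can be sketched in isolation: Megatron stores each layer's attention projections as a single fused ``query_key_value`` matrix, while the Hugging Face BERT layout uses separate query, key, and value weights. The sketch below is a simplified assumption (real Megatron checkpoints may interleave the heads differently), not the actual converter code:

```python
import numpy as np

# Simplified sketch: a fused QKV weight of shape (3 * hidden, hidden) is split
# into three (hidden, hidden) matrices. The shapes and layout here are
# illustrative assumptions, not the exact Megatron checkpoint format.
hidden = 8
fused_qkv = np.arange(3 * hidden * hidden, dtype=np.float32).reshape(3 * hidden, hidden)

# Split along the output dimension into query, key, and value blocks.
q, k, v = np.split(fused_qkv, 3, axis=0)

print(q.shape, k.shape, v.shape)  # (8, 8) (8, 8) (8, 8)
```

Stacking the three blocks back together recovers the original fused matrix, which is a convenient sanity check when writing such a conversion.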

The original code can be found `here
<https://github.com/NVIDIA/Megatron-LM>`__. That repository
contains a multi-GPU and multi-node implementation of the Megatron Language models. In particular,
it contains a hybrid model parallel approach using "tensor parallel" and "pipeline parallel" techniques.

MegatronBertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.MegatronBertConfig
:members:


MegatronBertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.MegatronBertModel
:members: forward


MegatronBertForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.MegatronBertForConditionalGeneration
:members: forward


MegatronBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.MegatronBertForSequenceClassification
:members: forward


MegatronBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.MegatronBertForQuestionAnswering
:members: forward


MegatronBertForCausalLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.MegatronBertForCausalLM
:members: forward


81 changes: 81 additions & 0 deletions docs/source/model_doc/megatron_gpt2.rst
@@ -0,0 +1,81 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.

Copyright 2021 NVIDIA Corporation

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

MegatronGPT2
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The MegatronGPT2 model was proposed in `Megatron-LM: Training Multi-Billion
Parameter Language Models Using Model Parallelism
<https://arxiv.org/abs/1909.08053>`__ by Mohammad Shoeybi, Mostofa Patwary,
Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.

The abstract from the paper is the following:

*Recent work in language modeling demonstrates that training large transformer
models advances the state of the art in Natural Language Processing
applications. However, very large models can be quite difficult to train due to
memory constraints. In this work, we present our techniques for training very
large transformer models and implement a simple, efficient intra-layer model
parallel approach that enables training transformer models with billions of
parameters. Our approach does not require a new compiler or library changes, is
orthogonal and complimentary to pipeline model parallelism, and can be fully
implemented with the insertion of a few communication operations in native
PyTorch. We illustrate this approach by converging transformer based models up
to 8.3 billion parameters using 512 GPUs. We sustain 15.1 PetaFLOPs across the
entire application with 76% scaling efficiency when compared to a strong single
GPU baseline that sustains 39 TeraFLOPs, which is 30% of peak FLOPs. To
demonstrate that large language models can further advance the state of the art
(SOTA), we train an 8.3 billion parameter transformer language model similar to
GPT-2 and a 3.9 billion parameter model similar to BERT. We show that careful
attention to the placement of layer normalization in BERT-like models is
critical to achieving increased performance as the model size grows. Using the
GPT-2 model we achieve SOTA results on the WikiText103 (10.8 compared to SOTA
perplexity of 15.8) and LAMBADA (66.5% compared to SOTA accuracy of 63.2%)
datasets. Our BERT model achieves SOTA results on the RACE dataset (90.9%
compared to SOTA accuracy of 89.4%).*

Tips:

We have provided pretrained `GPT2-345M
<https://ngc.nvidia.com/catalog/models/nvidia:megatron_lm_345m>`__ checkpoints
for use in evaluating or fine-tuning on downstream tasks.

To access these checkpoints, first `sign up <https://ngc.nvidia.com/signup>`__
for and set up the NVIDIA GPU Cloud (NGC) Registry CLI. Further documentation
for downloading models can be found in the `NGC documentation
<https://docs.nvidia.com/dgx/ngc-registry-cli-user-guide/index.html#topic_6_4_1>`__.

Alternatively, you can directly download the checkpoints using::

wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/megatron_lm_345m/versions/v0.0/zip -O megatron_gpt2_345m_v0_0.zip

Once you have obtained the checkpoint from NVIDIA GPU Cloud (NGC), you have to
convert it to a format that can easily be loaded by the Hugging Face
Transformers GPT2 implementation.

The following command allows you to do the conversion. We assume that the
folder ``models/megatron_gpt2`` contains ``megatron_gpt2_345m_v0_0.zip`` and
that the command is run from that folder::

python3 $PATH_TO_TRANSFORMERS/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py megatron_gpt2_345m_v0_0.zip

The original code can be found `here
<https://github.com/NVIDIA/Megatron-LM>`__. That repository
contains a multi-GPU and multi-node implementation of the Megatron Language models. In particular,
it contains a hybrid model parallel approach using "tensor parallel" and "pipeline parallel" techniques.
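
The "tensor parallel" idea mentioned above can be illustrated with a toy sketch (an assumption about the general technique, not the actual Megatron-LM code): under tensor model parallelism, a column-parallel linear layer splits its weight matrix across GPUs along the output dimension, and converting back to a single-GPU checkpoint amounts to concatenating the per-rank shards:

```python
import numpy as np

# Toy illustration of merging tensor-parallel shards. Shapes are arbitrary
# small values chosen for the example.
hidden, ffn, world_size = 4, 16, 2
full_weight = np.random.rand(ffn, hidden).astype(np.float32)

# Each of the `world_size` ranks holds a (ffn / world_size, hidden) shard.
shards = np.split(full_weight, world_size, axis=0)

# Producing a single-GPU checkpoint: concatenate along the sharded axis.
merged = np.concatenate(shards, axis=0)
assert np.array_equal(merged, full_weight)
```

Row-parallel layers are sharded along the input dimension instead, so the concatenation axis depends on the layer type.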


108 changes: 108 additions & 0 deletions examples/megatron-models/README.md
@@ -0,0 +1,108 @@
<!---
# ##############################################################################################
#
# Copyright (c) 2021-, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# ##############################################################################################
-->

# How to run Megatron BERT and GPT2 using Transformers
Member: Thanks for providing such a useful README! I'm wondering if it wouldn't be best suited inside a model card on the hub? That's generally how users look for a model, rather than in the examples.

Contributor (Author): That sounds good to me. Where do you want me to put it? docs/sources/model_doc? Or model_cards/nvidia/megatron-models?

Member: I was thinking of a model card on the hub directly! It seems you're already in the NVIDIA organization there: https://huggingface.co/nvidia

The idea would be to create a repository there containing only the model card with the contents of the README you've put here. Does that make sense? When looking for models, that's where our users look, so that's the easiest way to enable discoverability.

Contributor (Author): I've created two model cards. One for BERT and one for GPT2.

Member: This is great, and exactly what I had in mind!


## Get the checkpoints from the NVIDIA GPU Cloud

The first step is to create a directory in the current folder (`examples/megatron-lm`) to store the
checkpoints.

```
mkdir -p models/{bert,gpt2}
```

Then, you can download the checkpoints from the NVIDIA GPU Cloud (NGC). For that you have to
[sign up](https://ngc.nvidia.com/signup) for and setup the NVIDIA GPU Cloud (NGC) Registry CLI.
Further documentation for downloading models can be found in the
[NGC documentation](https://docs.nvidia.com/dgx/ngc-registry-cli-user-guide/index.html#topic_6_4_1).


Alternatively, you can directly download the checkpoints using:

### BERT 345M cased

```
wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/megatron_bert_345m/versions/v0.1_cased/zip -O models/bert/megatron_bert_345m_v0_1_cased.zip
```

### BERT 345M uncased

```
wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/megatron_bert_345m/versions/v0.1_uncased/zip -O models/bert/megatron_bert_345m_v0_1_uncased.zip
```

### GPT2 345M

```
wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/megatron_lm_345m/versions/v0.0/zip -O models/gpt2/megatron_gpt2_345m_v0_0.zip
```

## Converting the checkpoints

In order to be loaded into `Transformers`, the checkpoints have to be converted. You should run the following
commands for that purpose.

### BERT 345M cased

```
python3 $PATH_TO_TRANSFORMERS/models/megatron_bert/convert_megatron_bert_checkpoint.py models/bert/megatron_bert_345m_v0_1_cased.zip
```

### BERT 345M uncased

```
python3 $PATH_TO_TRANSFORMERS/models/megatron_bert/convert_megatron_bert_checkpoint.py models/bert/megatron_bert_345m_v0_1_uncased.zip
```

### GPT2 345M

```
python3 $PATH_TO_TRANSFORMERS/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py models/gpt2/megatron_gpt2_345m_v0_0.zip
```
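
Beyond reshaping tensors, a converter like these scripts has to rename Megatron parameter keys to the layout Transformers expects. The sketch below is hypothetical: the substrings in the mapping are illustrative assumptions, not the exact renames performed by `convert_megatron_bert_checkpoint.py` or `convert_megatron_gpt2_checkpoint.py`:

```python
# Hypothetical key-renaming table; the real converters use their own mapping.
RENAMES = {
    "attention.dense": "attention.output.dense",
    "mlp.dense_h_to_4h": "intermediate.dense",
    "mlp.dense_4h_to_h": "output.dense",
}

def rename_key(megatron_key: str) -> str:
    """Map a Megatron-style parameter name to a Transformers-style name."""
    for old, new in RENAMES.items():
        if old in megatron_key:
            return megatron_key.replace(old, new)
    return megatron_key

print(rename_key("layers.0.mlp.dense_h_to_4h.weight"))
# layers.0.intermediate.dense.weight
```

Keys with no matching substring (embeddings, layer norms, and so on) pass through unchanged in this sketch.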

## Running the samples

For BERT, we created a simple example that runs two tasks with the Megatron BERT
checkpoints through the Transformers API. The first task is masked language
modeling (`MegatronBERTForMaskedLM`) and the second is next sentence prediction
(`MegatronBERTForNextSentencePrediction`).

### Masked LM

```
python3 ./run_bert.py --masked-lm ./models/bert/megatron_bert_345m_v0_1_cased
python3 ./run_bert.py --masked-lm ./models/bert/megatron_bert_345m_v0_1_uncased
```

### Next sentence prediction

```
python3 ./run_bert.py ./models/bert/megatron_bert_345m_v0_1_cased
python3 ./run_bert.py ./models/bert/megatron_bert_345m_v0_1_uncased
```

### Text generation

For GPT2, we created a simple example for text generation.

```
python3 ./run_gpt2.py models/gpt2/megatron_gpt2_345m_v0_0
```
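
At its core, a generation script like this repeatedly asks the model for next-token logits and appends the chosen token. The sketch below shows greedy decoding with a deterministic dummy "model" standing in for Megatron GPT2; it is an assumption about the general loop, not the contents of `run_gpt2.py`:

```python
import numpy as np

# Toy vocabulary; id 0 acts as the end-of-sequence token.
VOCAB = ["<eos>", "the", "model", "generates", "text"]

def dummy_logits(tokens):
    # Deterministic stand-in for a language model: strongly favor the
    # vocabulary id that follows the last token (wrapping around to <eos>).
    next_id = (tokens[-1] + 1) % len(VOCAB)
    logits = np.full(len(VOCAB), -1e9)
    logits[next_id] = 0.0
    return logits

def greedy_generate(prompt, max_new_tokens=3):
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        next_token = int(np.argmax(dummy_logits(tokens)))
        if next_token == 0:  # stop at <eos>
            break
        tokens.append(next_token)
    return tokens

print(greedy_generate([1]))  # [1, 2, 3, 4]
```

A real script would replace `dummy_logits` with a forward pass of the converted checkpoint and typically sample from the distribution rather than always taking the argmax.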
