
GPT Neo #10848

Merged Mar 30, 2021 · 46 commits

Commits (the diff below shows changes from 1 commit):
f015a91
lets begin
patil-suraj Mar 22, 2021
36f9c94
boom boom
patil-suraj Mar 24, 2021
8255753
fix out proj in attn
patil-suraj Mar 24, 2021
eb6c00f
fix attention
patil-suraj Mar 25, 2021
0c135f9
fix local attention
patil-suraj Mar 25, 2021
36c827d
add tokenizer
patil-suraj Mar 25, 2021
bcee6c7
fix imports
patil-suraj Mar 25, 2021
39954ff
autotokenizer
patil-suraj Mar 25, 2021
b302780
fix checkpoint name
patil-suraj Mar 25, 2021
efa6003
cleanup
patil-suraj Mar 25, 2021
d970255
more clean-up
patil-suraj Mar 26, 2021
dac3f89
more cleanup
patil-suraj Mar 26, 2021
30cf9ca
output attentions
patil-suraj Mar 26, 2021
ca4bad5
fix attn mask creation
patil-suraj Mar 26, 2021
5685de9
fix imports
patil-suraj Mar 26, 2021
784d1cd
config doc
patil-suraj Mar 26, 2021
a474df5
add tests
patil-suraj Mar 26, 2021
7c90f3b
add slow tests
patil-suraj Mar 26, 2021
a5d1161
quality
patil-suraj Mar 26, 2021
647aec4
add conversion script
patil-suraj Mar 26, 2021
4fc464a
copyright
patil-suraj Mar 26, 2021
8781740
typo
patil-suraj Mar 26, 2021
eecbeea
another bites the dust
patil-suraj Mar 26, 2021
f5ca1b9
fix attention tests
patil-suraj Mar 26, 2021
22c9441
doc
patil-suraj Mar 26, 2021
2683d8f
add embed init in convert function
patil-suraj Mar 26, 2021
6b9aef4
fix copies
patil-suraj Mar 26, 2021
8be570a
remove tokenizer
patil-suraj Mar 28, 2021
0a44cbb
enable caching
patil-suraj Mar 28, 2021
bae1b69
address review comments
patil-suraj Mar 29, 2021
c859513
improve config and create attn layer list internally
patil-suraj Mar 29, 2021
7336c6f
more consistent naming
patil-suraj Mar 29, 2021
0d8d2bc
init hf config from mesh-tf config json file
patil-suraj Mar 29, 2021
1eb0bfe
remove neo tokenizer from doc
patil-suraj Mar 29, 2021
23849f7
handle attention_mask in local attn layer
patil-suraj Mar 29, 2021
c46278f
attn_layers => attention_layers
patil-suraj Mar 29, 2021
a59f111
add tokenizer_class in config
patil-suraj Mar 29, 2021
cbb81f9
fix docstring
patil-suraj Mar 29, 2021
08988ab
raise if len of attention_layers is not same as num_layers
patil-suraj Mar 29, 2021
6869ee7
remove tokenizer_class from config
patil-suraj Mar 29, 2021
29663ab
more consistent naming
patil-suraj Mar 29, 2021
e80fc91
fix doc
patil-suraj Mar 29, 2021
7bb186b
fix checkpoint names
patil-suraj Mar 29, 2021
22150cc
fp16 compat
patil-suraj Mar 30, 2021
83c07a0
Merge branch 'master' into gpt-neo
patil-suraj Mar 30, 2021
33c9ada
Apply suggestions from code review
LysandreJik Mar 30, 2021
Commit f015a91c4d8a99fca5fa01ad7b200c3271a4474d — "lets begin"
patil-suraj committed Mar 29, 2021
99 changes: 99 additions & 0 deletions docs/source/model_doc/gpt_neo.rst
@@ -0,0 +1,99 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

GPTNeo
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The GPTNeo model was proposed in `<INSERT PAPER NAME HERE>
<<INSERT PAPER LINK HERE>>`__ by <INSERT AUTHORS HERE>. <INSERT SHORT SUMMARY HERE>

The abstract from the paper is the following:

*<INSERT PAPER ABSTRACT HERE>*

Tips:

<INSERT TIPS ABOUT MODEL HERE>

GPTNeoConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.GPTNeoConfig
:members:


GPTNeoTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.GPTNeoTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary


GPTNeoTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.GPTNeoTokenizerFast
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary


GPTNeoModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.GPTNeoModel
:members: forward


GPTNeoForCausalLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.GPTNeoForCausalLM
:members: forward


GPTNeoForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.GPTNeoForMaskedLM
:members: forward


GPTNeoForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.GPTNeoForSequenceClassification
:members: forward


GPTNeoForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.GPTNeoForMultipleChoice
:members: forward


GPTNeoForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.GPTNeoForTokenClassification
:members: forward


GPTNeoForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.GPTNeoForQuestionAnswering
:members: forward
34 changes: 34 additions & 0 deletions src/transformers/__init__.py
@@ -128,6 +128,7 @@
"load_tf2_weights_in_pytorch_model",
],
# Models
"models.gpt_neo": ["GPT_NEO_PRETRAINED_CONFIG_ARCHIVE_MAP", "GPTNeoConfig", "GPTNeoTokenizer"],
"models": [],
"models.albert": ["ALBERT_PRETRAINED_CONFIG_ARCHIVE_MAP", "AlbertConfig"],
"models.auto": [
@@ -300,6 +301,7 @@
# tokenziers-backed objects
if is_tokenizers_available():
# Fast tokenizers
_import_structure["models.gpt_neo"].append("GPTNeoTokenizerFast")
_import_structure["models.convbert"].append("ConvBertTokenizerFast")
_import_structure["models.albert"].append("AlbertTokenizerFast")
_import_structure["models.bart"].append("BartTokenizerFast")
@@ -406,6 +408,22 @@
_import_structure["generation_utils"] = ["top_k_top_p_filtering"]
_import_structure["modeling_utils"] = ["Conv1D", "PreTrainedModel", "apply_chunking_to_forward", "prune_layer"]
# PyTorch models structure

_import_structure["models.gpt_neo"].extend(
[
"GPT_NEO_PRETRAINED_MODEL_ARCHIVE_LIST",
"GPTNeoForMaskedLM",
"GPTNeoForCausalLM",
"GPTNeoForMultipleChoice",
"GPTNeoForQuestionAnswering",
"GPTNeoForSequenceClassification",
"GPTNeoForTokenClassification",
"GPTNeoLayer",
"GPTNeoModel",
"GPTNeoPreTrainedModel",
"load_tf_weights_in_gpt_neo",
]
)
_import_structure["models.albert"].extend(
[
"ALBERT_PRETRAINED_MODEL_ARCHIVE_LIST",
@@ -1357,6 +1375,7 @@
load_tf2_weights_in_pytorch_model,
)
from .models.albert import ALBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, AlbertConfig
from .models.gpt_neo import GPT_NEO_PRETRAINED_CONFIG_ARCHIVE_MAP, GPTNeoConfig, GPTNeoTokenizer
from .models.auto import (
ALL_PRETRAINED_CONFIG_ARCHIVE_MAP,
CONFIG_MAPPING,
@@ -1521,6 +1540,7 @@
from .utils.dummy_sentencepiece_objects import *

if is_tokenizers_available():
from .models.gpt_neo import GPTNeoTokenizerFast
from .models.albert import AlbertTokenizerFast
from .models.bart import BartTokenizerFast
from .models.barthez import BarthezTokenizerFast
@@ -1565,6 +1585,20 @@
# Modeling
if is_torch_available():

from .models.gpt_neo import (
GPT_NEO_PRETRAINED_MODEL_ARCHIVE_LIST,
GPTNeoForMaskedLM,
GPTNeoForCausalLM,
GPTNeoForMultipleChoice,
GPTNeoForQuestionAnswering,
GPTNeoForSequenceClassification,
GPTNeoForTokenClassification,
GPTNeoLayer,
GPTNeoModel,
GPTNeoPreTrainedModel,
load_tf_weights_in_gpt_neo,
)

# Benchmarks
from .benchmark.benchmark import PyTorchBenchmark
from .benchmark.benchmark_args import PyTorchBenchmarkArguments
1 change: 1 addition & 0 deletions src/transformers/models/__init__.py
@@ -17,6 +17,7 @@
# limitations under the License.

from . import (
gpt_neo,
albert,
auto,
bart,
4 changes: 4 additions & 0 deletions src/transformers/models/auto/configuration_auto.py
@@ -19,6 +19,7 @@

from ...configuration_utils import PretrainedConfig
from ..albert.configuration_albert import ALBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, AlbertConfig
from ..gpt_neo.configuration_gpt_neo import GPT_NEO_PRETRAINED_CONFIG_ARCHIVE_MAP, GPTNeoConfig
from ..bart.configuration_bart import BART_PRETRAINED_CONFIG_ARCHIVE_MAP, BartConfig
from ..bert.configuration_bert import BERT_PRETRAINED_CONFIG_ARCHIVE_MAP, BertConfig
from ..bert_generation.configuration_bert_generation import BertGenerationConfig
@@ -80,6 +81,7 @@
(key, value)
for pretrained_map in [
# Add archive maps here
GPT_NEO_PRETRAINED_CONFIG_ARCHIVE_MAP,
SPEECH_TO_TEXT_PRETRAINED_CONFIG_ARCHIVE_MAP,
WAV_2_VEC_2_PRETRAINED_CONFIG_ARCHIVE_MAP,
M2M_100_PRETRAINED_CONFIG_ARCHIVE_MAP,
@@ -127,6 +129,7 @@
CONFIG_MAPPING = OrderedDict(
[
# Add configs here
("gpt_neo", GPTNeoConfig),
("speech_to_text", Speech2TextConfig),
("wav2vec2", Wav2Vec2Config),
("m2m_100", M2M100Config),
@@ -180,6 +183,7 @@
MODEL_NAMES_MAPPING = OrderedDict(
[
# Add full (and cased) model names here
("gpt_neo", "GPTNeo"),
("speech_to_text", "Speech2Text"),
("wav2vec2", "Wav2Vec2"),
("m2m_100", "M2M100"),
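The `OrderedDict` registries added above are how `AutoConfig` resolves a `model_type` string to a concrete config class. A minimal, stdlib-only sketch of that dispatch follows; the classes here are hypothetical stand-ins, not the real transformers API, and `config_for_model` only mimics the lookup, not the full `AutoConfig` behavior:

```python
from collections import OrderedDict


# Hypothetical stand-ins for the real config classes.
class GPTNeoConfig:
    model_type = "gpt_neo"


class Speech2TextConfig:
    model_type = "speech_to_text"


# Mirrors CONFIG_MAPPING: model_type string -> config class.
CONFIG_MAPPING = OrderedDict(
    [
        ("gpt_neo", GPTNeoConfig),
        ("speech_to_text", Speech2TextConfig),
    ]
)


def config_for_model(model_type: str, **kwargs):
    """Look up a config class by its model_type key and instantiate it."""
    if model_type in CONFIG_MAPPING:
        return CONFIG_MAPPING[model_type](**kwargs)
    raise ValueError(f"Unrecognized model type: {model_type}")


cfg = config_for_model("gpt_neo")
print(type(cfg).__name__)  # GPTNeoConfig
```

This is why the PR only needs to add one `("gpt_neo", GPTNeoConfig)` entry per registry: the dispatch machinery itself never changes.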
18 changes: 18 additions & 0 deletions src/transformers/models/auto/modeling_auto.py
@@ -23,6 +23,15 @@
from ...utils import logging

# Add modeling imports here
from ..gpt_neo.modeling_gpt_neo import (
GPTNeoForMaskedLM,
GPTNeoForCausalLM,
GPTNeoForMultipleChoice,
GPTNeoForQuestionAnswering,
GPTNeoForSequenceClassification,
GPTNeoForTokenClassification,
GPTNeoModel,
)
from ..albert.modeling_albert import (
AlbertForMaskedLM,
AlbertForMultipleChoice,
@@ -258,6 +267,7 @@
XLNetModel,
)
from .configuration_auto import (
GPTNeoConfig,
AlbertConfig,
AutoConfig,
BartConfig,
@@ -315,6 +325,7 @@
MODEL_MAPPING = OrderedDict(
[
# Base model mapping
(GPTNeoConfig, GPTNeoModel),
(Speech2TextConfig, Speech2TextModel),
(Wav2Vec2Config, Wav2Vec2Model),
(M2M100Config, M2M100Model),
@@ -402,6 +413,7 @@
MODEL_WITH_LM_HEAD_MAPPING = OrderedDict(
[
# Model with LM heads mapping
(GPTNeoConfig, GPTNeoForMaskedLM),
(Speech2TextConfig, Speech2TextForConditionalGeneration),
(Wav2Vec2Config, Wav2Vec2ForMaskedLM),
(M2M100Config, M2M100ForConditionalGeneration),
@@ -444,6 +456,7 @@
MODEL_FOR_CAUSAL_LM_MAPPING = OrderedDict(
[
# Model for Causal LM mapping
(GPTNeoConfig, GPTNeoForCausalLM),
(CamembertConfig, CamembertForCausalLM),
(XLMRobertaConfig, XLMRobertaForCausalLM),
(RobertaConfig, RobertaForCausalLM),
@@ -473,6 +486,7 @@
MODEL_FOR_MASKED_LM_MAPPING = OrderedDict(
[
# Model for Masked LM mapping
(GPTNeoConfig, GPTNeoForMaskedLM),
(Wav2Vec2Config, Wav2Vec2ForMaskedLM),
(ConvBertConfig, ConvBertForMaskedLM),
(LayoutLMConfig, LayoutLMForMaskedLM),
@@ -523,6 +537,7 @@
MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING = OrderedDict(
[
# Model for Sequence Classification mapping
(GPTNeoConfig, GPTNeoForSequenceClassification),
(ConvBertConfig, ConvBertForSequenceClassification),
(LEDConfig, LEDForSequenceClassification),
(DistilBertConfig, DistilBertForSequenceClassification),
@@ -558,6 +573,7 @@
MODEL_FOR_QUESTION_ANSWERING_MAPPING = OrderedDict(
[
# Model for Question Answering mapping
(GPTNeoConfig, GPTNeoForQuestionAnswering),
(ConvBertConfig, ConvBertForQuestionAnswering),
(LEDConfig, LEDForQuestionAnswering),
(DistilBertConfig, DistilBertForQuestionAnswering),
@@ -595,6 +611,7 @@
MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING = OrderedDict(
[
# Model for Token Classification mapping
(GPTNeoConfig, GPTNeoForTokenClassification),
(ConvBertConfig, ConvBertForTokenClassification),
(LayoutLMConfig, LayoutLMForTokenClassification),
(DistilBertConfig, DistilBertForTokenClassification),
@@ -622,6 +639,7 @@
MODEL_FOR_MULTIPLE_CHOICE_MAPPING = OrderedDict(
[
# Model for Multiple Choice mapping
(GPTNeoConfig, GPTNeoForMultipleChoice),
(ConvBertConfig, ConvBertForMultipleChoice),
(CamembertConfig, CamembertForMultipleChoice),
(ElectraConfig, ElectraForMultipleChoice),
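Each `MODEL_*_MAPPING` above keys on the config class rather than a string, so an `AutoModel`-style factory can dispatch on the type of the config it receives. A stdlib-only sketch of that idea, with placeholder classes (not the real transformers implementation, which uses an exact-type lookup plus extra machinery):

```python
from collections import OrderedDict


# Placeholder classes standing in for the real config/model classes.
class GPTNeoConfig:
    pass


class GPTNeoModel:
    def __init__(self, config):
        self.config = config


# Mirrors MODEL_MAPPING: config class -> base model class.
MODEL_MAPPING = OrderedDict([(GPTNeoConfig, GPTNeoModel)])


def auto_model_from_config(config):
    """Pick a model class based on the config's class, AutoModel-style."""
    model_class = MODEL_MAPPING.get(type(config))
    if model_class is None:
        raise ValueError(f"Unrecognized configuration class {type(config)}")
    return model_class(config)


model = auto_model_from_config(GPTNeoConfig())
print(type(model).__name__)  # GPTNeoModel
```

Because every task head (`MODEL_FOR_CAUSAL_LM_MAPPING`, `MODEL_FOR_MASKED_LM_MAPPING`, …) is a separate registry of the same shape, registering GPT Neo means one `(GPTNeoConfig, GPTNeoFor...)` tuple per task.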
85 changes: 85 additions & 0 deletions src/transformers/models/gpt_neo/__init__.py
@@ -0,0 +1,85 @@
# flake8: noqa
# There's no way to ignore "F401 '...' imported but unused" warnings in this
# module, but to preserve other warnings. So, don't check this module at all.

# Copyright 2020 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING
from ...file_utils import _BaseLazyModule, is_torch_available, is_tokenizers_available
_import_structure = {
"configuration_gpt_neo": ["GPT_NEO_PRETRAINED_CONFIG_ARCHIVE_MAP", "GPTNeoConfig"],
"tokenization_gpt_neo": ["GPTNeoTokenizer"],
}

if is_tokenizers_available():
_import_structure["tokenization_gpt_neo_fast"] = ["GPTNeoTokenizerFast"]

if is_torch_available():
_import_structure["modeling_gpt_neo"] = [
"GPT_NEO_PRETRAINED_MODEL_ARCHIVE_LIST",
"GPTNeoForMaskedLM",
"GPTNeoForCausalLM",
"GPTNeoForMultipleChoice",
"GPTNeoForQuestionAnswering",
"GPTNeoForSequenceClassification",
"GPTNeoForTokenClassification",
"GPTNeoLayer",
"GPTNeoModel",
"GPTNeoPreTrainedModel",
"load_tf_weights_in_gpt_neo",
]




if TYPE_CHECKING:
from .configuration_gpt_neo import GPT_NEO_PRETRAINED_CONFIG_ARCHIVE_MAP, GPTNeoConfig
from .tokenization_gpt_neo import GPTNeoTokenizer

if is_tokenizers_available():
from .tokenization_gpt_neo_fast import GPTNeoTokenizerFast

if is_torch_available():
from .modeling_gpt_neo import (
GPT_NEO_PRETRAINED_MODEL_ARCHIVE_LIST,
GPTNeoForMaskedLM,
GPTNeoForCausalLM,
GPTNeoForMultipleChoice,
GPTNeoForQuestionAnswering,
GPTNeoForSequenceClassification,
GPTNeoForTokenClassification,
GPTNeoLayer,
GPTNeoModel,
GPTNeoPreTrainedModel,
load_tf_weights_in_gpt_neo,
)


else:
import importlib
import os
import sys

class _LazyModule(_BaseLazyModule):
"""
Module class that surfaces all objects but only performs associated imports when the objects are requested.
"""

__file__ = globals()["__file__"]
__path__ = [os.path.dirname(__file__)]

def _get_module(self, module_name: str):
return importlib.import_module("." + module_name, self.__name__)

sys.modules[__name__] = _LazyModule(__name__, _import_structure)
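The `_LazyModule` fallback above defers each submodule import until one of its names is first accessed. A self-contained sketch of the same pattern over stdlib modules — the class and names here are illustrative, not the `_BaseLazyModule` API:

```python
import importlib
import types


class LazyModule(types.ModuleType):
    """Surfaces names from several modules, importing each only on first access."""

    def __init__(self, name, import_structure):
        super().__init__(name)
        # Map each exported attribute back to the module that defines it.
        self._name_to_module = {
            attr: mod for mod, attrs in import_structure.items() for attr in attrs
        }

    def __getattr__(self, attr):
        # Only reached when normal attribute lookup fails.
        if attr not in self._name_to_module:
            raise AttributeError(f"module {self.__name__!r} has no attribute {attr!r}")
        module = importlib.import_module(self._name_to_module[attr])
        value = getattr(module, attr)
        setattr(self, attr, value)  # cache: the import runs at most once
        return value


# Pretend "math" and "json" are this package's submodules.
lazy = LazyModule("demo", {"math": ["sqrt"], "json": ["dumps"]})
print(lazy.sqrt(9.0))   # math is imported only at this point
print(lazy.dumps([1]))
```

The `if TYPE_CHECKING:` branch exists so static type checkers and IDEs still see the eager imports, while runtime users pay only for the submodules they actually touch.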