
BigBird #10183
Merged: 88 commits, Mar 30, 2021

Commits (88)
d0aa9ea
init bigbird
thevasudevgupta Feb 14, 2021
0e37183
model.__init__ working, conversion script ready, config updated
thevasudevgupta Feb 15, 2021
faa4ab3
add conversion script
thevasudevgupta Feb 16, 2021
facd93e
BigBirdEmbeddings working :)
thevasudevgupta Feb 16, 2021
28c8d13
slightly update conversion script
patrickvonplaten Feb 16, 2021
124b99f
BigBirdAttention working :) ; some bug in layer.output.dense
thevasudevgupta Feb 17, 2021
2c14788
add debugger-notebook
thevasudevgupta Feb 17, 2021
12a523b
forward() working for BigBirdModel :) ; replaced gelu with gelu_fast
thevasudevgupta Feb 17, 2021
aebd36b
tf code adapted to torch till rand_attn in bigbird_block_sparse_atten…
thevasudevgupta Feb 19, 2021
9df3127
BigBirdModel working in block-sparse attention mode :)
thevasudevgupta Feb 23, 2021
ad84acf
add BigBirdForPreTraining
thevasudevgupta Feb 24, 2021
4076c9b
small fix
thevasudevgupta Feb 24, 2021
78a205a
add tokenizer for BigBirdModel
thevasudevgupta Feb 25, 2021
644f65d
fix config & hence modeling
thevasudevgupta Feb 25, 2021
ce66bac
fix base prefix
thevasudevgupta Feb 25, 2021
f672205
init testing
thevasudevgupta Feb 25, 2021
372ff99
init tokenizer test
thevasudevgupta Feb 26, 2021
ed6dc49
pos_embed must be absolute, attn_type=original_full when add_cross_at…
thevasudevgupta Feb 26, 2021
7e05539
remove position_embedding_type arg
thevasudevgupta Feb 26, 2021
d257079
complete normal tests
thevasudevgupta Feb 26, 2021
07ec9a1
add comments to block sparse attention
thevasudevgupta Feb 27, 2021
01dd2e8
add attn_probs for sliding & global tokens
thevasudevgupta Feb 27, 2021
49d62e5
create fn for block sparse attn mask creation
thevasudevgupta Feb 28, 2021
5912716
add special tests
thevasudevgupta Feb 28, 2021
89de3c5
restore pos embed arg
thevasudevgupta Feb 28, 2021
b132905
minor fix
thevasudevgupta Feb 28, 2021
6ab2921
attn probs update
thevasudevgupta Mar 1, 2021
72e2532
make big bird fully gpu friendly
patrickvonplaten Mar 2, 2021
7401768
fix tests
patrickvonplaten Mar 2, 2021
da2824f
remove pruning
patrickvonplaten Mar 2, 2021
3a866e2
correct tokenizer & minor fixes
patrickvonplaten Mar 2, 2021
753ba75
update conversion script , remove norm_type
thevasudevgupta Mar 2, 2021
1e186d0
tokenizer-inference test add
thevasudevgupta Mar 2, 2021
72a150e
remove extra comments
thevasudevgupta Mar 2, 2021
24c74a9
add docs
thevasudevgupta Mar 3, 2021
79955e4
save intermediate
patrickvonplaten Mar 3, 2021
018b8fd
finish trivia_qa conversion
patrickvonplaten Mar 4, 2021
1716dea
small update to forward
thevasudevgupta Mar 4, 2021
15b7cfa
correct qa and layer
patrickvonplaten Mar 4, 2021
c300f3f
merge into master
patrickvonplaten Mar 4, 2021
2782295
better error message
patrickvonplaten Mar 4, 2021
56bd1d8
BigBird QA ready
thevasudevgupta Mar 5, 2021
ecfe137
fix rebased
thevasudevgupta Mar 5, 2021
eebd92a
add trivia-qa debugger notebook
thevasudevgupta Mar 5, 2021
f6b6f43
qa setup
thevasudevgupta Mar 6, 2021
a50a10c
fixed till embeddings
thevasudevgupta Mar 7, 2021
edf5f2a
some issue in q/k/v_layer
thevasudevgupta Mar 8, 2021
a94d006
fix bug in conversion-script
thevasudevgupta Mar 9, 2021
3b489a3
fixed till self-attn
thevasudevgupta Mar 9, 2021
1e3aa50
qa fixed except layer norm
thevasudevgupta Mar 11, 2021
2f59e51
add qa end2end test
thevasudevgupta Mar 12, 2021
ef72bcd
fix gradient ckpting ; other qa test
thevasudevgupta Mar 12, 2021
8b94584
speed-up big bird a bit
patrickvonplaten Mar 15, 2021
468de78
hub_id=google
thevasudevgupta Mar 12, 2021
58ee280
clean up
thevasudevgupta Mar 15, 2021
e873658
make quality
thevasudevgupta Mar 15, 2021
4e13753
speed up einsum with bmm
patrickvonplaten Mar 16, 2021
e88110a
finish perf improvements for big bird
patrickvonplaten Mar 16, 2021
5f2d6a0
Merge branch 'master' into add_big_bird
patrickvonplaten Mar 16, 2021
cada132
Merge branch 'master' into add_big_bird
patrickvonplaten Mar 22, 2021
b8f41c0
remove wav2vec2 tok
patrickvonplaten Mar 22, 2021
22a71cc
fix tokenizer
patrickvonplaten Mar 22, 2021
5730a98
include docs
patrickvonplaten Mar 22, 2021
ab65872
correct docs
patrickvonplaten Mar 22, 2021
ff32248
add helper to auto pad block size
thevasudevgupta Mar 25, 2021
de2f812
make style
thevasudevgupta Mar 25, 2021
1b0e5f1
remove fast tokenizer for now
patrickvonplaten Mar 25, 2021
1ff2ff0
fix some
thevasudevgupta Mar 25, 2021
87a4e8c
add pad test
thevasudevgupta Mar 25, 2021
b20906c
finish
patrickvonplaten Mar 28, 2021
00cd6fb
fix some bugs
patrickvonplaten Mar 28, 2021
a719f1f
fix another bug
patrickvonplaten Mar 28, 2021
66fbec6
fix buffer tokens
thevasudevgupta Mar 29, 2021
184d361
Merge branch 'master' of https://github.com/huggingface/transfor…
patrickvonplaten Mar 29, 2021
1af7c98
Merge branch 'add_big_bird' of https://github.com/vasudevgupta7/trans…
patrickvonplaten Mar 29, 2021
aca2b4b
fix comment and merge from master
patrickvonplaten Mar 29, 2021
ef673bb
add comments
thevasudevgupta Mar 29, 2021
a6018bf
make style
patrickvonplaten Mar 29, 2021
58ef450
Merge branch 'master' of https://github.com/huggingface/transformers …
patrickvonplaten Mar 29, 2021
8a47841
commit some suggestions
thevasudevgupta Mar 29, 2021
dbc6e39
Fix typos
sgugger Mar 29, 2021
25164b9
fix some more suggestions
thevasudevgupta Mar 29, 2021
7bbbd6b
add another patch
thevasudevgupta Mar 29, 2021
ab6755e
fix copies
thevasudevgupta Mar 29, 2021
a9779b2
another path
thevasudevgupta Mar 29, 2021
df70258
update
thevasudevgupta Mar 29, 2021
0f110c5
update nit suggestions
thevasudevgupta Mar 29, 2021
8332604
make style
patrickvonplaten Mar 30, 2021
1 change: 1 addition & 0 deletions README.md
@@ -194,6 +194,7 @@ Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://h
1. **[BARThez](https://huggingface.co/transformers/model_doc/barthez.html)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
1. **[BERT](https://huggingface.co/transformers/model_doc/bert.html)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
1. **[BERT For Sequence Generation](https://huggingface.co/transformers/model_doc/bertgeneration.html)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[BigBird-RoBERTa](https://huggingface.co/transformers/model_doc/bigbird.html)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[Blenderbot](https://huggingface.co/transformers/model_doc/blenderbot.html)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BlenderbotSmall](https://huggingface.co/transformers/model_doc/blenderbot_small.html)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BORT](https://huggingface.co/transformers/model_doc/bort.html)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
94 changes: 50 additions & 44 deletions docs/source/index.rst

Large diffs are not rendered by default.

128 changes: 128 additions & 0 deletions docs/source/model_doc/bigbird.rst
@@ -0,0 +1,128 @@
..
Copyright 2021 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

BigBird
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The BigBird model was proposed in `Big Bird: Transformers for Longer Sequences <https://arxiv.org/abs/2007.14062>`__ by
Zaheer, Manzil and Guruganesh, Guru and Dubey, Kumar Avinava and Ainslie, Joshua and Alberti, Chris and Ontanon,
Santiago and Pham, Philip and Ravula, Anirudh and Wang, Qifan and Yang, Li and others. BigBird is a sparse-attention-
based transformer that extends Transformer-based models, such as BERT, to much longer sequences. In addition to sparse
attention, BigBird also applies global attention as well as random attention to the input sequence. Theoretically, it
has been shown that applying sparse, global, and random attention approximates full attention, while being
computationally much more efficient for longer sequences. As a consequence of the capability to handle longer context,
BigBird has shown improved performance on various long document NLP tasks, such as question answering and
summarization, compared to BERT or RoBERTa.

The abstract from the paper is the following:

*Transformers-based models, such as BERT, have been one of the most successful deep learning models for NLP.
Unfortunately, one of their core limitations is the quadratic dependency (mainly in terms of memory) on the sequence
length due to their full attention mechanism. To remedy this, we propose, BigBird, a sparse attention mechanism that
reduces this quadratic dependency to linear. We show that BigBird is a universal approximator of sequence functions and
is Turing complete, thereby preserving these properties of the quadratic, full attention model. Along the way, our
theoretical analysis reveals some of the benefits of having O(1) global tokens (such as CLS), that attend to the entire
sequence as part of the sparse attention mechanism. The proposed sparse attention can handle sequences of length up to
8x of what was previously possible using similar hardware. As a consequence of the capability to handle longer context,
BigBird drastically improves performance on various NLP tasks such as question answering and summarization. We also
propose novel applications to genomics data.*

Tips:

- BigBird comes with two implementations: **original_full** and **block_sparse**. For sequence lengths < 1024, using
  **original_full** is advised, as there is no benefit in using **block_sparse** attention.
- The code currently uses a window size of 3 blocks and 2 global blocks.
- The sequence length must be divisible by the block size.
- The current implementation supports only **ITC**.
- The current implementation doesn't support **num_random_blocks = 0** (see the usage sketch below).

The original code can be found `here <https://github.com/google-research/bigbird>`__.
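
As a minimal usage sketch of the two attention modes (the checkpoint name ``google/bigbird-roberta-base`` and the
configuration arguments below are assumptions based on the tips above and the commit log, not a definitive API
reference):

.. code-block:: python

    from transformers import BigBirdModel, BigBirdTokenizer

    # block_sparse is the sparse-attention mode; block_size and num_random_blocks
    # control the sparsity pattern (argument names assumed from the tips above).
    model = BigBirdModel.from_pretrained(
        "google/bigbird-roberta-base", attention_type="block_sparse", block_size=64, num_random_blocks=3
    )

    # For sequences shorter than 1024 tokens, full attention is advised instead:
    # model = BigBirdModel.from_pretrained("google/bigbird-roberta-base", attention_type="original_full")

    tokenizer = BigBirdTokenizer.from_pretrained("google/bigbird-roberta-base")
    inputs = tokenizer("BigBird handles much longer sequences than BERT.", return_tensors="pt")
    outputs = model(**inputs)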

BigBirdConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BigBirdConfig
:members:


BigBirdTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BigBirdTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary


BigBird specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.models.big_bird.modeling_big_bird.BigBirdForPreTrainingOutput
:members:


BigBirdModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BigBirdModel
:members: forward


BigBirdForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BigBirdForPreTraining
:members: forward


BigBirdForCausalLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BigBirdForCausalLM
:members: forward


BigBirdForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BigBirdForMaskedLM
:members: forward


BigBirdForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BigBirdForSequenceClassification
:members: forward


BigBirdForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BigBirdForMultipleChoice
:members: forward


BigBirdForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BigBirdForTokenClassification
:members: forward


BigBirdForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BigBirdForQuestionAnswering
:members: forward
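
As a hedged end-to-end sketch for question answering (the TriviaQA checkpoint name is an assumption based on the
trivia_qa conversion commits in this PR; any BigBird QA checkpoint would follow the same pattern):

.. code-block:: python

    import torch
    from transformers import BigBirdForQuestionAnswering, BigBirdTokenizer

    # Checkpoint name assumed from the commit log ("finish trivia_qa conversion", hub_id=google).
    model_id = "google/bigbird-base-trivia-itc"
    model = BigBirdForQuestionAnswering.from_pretrained(model_id)
    tokenizer = BigBirdTokenizer.from_pretrained(model_id)

    question = "Which paper introduced BigBird?"
    context = "BigBird was introduced in the paper 'Big Bird: Transformers for Longer Sequences'."
    inputs = tokenizer(question, context, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Decode the most likely answer span from the start/end logits.
    start = torch.argmax(outputs.start_logits)
    end = torch.argmax(outputs.end_logits) + 1
    print(tokenizer.decode(inputs["input_ids"][0][start:end]))
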
32 changes: 32 additions & 0 deletions src/transformers/__init__.py
@@ -150,6 +150,7 @@
"models.bert_generation": ["BertGenerationConfig"],
"models.bert_japanese": ["BertJapaneseTokenizer", "CharacterTokenizer", "MecabTokenizer"],
"models.bertweet": ["BertweetTokenizer"],
"models.big_bird": ["BIG_BIRD_PRETRAINED_CONFIG_ARCHIVE_MAP", "BigBirdConfig", "BigBirdTokenizer"],
"models.blenderbot": ["BLENDERBOT_PRETRAINED_CONFIG_ARCHIVE_MAP", "BlenderbotConfig", "BlenderbotTokenizer"],
"models.blenderbot_small": [
"BLENDERBOT_SMALL_PRETRAINED_CONFIG_ARCHIVE_MAP",
@@ -484,6 +485,22 @@
"load_tf_weights_in_bert_generation",
]
)
_import_structure["models.big_bird"].extend(
[
"BIG_BIRD_PRETRAINED_MODEL_ARCHIVE_LIST",
"BigBirdForCausalLM",
"BigBirdForMaskedLM",
"BigBirdForMultipleChoice",
"BigBirdForPreTraining",
"BigBirdForQuestionAnswering",
"BigBirdForSequenceClassification",
"BigBirdForTokenClassification",
"BigBirdLayer",
"BigBirdModel",
"BigBirdPreTrainedModel",
"load_tf_weights_in_big_bird",
]
)
_import_structure["models.blenderbot"].extend(
[
"BLENDERBOT_PRETRAINED_MODEL_ARCHIVE_LIST",
@@ -1376,6 +1393,7 @@
from .models.bert_generation import BertGenerationConfig
from .models.bert_japanese import BertJapaneseTokenizer, CharacterTokenizer, MecabTokenizer
from .models.bertweet import BertweetTokenizer
from .models.big_bird import BIG_BIRD_PRETRAINED_CONFIG_ARCHIVE_MAP, BigBirdConfig, BigBirdTokenizer
from .models.blenderbot import BLENDERBOT_PRETRAINED_CONFIG_ARCHIVE_MAP, BlenderbotConfig, BlenderbotTokenizer
from .models.blenderbot_small import (
BLENDERBOT_SMALL_PRETRAINED_CONFIG_ARCHIVE_MAP,
@@ -1678,6 +1696,20 @@
BertGenerationEncoder,
load_tf_weights_in_bert_generation,
)
from .models.big_bird import (
BIG_BIRD_PRETRAINED_MODEL_ARCHIVE_LIST,
BigBirdForCausalLM,
BigBirdForMaskedLM,
BigBirdForMultipleChoice,
BigBirdForPreTraining,
BigBirdForQuestionAnswering,
BigBirdForSequenceClassification,
BigBirdForTokenClassification,
BigBirdLayer,
BigBirdModel,
BigBirdPreTrainedModel,
load_tf_weights_in_big_bird,
)
from .models.blenderbot import (
BLENDERBOT_PRETRAINED_MODEL_ARCHIVE_LIST,
BlenderbotForCausalLM,
1 change: 1 addition & 0 deletions src/transformers/models/__init__.py
@@ -25,6 +25,7 @@
bert_generation,
bert_japanese,
bertweet,
big_bird,
blenderbot,
blenderbot_small,
camembert,
4 changes: 4 additions & 0 deletions src/transformers/models/auto/configuration_auto.py
@@ -22,6 +22,7 @@
from ..bart.configuration_bart import BART_PRETRAINED_CONFIG_ARCHIVE_MAP, BartConfig
from ..bert.configuration_bert import BERT_PRETRAINED_CONFIG_ARCHIVE_MAP, BertConfig
from ..bert_generation.configuration_bert_generation import BertGenerationConfig
from ..big_bird.configuration_big_bird import BIG_BIRD_PRETRAINED_CONFIG_ARCHIVE_MAP, BigBirdConfig
from ..blenderbot.configuration_blenderbot import BLENDERBOT_PRETRAINED_CONFIG_ARCHIVE_MAP, BlenderbotConfig
from ..blenderbot_small.configuration_blenderbot_small import (
BLENDERBOT_SMALL_PRETRAINED_CONFIG_ARCHIVE_MAP,
@@ -80,6 +81,7 @@
(key, value)
for pretrained_map in [
# Add archive maps here
BIG_BIRD_PRETRAINED_CONFIG_ARCHIVE_MAP,
SPEECH_TO_TEXT_PRETRAINED_CONFIG_ARCHIVE_MAP,
WAV_2_VEC_2_PRETRAINED_CONFIG_ARCHIVE_MAP,
M2M_100_PRETRAINED_CONFIG_ARCHIVE_MAP,
@@ -127,6 +129,7 @@
CONFIG_MAPPING = OrderedDict(
[
# Add configs here
("big_bird", BigBirdConfig),
("speech_to_text", Speech2TextConfig),
("wav2vec2", Wav2Vec2Config),
("m2m_100", M2M100Config),
@@ -180,6 +183,7 @@
MODEL_NAMES_MAPPING = OrderedDict(
[
# Add full (and cased) model names here
("big_bird", "BigBird"),
("speech_to_text", "Speech2Text"),
("wav2vec2", "Wav2Vec2"),
("m2m_100", "M2M100"),
20 changes: 20 additions & 0 deletions src/transformers/models/auto/modeling_auto.py
@@ -51,6 +51,16 @@
BertModel,
)
from ..bert_generation.modeling_bert_generation import BertGenerationDecoder, BertGenerationEncoder
from ..big_bird.modeling_big_bird import (
BigBirdForCausalLM,
BigBirdForMaskedLM,
BigBirdForMultipleChoice,
BigBirdForPreTraining,
BigBirdForQuestionAnswering,
BigBirdForSequenceClassification,
BigBirdForTokenClassification,
BigBirdModel,
)
from ..blenderbot.modeling_blenderbot import BlenderbotForCausalLM, BlenderbotForConditionalGeneration, BlenderbotModel
from ..blenderbot_small.modeling_blenderbot_small import (
BlenderbotSmallForCausalLM,
@@ -263,6 +273,7 @@
BartConfig,
BertConfig,
BertGenerationConfig,
BigBirdConfig,
BlenderbotConfig,
BlenderbotSmallConfig,
CamembertConfig,
@@ -315,6 +326,7 @@
MODEL_MAPPING = OrderedDict(
[
# Base model mapping
(BigBirdConfig, BigBirdModel),
(Speech2TextConfig, Speech2TextModel),
(Wav2Vec2Config, Wav2Vec2Model),
(M2M100Config, M2M100Model),
@@ -380,6 +392,7 @@
(RobertaConfig, RobertaForMaskedLM),
(SqueezeBertConfig, SqueezeBertForMaskedLM),
(BertConfig, BertForPreTraining),
(BigBirdConfig, BigBirdForPreTraining),
(OpenAIGPTConfig, OpenAIGPTLMHeadModel),
(GPT2Config, GPT2LMHeadModel),
(MobileBertConfig, MobileBertForPreTraining),
@@ -402,6 +415,7 @@
MODEL_WITH_LM_HEAD_MAPPING = OrderedDict(
[
# Model with LM heads mapping
(BigBirdConfig, BigBirdForMaskedLM),
(Speech2TextConfig, Speech2TextForConditionalGeneration),
(Wav2Vec2Config, Wav2Vec2ForMaskedLM),
(M2M100Config, M2M100ForConditionalGeneration),
@@ -444,6 +458,7 @@
MODEL_FOR_CAUSAL_LM_MAPPING = OrderedDict(
[
# Model for Causal LM mapping
(BigBirdConfig, BigBirdForCausalLM),
(CamembertConfig, CamembertForCausalLM),
(XLMRobertaConfig, XLMRobertaForCausalLM),
(RobertaConfig, RobertaForCausalLM),
@@ -473,6 +488,7 @@
MODEL_FOR_MASKED_LM_MAPPING = OrderedDict(
[
# Model for Masked LM mapping
(BigBirdConfig, BigBirdForMaskedLM),
(Wav2Vec2Config, Wav2Vec2ForMaskedLM),
(ConvBertConfig, ConvBertForMaskedLM),
(LayoutLMConfig, LayoutLMForMaskedLM),
@@ -523,6 +539,7 @@
MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING = OrderedDict(
[
# Model for Sequence Classification mapping
(BigBirdConfig, BigBirdForSequenceClassification),
(ConvBertConfig, ConvBertForSequenceClassification),
(LEDConfig, LEDForSequenceClassification),
(DistilBertConfig, DistilBertForSequenceClassification),
@@ -558,6 +575,7 @@
MODEL_FOR_QUESTION_ANSWERING_MAPPING = OrderedDict(
[
# Model for Question Answering mapping
(BigBirdConfig, BigBirdForQuestionAnswering),
(ConvBertConfig, ConvBertForQuestionAnswering),
(LEDConfig, LEDForQuestionAnswering),
(DistilBertConfig, DistilBertForQuestionAnswering),
@@ -595,6 +613,7 @@
MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING = OrderedDict(
[
# Model for Token Classification mapping
(BigBirdConfig, BigBirdForTokenClassification),
(ConvBertConfig, ConvBertForTokenClassification),
(LayoutLMConfig, LayoutLMForTokenClassification),
(DistilBertConfig, DistilBertForTokenClassification),
@@ -622,6 +641,7 @@
MODEL_FOR_MULTIPLE_CHOICE_MAPPING = OrderedDict(
[
# Model for Multiple Choice mapping
(BigBirdConfig, BigBirdForMultipleChoice),
(ConvBertConfig, ConvBertForMultipleChoice),
(CamembertConfig, CamembertForMultipleChoice),
(ElectraConfig, ElectraForMultipleChoice),
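
These mappings are what let the Auto classes resolve a BigBird checkpoint to the right head. A minimal sketch of the
intended effect (the checkpoint name is the same assumption as above):

```python
from transformers import AutoConfig, AutoModelForQuestionAnswering

# With BigBirdConfig registered in the mappings above, AutoModel* instantiates
# the matching BigBird class from the config alone.
config = AutoConfig.from_pretrained("google/bigbird-roberta-base")
model = AutoModelForQuestionAnswering.from_config(config)
print(type(model).__name__)  # BigBirdForQuestionAnswering
```
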
4 changes: 4 additions & 0 deletions src/transformers/models/auto/tokenization_auto.py
@@ -60,6 +60,7 @@
BartConfig,
BertConfig,
BertGenerationConfig,
BigBirdConfig,
BlenderbotConfig,
BlenderbotSmallConfig,
CamembertConfig,
@@ -111,6 +112,7 @@
from ..albert.tokenization_albert import AlbertTokenizer
from ..barthez.tokenization_barthez import BarthezTokenizer
from ..bert_generation.tokenization_bert_generation import BertGenerationTokenizer
from ..big_bird.tokenization_big_bird import BigBirdTokenizer
from ..camembert.tokenization_camembert import CamembertTokenizer
from ..deberta_v2.tokenization_deberta_v2 import DebertaV2Tokenizer
from ..m2m_100 import M2M100Tokenizer
@@ -129,6 +131,7 @@
AlbertTokenizer = None
BarthezTokenizer = None
BertGenerationTokenizer = None
BigBirdTokenizer = None
CamembertTokenizer = None
DebertaV2Tokenizer = None
MarianTokenizer = None
@@ -258,6 +261,7 @@
(TapasConfig, (TapasTokenizer, None)),
(LEDConfig, (LEDTokenizer, LEDTokenizerFast)),
(ConvBertConfig, (ConvBertTokenizer, ConvBertTokenizerFast)),
(BigBirdConfig, (BigBirdTokenizer, None)),
(IBertConfig, (RobertaTokenizer, RobertaTokenizerFast)),
(Wav2Vec2Config, (Wav2Vec2CTCTokenizer, None)),
]
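
The `(BigBirdTokenizer, None)` entry registers only the slow, SentencePiece-based tokenizer; the fast tokenizer was
removed for now, per the commit log. A small sketch under that assumption:

```python
from transformers import AutoTokenizer, BigBirdTokenizer

# No fast tokenizer is registered for BigBird in this PR, so requesting the
# slow tokenizer explicitly returns the SentencePiece-based BigBirdTokenizer.
tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base", use_fast=False)
assert isinstance(tokenizer, BigBirdTokenizer)
```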