
BigBird #10183

Merged · 88 commits · Mar 30, 2021

Conversation

@thevasudevgupta (Contributor) commented on Feb 15, 2021

What does this PR do?

This PR adds Google's BigBird ("RoBERTa" variant).

Fixes #6113.

This PR adds three BigBird checkpoints.

Here is a notebook showing how well BigBird works on long-document question answering: https://colab.research.google.com/drive/1DVOm1VHjW0eKCayFq1N2GpY6GR9M4tJP?usp=sharing
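
For a self-contained taste of that use case, a minimal sketch of extractive QA with BigBird follows. The BigBirdForQuestionAnswering class is part of this PR; the TriviaQA-finetuned checkpoint name google/bigbird-base-trivia-itc is an assumption for illustration, not taken from the PR description:

import torch
from transformers import BigBirdForQuestionAnswering, BigBirdTokenizer

# NOTE: the checkpoint name below is an assumption for illustration
repo = "google/bigbird-base-trivia-itc"
tokenizer = BigBirdTokenizer.from_pretrained(repo)
model = BigBirdForQuestionAnswering.from_pretrained(repo)

question = "Which attention pattern does BigBird use?"
context = "BigBird combines sliding-window, global, and random attention so that it can handle sequences of up to 4096 tokens."

inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# decode the highest-scoring answer span
start = int(torch.argmax(outputs.start_logits))
end = int(torch.argmax(outputs.end_logits)) + 1
print(tokenizer.decode(inputs["input_ids"][0][start:end]))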

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed.
@patrickvonplaten

@isollid commented on Feb 24, 2021

Will BigBird-Pegasus be added, and then BigBirdForConditionalGeneration so that summarization will be possible?

@thevasudevgupta (Contributor, Author) replied:

> Will BigBird-Pegasus be added, and then BigBirdForConditionalGeneration so that summarization will be possible?

Yes, we will be adding that soon.

@thevasudevgupta (Contributor, Author) commented on Feb 25, 2021

Once the pre-trained checkpoints are uploaded to the huggingface_hub, the model and tokenizer can be accessed this way:

from transformers import BigBirdForMaskedLM, BigBirdForPreTraining, BigBirdTokenizer

tokenizer = BigBirdTokenizer.from_pretrained("google/bigbird-roberta-base")

# model with LM head
model_with_lm = BigBirdForMaskedLM.from_pretrained("google/bigbird-roberta-base")

# model with pretraining heads
model_for_pretraining = BigBirdForPreTraining.from_pretrained("google/bigbird-roberta-base")
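
A related aside, not from the original comment: BigBird supports both block-sparse and full attention, and the commit history's "attn_type=original_full" notes suggest the mode is configurable. A small sketch, assuming attention_type is the relevant BigBirdConfig field:

from transformers import BigBirdForMaskedLM

# override the default block-sparse attention with full attention;
# attention_type is assumed here to be a BigBirdConfig field that
# from_pretrained forwards to the config
model_full = BigBirdForMaskedLM.from_pretrained(
    "google/bigbird-roberta-base", attention_type="original_full"
)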

@patrickvonplaten changed the title from "Add BigBird" to "BigBird" on Mar 29, 2021
@sgugger (Collaborator) left a review:

Amazing add! This is a big model and will make for a nice addition. I have left quite a few comments, mainly about styling.

On top of that, don't forget to add your model to the main README!

Review threads (since resolved) on:
src/transformers/models/auto/modeling_auto.py
src/transformers/models/big_bird/configuration_big_bird.py (4 threads)
tests/test_modeling_big_bird.py (3 threads)
tests/test_tokenization_big_bird.py
@sgugger (Collaborator) left a second review:

Made typos in my suggestions, sorry!

Review threads (since resolved) on:
src/transformers/models/big_bird/modeling_big_bird.py (2 threads)
sgugger and others added 4 commits on March 29, 2021 (Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>)
@LysandreJik (Member) left a review:

This is great @vasudevgupta7! I've left a few comments, mostly nits.

This made me think we should really push for fast tokenizers in the templates, as they're arguably more important and useful than their Python counterparts.

Thanks a lot for working on this @vasudevgupta7, this is a tremendous effort!

Review threads (since resolved) on:
docs/source/model_doc/bigbird.rst
src/transformers/models/big_bird/configuration_big_bird.py (3 threads)
src/transformers/models/big_bird/modeling_big_bird.py
src/transformers/models/big_bird/tokenization_big_bird.py
tests/test_modeling_big_bird.py
thevasudevgupta and others added 3 commits on March 30, 2021 (Co-authored-by: Lysandre Debut <lysandre@huggingface.co>)
@thevasudevgupta (Contributor, Author) commented:

@sgugger, @LysandreJik I updated the code based on your suggestions. Please let me know if I have missed something.

@patrickvonplaten merged commit 6dfd027 into huggingface:master on Mar 30, 2021
@LysandreJik (Member) commented:

Thank you for taking care of the comments @vasudevgupta7 and for this PR altogether!

@thevasudevgupta mentioned this pull request on Mar 31, 2021
@sayakmisra commented:

@vasudevgupta7 great work! When are you planning to add BigBirdForConditionalGeneration? And are there any plans to add the PubMed pre-trained models?

@thevasudevgupta (Contributor, Author) commented:

@sayakmisra I am currently working on it. You can track PR #10991.

@jigsaw2212 commented:

@vasudevgupta7 currently, loading vasudevgupta/bigbird-pegasus-large-bigpatent into BigBirdForConditionalGeneration leads to some weights of the checkpoint not being used to initialize the model. Is there a workaround for this?

Can we have separate pre-trained checkpoints for BigBird and Pegasus without the fine-tuning, so that we can use the Pegasus decoder along with the BigBird encoder in our code?

@patrickvonplaten (Contributor) commented:

Hey @jigsaw2212,

We are still working on integrating BigBirdPegasus; for now, only the google/bigbird-... checkpoints are fully supported. BigBirdPegasus will be merged in one to two weeks.
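
For anyone reading later, a forward-looking sketch of the summarization workflow follows. The BigBirdPegasusForConditionalGeneration class and the google/bigbird-pegasus-large-arxiv checkpoint are assumptions here, since BigBirdPegasus had not yet been merged at the time of this comment (see PR #10991):

from transformers import AutoTokenizer, BigBirdPegasusForConditionalGeneration

# NOTE: class and checkpoint names are assumptions at the time of this thread
repo = "google/bigbird-pegasus-large-arxiv"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = BigBirdPegasusForConditionalGeneration.from_pretrained(repo)

article = "Replace this with a long scientific article ..."
inputs = tokenizer(article, return_tensors="pt", truncation=True)

# beam search tends to work well for abstractive summarization
summary_ids = model.generate(**inputs, num_beams=4, max_length=256)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))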

Iwontbecreative pushed a commit to Iwontbecreative/transformers that referenced this pull request on Jul 15, 2021. The commit message reads:
* init bigbird

* model.__init__ working, conversion script ready, config updated

* add conversion script

* BigBirdEmbeddings working :)

* slightly update conversion script

* BigBirdAttention working :) ; some bug in layer.output.dense

* add debugger-notebook

* forward() working for BigBirdModel :) ; replaced gelu with gelu_fast

* tf code adapted to torch till rand_attn in bigbird_block_sparse_attention ; till now everything working :)

* BigBirdModel working in block-sparse attention mode :)

* add BigBirdForPreTraining

* small fix

* add tokenizer for BigBirdModel

* fix config & hence modeling

* fix base prefix

* init testing

* init tokenizer test

* pos_embed must be absolute, attn_type=original_full when add_cross_attn=True , nsp loss is optional in BigBirdForPreTraining, add assert statements

* remove position_embedding_type arg

* complete normal tests

* add comments to block sparse attention

* add attn_probs for sliding & global tokens

* create fn for block sparse attn mask creation

* add special tests

* restore pos embed arg

* minor fix

* attn probs update

* make big bird fully gpu friendly

* fix tests

* remove pruning

* correct tokenzier & minor fixes

* update conversion script , remove norm_type

* tokenizer-inference test add

* remove extra comments

* add docs

* save intermediate

* finish trivia_qa conversion

* small update to forward

* correct qa and layer

* better error message

* BigBird QA ready

* fix rebased

* add triva-qa debugger notebook

* qa setup

* fixed till embeddings

* some issue in q/k/v_layer

* fix bug in conversion-script

* fixed till self-attn

* qa fixed except layer norm

* add qa end2end test

* fix gradient ckpting ; other qa test

* speed-up big bird a bit

* hub_id=google

* clean up

* make quality

* speed up einsum with bmm

* finish perf improvements for big bird

* remove wav2vec2 tok

* fix tokenizer

* include docs

* correct docs

* add helper to auto pad block size

* make style

* remove fast tokenizer for now

* fix some

* add pad test

* finish

* fix some bugs

* fix another bug

* fix buffer tokens

* fix comment and merge from master

* add comments

* make style

* commit some suggestions

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix typos

* fix some more suggestions

* add another patch

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix copies

* another path

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* update

* update nit suggestions

* make style

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>