
BigBird #10183

Merged · 88 commits · Mar 30, 2021

Conversation

@thevasudevgupta (Contributor) commented on Feb 15, 2021

What does this PR do?

This PR adds Google's BigBird ("RoBERTa" variant).

Fixes #6113.

This PR adds three BigBird checkpoints.

Here is a notebook showing how well BigBird works on long-document question answering: https://colab.research.google.com/drive/1DVOm1VHjW0eKCayFq1N2GpY6GR9M4tJP?usp=sharing
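
For a self-contained taste of that use case, a minimal sketch of extractive QA with BigBird follows. The BigBirdForQuestionAnswering class is part of this PR; the TriviaQA-finetuned checkpoint name google/bigbird-base-trivia-itc is an assumption for illustration, not taken from the PR description:

import torch
from transformers import BigBirdForQuestionAnswering, BigBirdTokenizer

# NOTE: the checkpoint name below is an assumption for illustration
repo = "google/bigbird-base-trivia-itc"
tokenizer = BigBirdTokenizer.from_pretrained(repo)
model = BigBirdForQuestionAnswering.from_pretrained(repo)

question = "Which attention pattern does BigBird use?"
context = "BigBird combines sliding-window, global, and random attention so that it can handle sequences of up to 4096 tokens."

inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# decode the highest-scoring answer span
start = int(torch.argmax(outputs.start_logits))
end = int(torch.argmax(outputs.end_logits)) + 1
print(tokenizer.decode(inputs["input_ids"][0][start:end]))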

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed.
@patrickvonplaten

@isollid commented on Feb 24, 2021

Will BigBird-Pegasus be added, and then BigBirdForConditionalGeneration so that summarization will be possible?

@thevasudevgupta (Contributor, Author) replied:

> Will BigBird-Pegasus be added, and then BigBirdForConditionalGeneration so that summarization will be possible?

Yes, we will be adding that soon.

@thevasudevgupta (Contributor, Author) commented on Feb 25, 2021

Once the pre-trained checkpoints are uploaded to the huggingface_hub, the model and tokenizer can be accessed this way:

from transformers import BigBirdForMaskedLM, BigBirdForPreTraining, BigBirdTokenizer

tokenizer = BigBirdTokenizer.from_pretrained("google/bigbird-roberta-base")

# model with LM head
model_with_lm = BigBirdForMaskedLM.from_pretrained("google/bigbird-roberta-base")

# model with pretraining heads
model_for_pretraining = BigBirdForPreTraining.from_pretrained("google/bigbird-roberta-base")
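
A related aside, not from the original comment: BigBird supports both block-sparse and full attention, and the commit history's "attn_type=original_full" notes suggest the mode is configurable. A small sketch, assuming attention_type is the relevant BigBirdConfig field:

from transformers import BigBirdForMaskedLM

# override the default block-sparse attention with full attention;
# attention_type is assumed here to be a BigBirdConfig field that
# from_pretrained forwards to the config
model_full = BigBirdForMaskedLM.from_pretrained(
    "google/bigbird-roberta-base", attention_type="original_full"
)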

@patrickvonplaten changed the title from "Add BigBird" to "BigBird" on Mar 29, 2021
@sgugger (Collaborator) left a review:

Amazing add! This is a big model and will make for a nice addition. I have left quite a few comments, mainly about styling.

On top of that, don't forget to add your model to the main README!

Review threads (since resolved) on:
src/transformers/models/auto/modeling_auto.py
src/transformers/models/big_bird/configuration_big_bird.py (4 threads)
tests/test_modeling_big_bird.py (3 threads)
tests/test_tokenization_big_bird.py
@sgugger (Collaborator) left a second review:

Made typos in my suggestions, sorry!

Review threads (since resolved) on:
src/transformers/models/big_bird/modeling_big_bird.py (2 threads)
sgugger and others added 4 commits on March 29, 2021 (Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>)
@LysandreJik (Member) left a review:

This is great @vasudevgupta7! I've left a few comments, mostly nits.

This made me think we should really push for fast tokenizers in the templates, as they're arguably more important and useful than their Python counterparts.

Thanks a lot for working on this @vasudevgupta7, this is a tremendous effort!

Review threads (since resolved) on:
docs/source/model_doc/bigbird.rst
src/transformers/models/big_bird/configuration_big_bird.py (3 threads)
src/transformers/models/big_bird/modeling_big_bird.py
src/transformers/models/big_bird/tokenization_big_bird.py
tests/test_modeling_big_bird.py
thevasudevgupta and others added 3 commits on March 30, 2021 (Co-authored-by: Lysandre Debut <lysandre@huggingface.co>)
@thevasudevgupta (Contributor, Author) commented:

@sgugger, @LysandreJik I updated the code based on your suggestions. Please let me know if I have missed something.

@patrickvonplaten merged commit 6dfd027 into huggingface:master on Mar 30, 2021
@LysandreJik (Member) commented:

Thank you for taking care of the comments @vasudevgupta7 and for this PR altogether!

@thevasudevgupta mentioned this pull request on Mar 31, 2021
@sayakmisra commented:

@vasudevgupta7 great work! When are you planning to add BigBirdForConditionalGeneration? And are there any plans to add the PubMed pre-trained models?

@thevasudevgupta (Contributor, Author) commented:

@sayakmisra I am currently working on it. You can track PR #10991.

@jigsaw2212 commented:

@vasudevgupta7 currently, loading vasudevgupta/bigbird-pegasus-large-bigpatent into BigBirdForConditionalGeneration leads to some weights of the checkpoint not being used to initialize the model. Is there a workaround for this?

Can we have separate pre-trained checkpoints for BigBird and Pegasus without the fine-tuning, so that we can use the Pegasus decoder along with the BigBird encoder in our code?

@patrickvonplaten (Contributor) commented:

Hey @jigsaw2212,

We are still working on integrating BigBirdPegasus; for now, only the google/bigbird-... checkpoints are fully supported. BigBirdPegasus will be merged in one to two weeks.
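
For anyone reading later, a forward-looking sketch of the summarization workflow follows. The BigBirdPegasusForConditionalGeneration class and the google/bigbird-pegasus-large-arxiv checkpoint are assumptions here, since BigBirdPegasus had not yet been merged at the time of this comment (see PR #10991):

from transformers import AutoTokenizer, BigBirdPegasusForConditionalGeneration

# NOTE: class and checkpoint names are assumptions at the time of this thread
repo = "google/bigbird-pegasus-large-arxiv"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = BigBirdPegasusForConditionalGeneration.from_pretrained(repo)

article = "Replace this with a long scientific article ..."
inputs = tokenizer(article, return_tensors="pt", truncation=True)

# beam search tends to work well for abstractive summarization
summary_ids = model.generate(**inputs, num_beams=4, max_length=256)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))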

Iwontbecreative pushed a commit to Iwontbecreative/transformers that referenced this pull request on Jul 15, 2021. The commit message reads:
* init bigbird

* model.__init__ working, conversion script ready, config updated

* add conversion script

* BigBirdEmbeddings working :)

* slightly update conversion script

* BigBirdAttention working :) ; some bug in layer.output.dense

* add debugger-notebook

* forward() working for BigBirdModel :) ; replaced gelu with gelu_fast

* tf code adapted to torch till rand_attn in bigbird_block_sparse_attention ; till now everything working :)

* BigBirdModel working in block-sparse attention mode :)

* add BigBirdForPreTraining

* small fix

* add tokenizer for BigBirdModel

* fix config & hence modeling

* fix base prefix

* init testing

* init tokenizer test

* pos_embed must be absolute, attn_type=original_full when add_cross_attn=True , nsp loss is optional in BigBirdForPreTraining, add assert statements

* remove position_embedding_type arg

* complete normal tests

* add comments to block sparse attention

* add attn_probs for sliding & global tokens

* create fn for block sparse attn mask creation

* add special tests

* restore pos embed arg

* minor fix

* attn probs update

* make big bird fully gpu friendly

* fix tests

* remove pruning

* correct tokenzier & minor fixes

* update conversion script , remove norm_type

* tokenizer-inference test add

* remove extra comments

* add docs

* save intermediate

* finish trivia_qa conversion

* small update to forward

* correct qa and layer

* better error message

* BigBird QA ready

* fix rebased

* add triva-qa debugger notebook

* qa setup

* fixed till embeddings

* some issue in q/k/v_layer

* fix bug in conversion-script

* fixed till self-attn

* qa fixed except layer norm

* add qa end2end test

* fix gradient ckpting ; other qa test

* speed-up big bird a bit

* hub_id=google

* clean up

* make quality

* speed up einsum with bmm

* finish perf improvements for big bird

* remove wav2vec2 tok

* fix tokenizer

* include docs

* correct docs

* add helper to auto pad block size

* make style

* remove fast tokenizer for now

* fix some

* add pad test

* finish

* fix some bugs

* fix another bug

* fix buffer tokens

* fix comment and merge from master

* add comments

* make style

* commit some suggestions

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix typos

* fix some more suggestions

* add another patch

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix copies

* another path

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* update

* update nit suggestions

* make style

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>