
Fix usage of head masks by PT encoder-decoder models' generate() function #11621

Merged

Conversation

stancld
Contributor

@stancld stancld commented May 6, 2021

This PR adds the missing arguments head_mask, decoder_head_mask, and cross_attn_head_mask to the prepare_inputs_for_generation function of PyTorch encoder-decoder models, so that these arguments are used during generation when generate() is called.

EDIT: Need to fix the new test for ProphetNet
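For illustration, here is a sketch of what the change amounts to for a BART-like model (the exact signature varies slightly per model): the three mask arguments are simply threaded through prepare_inputs_for_generation into the kwargs passed to the model's forward.

def prepare_inputs_for_generation(
    self,
    decoder_input_ids,
    past=None,
    attention_mask=None,
    head_mask=None,
    decoder_head_mask=None,
    cross_attn_head_mask=None,
    use_cache=None,
    encoder_outputs=None,
    **kwargs,
):
    # cut decoder_input_ids to the last token if past key values are used
    if past is not None:
        decoder_input_ids = decoder_input_ids[:, -1:]

    return {
        "input_ids": None,  # encoder_outputs is defined, input_ids not needed
        "encoder_outputs": encoder_outputs,
        "past_key_values": past,
        "decoder_input_ids": decoder_input_ids,
        "attention_mask": attention_mask,
        "head_mask": head_mask,                        # new
        "decoder_head_mask": decoder_head_mask,        # new
        "cross_attn_head_mask": cross_attn_head_mask,  # new
        "use_cache": use_cache,
    }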


Example

Baseline (no head mask passed):

out = bart.generate(input_ids, ...)
tokenizer.decode(out[0], ...)
>>> 'The Eiffel Tower in Paris has been officially opened to the public.'

Behaviour before the PR (the decoder_head_mask was silently ignored, so the output was identical to the baseline):

out = bart.generate(input_ids, decoder_head_mask=decoder_head_mask, ...)
tokenizer.decode(out[0], ...)
- >>> 'The Eiffel Tower in Paris has been officially opened to the public.'

Behaviour after the PR (the decoder_head_mask is now applied and changes the output):

out = bart.generate(input_ids, decoder_head_mask=decoder_head_mask, ...)
tokenizer.decode(out[0], ...)
+ >>> 'The Eiffel Tower in Paris has been officially opened to the public for the first time since it was completed in 1903.'
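A more complete, hedged version of the example above (the checkpoint name, input text, and generation parameters are illustrative; the mask has shape (num_layers, num_heads), with 1.0 keeping a head and 0.0 masking it):

import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
bart = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

article = "The Eiffel Tower reopened to visitors on Thursday after renovation work."  # placeholder text
input_ids = tokenizer(article, return_tensors="pt").input_ids

# Mask (zero out) all attention heads in the first decoder layer
decoder_head_mask = torch.ones(
    bart.config.decoder_layers, bart.config.decoder_attention_heads
)
decoder_head_mask[0, :] = 0.0

out = bart.generate(
    input_ids, decoder_head_mask=decoder_head_mask, num_beams=4, max_length=50
)
print(tokenizer.decode(out[0], skip_special_tokens=True))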

Reviewers: @patrickvonplaten

* Add head_mask, decoder_head_mask and cross_attn_head_mask
into prepare_inputs_for_generation for generate() function
for multiple encoder-decoder models.
@patrickvonplaten
Contributor

Hey @stancld,

Thanks a lot for this contribution! Could we add one test to verify that generation works with head_mask for all encoder-decoder models?

I think it could be added to test_generation_utils.py
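A minimal sketch of what such a test could check, relying on the GenerationTesterMixin helpers in test_generation_utils.py (config attribute names follow BART-style configs; the merged version is more general): if every head in a layer is zero-masked, the corresponding attention weights returned by generate() must be all zeros.

import torch

def test_generate_with_head_masking(self):
    config, input_ids, attention_mask, max_length = self._get_input_ids_and_config()
    model = self.all_generative_model_classes[0](config).eval()

    # Zero-mask every head of every layer for each of the three mask types
    head_masking = {
        "head_mask": torch.zeros(config.encoder_layers, config.encoder_attention_heads),
        "decoder_head_mask": torch.zeros(config.decoder_layers, config.decoder_attention_heads),
        "cross_attn_head_mask": torch.zeros(config.decoder_layers, config.decoder_attention_heads),
    }
    attention_names = ["encoder_attentions", "decoder_attentions", "cross_attentions"]
    for attn_name, (name, mask) in zip(attention_names, head_masking.items()):
        out = model.generate(
            input_ids,
            attention_mask=attention_mask,
            num_beams=1,
            output_attentions=True,
            return_dict_in_generate=True,
            **{name: mask},
        )
        # decoder/cross attentions come per generation step; check the last step
        attn_weights = out[attn_name] if name == "head_mask" else out[attn_name][-1]
        self.assertEqual(sum(w.sum().item() for w in attn_weights), 0.0)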

@stancld
Contributor Author

stancld commented May 7, 2021

Hey @patrickvonplaten, I've added one test. At the moment, there are two small issues I'm going to handle later today so that all encoder-decoder models pass this new test.

@stancld stancld changed the title Fix usage of head masks by PT encoder-decoder models' generate() function [WIP] Fix usage of head masks by PT encoder-decoder models' generate() function May 7, 2021
Contributor

@patrickvonplaten patrickvonplaten left a comment


BTW, if ProphetNet doesn't work with head masking + generate, I'm totally fine with leaving it out for ProphetNet - we could then just override the test so it doesn't run for ProphetNet. The model is very unique and also doesn't fully work at the moment in general.
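For reference, a minimal sketch of what such an override could look like in the ProphetNet test class (class and mixin names here are illustrative):

import unittest

class ProphetNetModelTest(GenerationTesterMixin, unittest.TestCase):
    def test_generate_with_head_masking(self):
        # Intentionally a no-op: head masking + generate() is not yet
        # supported for ProphetNet, so the shared test is overridden.
        pass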

@stancld
Contributor Author

stancld commented May 13, 2021

Hi @patrickvonplaten, sorry for being silent for a while; I've been a bit busy. As you suggested, I've skipped the test for the ProphetNetForConditionalGeneration model, and now all the tests pass :)

@stancld stancld changed the title [WIP] Fix usage of head masks by PT encoder-decoder models' generate() function Fix usage of head masks by PT encoder-decoder models' generate() function May 16, 2021
@patrickvonplaten patrickvonplaten merged commit 680d181 into huggingface:master May 18, 2021
Iwontbecreative pushed a commit to Iwontbecreative/transformers that referenced this pull request Jul 15, 2021
Fix usage of head masks by PT encoder-decoder models' generate() function (huggingface#11621)

* Add missing head masking for generate() function

* Add head_mask, decoder_head_mask and cross_attn_head_mask
into prepare_inputs_for_generation for generate() function
for multiple encoder-decoder models.

* Add test_generate_with_head_masking

* [WIP] Update the new test and handle special cases

* make style

* Omit ProphetNet test so far

* make fix-copies