Fix usage of head masks by PT encoder-decoder models' generate() function #11621
Conversation
* Add head_mask, decoder_head_mask and cross_attn_head_mask into prepare_inputs_for_generation for generate() function for multiple encoder-decoder models.
Hey @stancld, thanks a lot for this contribution! Could we add one test to verify that generation works with head_mask for all encoder-decoder models? I think it could be added to …
Hey @patrickvonplaten, I've added one test. At this moment, there are two little issues I'm gonna handle later today so that all encoder-decoder models will pass this new test.
BTW, if ProphetNet doesn't work with head_masking + generate, I'm totally fine with leaving it out for ProphetNet - we could then just overwrite the test to not run for ProphetNet. The model is very unique and also doesn't work fully at the moment in general.
Hi @patrickvonplaten, sorry for being silent for a while as I've been a bit too busy. As you suggest, I've skipped the test for ProphetNet.
Fix usage of head masks by PT encoder-decoder models' generate() function (huggingface#11621)
* Add missing head masking for generate() function
* Add head_mask, decoder_head_mask and cross_attn_head_mask into prepare_inputs_for_generation for generate() function for multiple encoder-decoder models.
* Add test_genereate_with_head_masking
* [WIP] Update the new test and handle special cases
* make style
* Omit ProphetNet test so far
* make fix-copies
This PR adds the missing arguments head_mask, decoder_head_mask and cross_attn_head_mask to the prepare_inputs_for_generation function of PyTorch encoder-decoder models, so that these args are used during generation when the generate() function is called.
EDIT: Need to fix the new test for ProphetNet
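The shape of the fix can be sketched as follows. This is an illustrative, simplified version of the pattern, not the actual transformers source: before the PR, prepare_inputs_for_generation simply did not include the three head-mask kwargs in the dict of model inputs it returns, so generate() silently dropped them; the fix is to accept them and pass them through.

```python
# Hedged sketch of the fix (argument names match the PR; the surrounding
# logic is simplified compared to the real transformers models):

def prepare_inputs_for_generation(
    decoder_input_ids,
    past=None,
    attention_mask=None,
    head_mask=None,              # masks encoder self-attention heads
    decoder_head_mask=None,      # masks decoder self-attention heads
    cross_attn_head_mask=None,   # masks cross-attention heads
    use_cache=None,
    encoder_outputs=None,
    **kwargs,
):
    # Before the PR the three *head_mask arguments were not forwarded here;
    # including them in the returned dict is the essence of the fix, so that
    # generate() passes them to every decoding step's forward call.
    return {
        "input_ids": None,  # encoder_outputs is set, so input_ids is unused
        "encoder_outputs": encoder_outputs,
        "past_key_values": past,
        "decoder_input_ids": decoder_input_ids,
        "attention_mask": attention_mask,
        "head_mask": head_mask,
        "decoder_head_mask": decoder_head_mask,
        "cross_attn_head_mask": cross_attn_head_mask,
        "use_cache": use_cache,
    }
```

With this in place, any head masks passed to generate() reach the model's forward pass at each decoding step instead of being discarded.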
Example
Behaviour before the PR:
- >>> 'The Eiffel Tower in Paris has been officially opened to the public.'
Behaviour after the PR:
+ >>> 'The Eiffel Tower in Paris has been officially opened to the public for the first time since it was completed in 1903.'
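The before/after outputs above come from generating with a head mask applied. In transformers, a head mask is a float tensor of shape (num_layers, num_heads) with 1.0 for heads to keep and 0.0 for heads to drop. A dependency-free sketch of building such a mask (make_head_mask is a hypothetical helper, modeled with plain lists instead of a torch tensor):

```python
# Hypothetical helper: build a (num_layers x num_heads) head mask where
# 1.0 keeps an attention head and 0.0 disables it.

def make_head_mask(num_layers, num_heads, disabled):
    """`disabled` is an iterable of (layer, head) index pairs to zero out."""
    mask = [[1.0] * num_heads for _ in range(num_layers)]
    for layer, head in disabled:
        mask[layer][head] = 0.0
    return mask
```

In actual usage one would wrap this in torch.tensor(...) and pass it as the head_mask / decoder_head_mask / cross_attn_head_mask arguments to generate(); with this PR applied, those masks actually affect the generated text, as in the example above.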
Reviewers: @patrickvonplaten