
[Gemma2] Support FA2 softcapping #31887

Merged
merged 3 commits into main from gemma-fa2 on Jul 11, 2024

Conversation

ArthurZucker
Collaborator

What does this PR do?

Adds support for the new FA2 softcapping following Dao-AILab/flash-attention#1025
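For illustration, a minimal sketch (not the exact diff from this PR) of how the soft-capping value can be forwarded to FlashAttention-2: the `softcap` keyword is available in flash-attn >= 2.6.0, and the surrounding variable names (`query_states`, `dropout`, etc.) are assumed from the usual Gemma2 attention code, not copied from the PR.

```python
# Hedged sketch: forward Gemma2's attn_logit_softcapping to flash-attn's softcap kwarg.
# Requires flash-attn >= 2.6.0; this is a fragment from inside the attention module,
# so query_states/key_states/value_states, dropout, softmax_scale and causal are assumed.
from flash_attn import flash_attn_func

attn_output = flash_attn_func(
    query_states,          # (batch, seq_len, num_heads, head_dim)
    key_states,
    value_states,
    dropout_p=dropout,
    softmax_scale=softmax_scale,
    causal=causal,
    softcap=self.config.attn_logit_softcapping,  # tanh soft-cap on attention logits (50.0 for Gemma2)
)
```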


@LysandreJik (Member) left a comment

OK, looks good to me! 2.6.0 was released 3 hours ago, let's go

@amyeroberts (Collaborator) left a comment

LGTM - thanks for adding!

@ArthurZucker ArthurZucker merged commit f4ec7a2 into main Jul 11, 2024
24 checks passed
@ArthurZucker ArthurZucker deleted the gemma-fa2 branch July 11, 2024 09:57
ArthurZucker added a commit that referenced this pull request Jul 11, 2024
* Support softcapping

* strictly greater than

* update
@ShadowTeamCN

Good to see this. Can we use it for model fine-tuning, or is it just for inference? Google recommends fine-tuning in 'eager' mode.

@ArthurZucker
Collaborator Author

Now you can use it for fine-tuning as well, if you have the correct version of FA2. Not sure fine-tuning "requires" it, though.
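A minimal sketch of the version check this implies; the checkpoint name and dtype below are illustrative, not prescribed by this PR.

```python
# Hedged sketch: load Gemma2 with FA2 soft-capping, assuming flash-attn >= 2.6.0 is installed.
import flash_attn
import torch
from packaging import version
from transformers import AutoModelForCausalLM

# FA2 only supports attention soft-capping from 2.6.0 onwards.
assert version.parse(flash_attn.__version__) >= version.parse("2.6.0"), \
    "attention soft-capping needs flash-attn >= 2.6.0"

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b",                      # illustrative checkpoint
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
```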

@heartkilla

Great! Any plans for SDPA support as well?

@ArthurZucker
Collaborator Author

SDPA is a bit more complicated: we need to use flex attention, and I did not have time to implement it. Do you want to open a PR?
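Not from transformers, but for anyone curious, a hedged sketch of how soft-capping could be expressed as a score_mod with PyTorch's flex_attention; the capping value and tensor shapes are illustrative.

```python
# Hedged sketch: tanh soft-capping as a flex_attention score_mod.
# Requires a PyTorch build that ships torch.nn.attention.flex_attention.
import torch
from torch.nn.attention.flex_attention import flex_attention

SOFTCAP = 50.0  # Gemma2's attn_logit_softcapping default, used here for illustration

def softcap_score_mod(score, batch, head, q_idx, kv_idx):
    # Cap the raw attention logits before the softmax.
    return SOFTCAP * torch.tanh(score / SOFTCAP)

# Dummy (batch, num_heads, seq_len, head_dim) tensors just to show the call.
q = torch.randn(1, 8, 128, 64)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)
attn_output = flex_attention(q, k, v, score_mod=softcap_score_mod)
```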

@hiyouga
Contributor

hiyouga commented Jul 13, 2024

Hi @ArthurZucker, should we also add the sliding window and soft-capping to this flash_attn_func call?

    else:
        attn_output = flash_attn_func(
            query_states, key_states, value_states, dropout, softmax_scale=softmax_scale, causal=causal
        )

just like

    else:
        attn_output = flash_attn_func(
            query_states,
            key_states,
            value_states,
            dropout,
            softmax_scale=softmax_scale,
            causal=causal,
            window_size=(self.config.sliding_window, self.config.sliding_window),
        )

@ArthurZucker
Collaborator Author

ArthurZucker commented Jul 15, 2024

It should be here on main: https://github.com/huggingface/transformers/blob/main/src/transformers/models/gemma2/modeling_gemma2.py#L361; we updated the whole FA2 integration.

It was already there on the release branch, AFAIK.
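For reference (not copied from the linked file), a hedged sketch of what the non-varlen branch can look like once both options are forwarded, using flash-attn 2.6 argument names; `self.config.attn_logit_softcapping` is the Gemma2 config field assumed here.

```python
# Hedged sketch: sliding window and soft-capping passed together to flash_attn_func.
attn_output = flash_attn_func(
    query_states,
    key_states,
    value_states,
    dropout,
    softmax_scale=softmax_scale,
    causal=causal,
    window_size=(self.config.sliding_window, self.config.sliding_window),
    softcap=self.config.attn_logit_softcapping,
)
```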

@hiyouga
Contributor

hiyouga commented Jul 15, 2024

Got it, thanks for replying!

amyeroberts pushed a commit to amyeroberts/transformers that referenced this pull request Jul 19, 2024
* Support softcapping

* strictly greater than

* update
MHRDYN7 pushed a commit to MHRDYN7/transformers that referenced this pull request Jul 23, 2024
* Support softcapping

* strictly greater than

* update
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Jul 24, 2024
* Support softcapping

* strictly greater than

* update