fix attention mask collation #1603

Merged
merged 1 commit into main on May 14, 2024
Conversation

winglian (Collaborator) commented May 8, 2024

fixes #1597 @timpal0l

not even sure how the previous code worked 🤷

Here's the relevant part of the YAML that worked for me:

pretraining_dataset:
  - path: HuggingFaceTB/cosmopedia_6M
    split: train
    type: completion
max_steps: 10_000

pretrain_multipack_attn: true
pretrain_multipack_buffer_size: 10_000

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true
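
For context, here is a minimal sketch of the sample-packing idea behind these options (the `pack_samples` name and the segment-id convention are illustrative assumptions, not axolotl's actual collator): several tokenized samples are concatenated into one packed row, and the attention mask stores a per-sample segment id rather than plain 0/1 so downstream code can still tell the samples apart.

```python
import torch

def pack_samples(samples, max_len, pad_token_id=0):
    """Concatenate tokenized samples into one packed row and build a
    segment-id attention mask (1, 2, 3, ... per sample; 0 marks padding)."""
    input_ids, attention_mask = [], []
    for seg_id, ids in enumerate(samples, start=1):
        input_ids.extend(ids)
        attention_mask.extend([seg_id] * len(ids))
    # truncate, then pad out to the packed sequence length
    input_ids = input_ids[:max_len]
    attention_mask = attention_mask[:max_len]
    pad = max_len - len(input_ids)
    return (
        torch.tensor([input_ids + [pad_token_id] * pad]),
        torch.tensor([attention_mask + [0] * pad]),
    )

ids, mask = pack_samples([[11, 12, 13], [21, 22], [31, 32, 33, 34]], max_len=12)
# mask -> tensor([[1, 1, 1, 2, 2, 3, 3, 3, 3, 0, 0, 0]])
```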

timpal0l commented May 8, 2024

@winglian I will try!

This should make training faster if I understand correctly? Since, as I understand it, the attention mask was not being set correctly with this set to false? (I guess this is only relevant when sample_packing: false as well?)

This new error popped up instead:

RuntimeError: CUDA error: an illegal memory access was encountered

ali-mosavian (Contributor)

Not sure how this is done in Axolotl, but the idea is to use the attention mask to prohibit attention across samples, if correctly implemented.

winglian (Collaborator, Author)

> Not sure how this is done in Axolotl, but the idea is to use the attention mask to prohibit attention across samples, if correctly implemented.

Yes, the general philosophy of axolotl is to ensure the highest quality for SFT by preventing attention across packed samples. However, we let this be toggled on or off for (continued) pretraining, because in those cases you want to concatenate samples to reach the full context length. Depending on the attention implementation (FA2 vs SDPA vs eager), the model may otherwise never see position_ids up to the maximum context length. I don't have evidence one way or the other that this is bad, but I would expect you'd want to maintain the existing context length during continued pretraining.
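
As a rough illustration of that point (a sketch assuming the segment-id mask convention above, not axolotl's or FlashAttention's actual code path): a packed-sample mask can be expanded into a block-diagonal causal attention matrix so tokens only attend within their own sample, whereas with pretrain_multipack_attn: false the packed row is effectively treated as one long sample and attention flows across the boundaries.

```python
import torch

def block_diagonal_causal_mask(segment_ids: torch.Tensor) -> torch.Tensor:
    """segment_ids: (batch, seq_len) ints, 0 = padding, 1..N = sample id.
    Returns (batch, seq_len, seq_len) booleans, True = attention allowed."""
    same_sample = segment_ids.unsqueeze(-1) == segment_ids.unsqueeze(-2)
    not_padding = (segment_ids > 0).unsqueeze(-1)
    seq_len = segment_ids.shape[-1]
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    return same_sample & not_padding & causal

seg = torch.tensor([[1, 1, 1, 2, 2, 0]])
print(block_diagonal_causal_mask(seg).int()[0])
# rows 3-4 (sample 2) have zeros in columns 0-2, i.e. no attention into sample 1
```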

winglian merged commit 0298273 into main on May 14, 2024
4 checks passed
Development

Successfully merging this pull request may close these issues.

Streaming large datasets not working with pretrain_multipack_attn: true