This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

8-bit optimizer crashes when fine-tuning gpt2-large #26

Open
rationalism opened this issue Jun 30, 2022 · 0 comments
rationalism commented Jun 30, 2022

Using the bnb.optim.Adam8bit optimizer in place of torch.optim.Adam causes a crash after a handful of batches:

12it [00:22, 1.82s/it]Error an illegal memory access was encountered at line 198 in file /home/alyssa/gpt_math/bitsandbytes/csrc/ops.cu

I am fine-tuning Hugging Face's gpt2-large model on an RTX 3090 (Ampere) with CUDA 11.6 and NVIDIA driver 510.73.05. I have tried compiling bitsandbytes from source on my machine, and the set_optim_to_run_embedding_in_fp32 trick from huggingface/transformers#14819; neither changed the behavior. Running with the standard PyTorch Adam optimizer works fine. nvidia-smi shows 16 GB of memory used on a 24 GB GPU, so it should be nowhere near running out of memory.
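For context, a minimal sketch of the drop-in swap described above. The `make_optimizer` helper, the learning rate, and the `use_8bit` flag are illustrative assumptions, not taken verbatim from my training script:

```python
def make_optimizer(params, lr=1e-5, use_8bit=True):
    """Build the training optimizer.

    bnb.optim.Adam8bit takes the same (params, lr=...) constructor
    arguments as torch.optim.Adam, so it is used as a drop-in
    replacement. Passing use_8bit=False reproduces the plain-PyTorch
    configuration that runs without the illegal-memory-access error.
    """
    if use_8bit:
        import bitsandbytes as bnb  # stores optimizer state in 8 bits
        return bnb.optim.Adam8bit(params, lr=lr)
    import torch
    return torch.optim.Adam(params, lr=lr)
```

With a Hugging Face model this would be called as `make_optimizer(model.parameters())`; the crash appears a few batches into training only on the `use_8bit=True` path.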

@rationalism rationalism changed the title 8-bit optimizer crashes when fine-tuning gpt-2 large 8-bit optimizer crashes when fine-tuning gpt2-large Jun 30, 2022