bitsandbytes - `Linear8bitLt` integration into `transformers` models (huggingface#17901)

* first commit
* correct replace function
* add final changes - works like a charm! - cannot implement tests yet - tested
* clean up a bit
* add bitsandbytes dependencies
* working version - added import function - added bitsandbytes utils file
* small fix
* small fix - fix import issue
* fix import issues
* Apply suggestions from code review
* refactor a bit - move bitsandbytes utils to utils - change comments on functions
* reformat docstring - reformat docstring on init_empty_weights_8bit
* Update src/transformers/__init__.py
* revert bad formatting
* change to bitsandbytes
* refactor a bit - remove init8bit since it is useless
* more refactoring - fixed init empty weights issue - added threshold param
* small hack to make it work
* Update src/transformers/modeling_utils.py
* Update src/transformers/modeling_utils.py
* remove the small hack
* modify utils file
* make style + refactor a bit
* create device map correctly
* add correct dtype for device map creation
* Apply suggestions from code review
* apply suggestions - remove with torch.grad - do not rely on Python bool magic!
* add docstring - add docstring for new kwargs
* add docstring - comment `replace_8bit_linear` function - fix weird formatting
* added more documentation - added new utility function for memory footprint tracking - colab demo to add
* few modifs - typo doc - force cast into float16 when load_in_8bit is enabled
* added colab link
* add test architecture + docstring a bit
* refactor a bit testing class
* make style + refactor a bit
* enhance checks - add more checks - start writing saving test
* clean up a bit
* make style
* add more details on doc
* add more tests - still needs to fix 2 tests
* replace by "or" - could not fix it from GitHub GUI
* refactor a bit testing code + add readme
* make style
* fix import issue
* Update src/transformers/modeling_utils.py
* add few comments
* add more docstring + make style
* more docstring
* raise error when loaded in 8bit
* make style
* add warning if loaded on CPU
* add small sanity check
* fix small comment
* add bitsandbytes on dockerfile
* Improve documentation - improve documentation from comments
* add few comments
* slow tests pass on the VM but not on the CI VM
* Fix merge conflict
* make style
* another test should pass on a multi gpu setup
* fix bad import in testing file
* Fix slow tests - remove dummy batches - no more CUDA illegal memory errors
* modify dockerfile
* Update docs/source/en/main_classes/model.mdx
* Update Dockerfile
* Update model.mdx
* Update Dockerfile
* Apply suggestions from code review
* few modifications - lm head can stay on disk/cpu - change model name so that test passes
* change test value - change test value to the correct output - torch bmm changed to baddmm in bloom modeling when merging
* modify installation guidelines
* Apply suggestions from code review
* Apply suggestions from code review
* Apply suggestions from code review
* replace `n` by `name`
* merge `load_in_8bit` and `low_cpu_mem_usage`
* first try - keep the lm head in full precision
* better check - check the attribute `base_model_prefix` instead of computing the number of parameters
* added more tests
* Update src/transformers/utils/bitsandbytes.py
* Merge branch 'integration-8bit' of https://github.com/younesbelkada/transformers into integration-8bit
* improve documentation - fix typos for installation - change title in the documentation

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
1 parent 20e866e · commit 586d858 · 6 changed files with 77 additions and 176 deletions.
# Testing mixed int8 quantization

![HFxbitsandbytes.png](https://s3.amazonaws.com/moonup/production/uploads/1660567705337-62441d1d9fdefb55a0b7d12c.png)

The following is a recipe for effectively debugging the `bitsandbytes` integration in Hugging Face `transformers`.
## Library requirements

+ `transformers>=4.22.0`
+ `accelerate>=0.12.0`
+ `bitsandbytes>=0.31.5`
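
To quickly confirm that the installed versions meet these requirements, one option is a small standard-library check (this helper is only illustrative, not part of the repository):

```py
# illustrative version check; `importlib.metadata` ships with Python >= 3.8
from importlib.metadata import version

for pkg in ("transformers", "accelerate", "bitsandbytes"):
    print(f"{pkg}: {version(pkg)}")
```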
## Hardware requirements

The following instructions were tested on a setup of 2 NVIDIA Tesla T4 (15GB) GPUs. To run `bitsandbytes` successfully you need a GPU with 8-bit tensor core support. Turing, Ampere, or newer architectures (e.g. T4, RTX 20/30 series, A40-A100, A6000) should be supported.
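To verify that your GPUs are recent enough, you can inspect their compute capability; the sketch below assumes the usual 7.5 (Turing) threshold for int8 tensor-core support:

```py
import torch

# Turing GPUs report compute capability (7, 5); older architectures lack int8 tensor cores
for idx in range(torch.cuda.device_count()):
    capability = torch.cuda.get_device_capability(idx)
    status = "supported" if capability >= (7, 5) else "unsupported"
    print(f"GPU {idx}: {torch.cuda.get_device_name(idx)} - capability {capability} - {status}")
```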
## Virtual envs

```bash
conda create --name int8-testing python==3.8
pip install "bitsandbytes>=0.31.5"
pip install "accelerate>=0.12.0"
pip install "transformers>=4.23.0"
```

If `transformers>=4.23.0` is not released yet, install it from source instead:

```bash
pip install git+https://github.com/huggingface/transformers.git
```
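
Once the environment is set up, a minimal sanity-check script looks roughly like the following (the model name here is only an example; swap in the checkpoint you are debugging):

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigscience/bloom-560m"  # example model, any supported causal LM works

tokenizer = AutoTokenizer.from_pretrained(model_id)
# `load_in_8bit=True` swaps `nn.Linear` modules for `bitsandbytes` `Linear8bitLt`,
# and `device_map="auto"` lets `accelerate` spread the weights over the available GPUs
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", load_in_8bit=True)

inputs = tokenizer("Hello my name is", return_tensors="pt").to(0)
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0]))
```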

## Troubleshooting

A list of common errors:
### Torch does not correctly do the operations on GPU

First check that the following snippet runs without any error:
```py
import torch

vec = torch.randn(1, 2, 3).to(0)
```
If it fails, install torch using `conda`:
```bash
conda create --name int8-testing python==3.8
conda install pytorch torchvision torchaudio cudatoolkit=11.6 -c pytorch -c conda-forge
pip install "bitsandbytes>=0.31.5"
pip install "accelerate>=0.12.0"
pip install "transformers>=4.23.0"
```

For the latest PyTorch installation instructions, please see [the official guide](https://pytorch.org/get-started/locally/); afterwards the snippet above should work.
### ` bitsandbytes operations are not supported under CPU!`

This happens when some `Linear` weights are placed on the CPU when using `accelerate`. Please check `model.hf_device_map` carefully and make sure that no `Linear` module is assigned to the CPU, except possibly the last module (usually the lm_head), which is fine to keep on the CPU.
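As a quick way to spot such a misplacement, you can scan the device map directly (a sketch, assuming `model` was loaded with `device_map="auto"` as in the snippet above):

```py
# modules that accelerate offloaded to CPU or disk show up with a string device
offloaded = {name: device for name, device in model.hf_device_map.items() if device in ("cpu", "disk")}
print(model.hf_device_map)
print("Modules on CPU/disk:", offloaded)
```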
### `To use the type as a Parameter, please correct the detach() semantics defined by __torch_dispatch__() implementation.`

Use the latest version of `accelerate` with a command such as `pip install -U accelerate`, and the problem should be solved.
### `Parameter has no attribue .CB`

Same solution as above.
### `RuntimeError: CUDA error: an illegal memory access was encountered ... consider passing CUDA_LAUNCH_BLOCKING=1`

Run your script by prepending `CUDA_LAUNCH_BLOCKING=1`, and you should observe an error as described in the next section.
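If modifying the launch command is inconvenient, the same flag can be set from Python at the very top of the script, before CUDA is initialized (a small, equivalent sketch):

```py
import os

# must be set before the first CUDA call, hence before importing torch
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # noqa: E402
```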
### `CUDA illegal memory error: an illegal memory access at line...`

Check the CUDA versions with:

```bash
nvcc --version
```

and confirm it is the same version as the one detected by `bitsandbytes`. If not, run:

```bash
ls -l $CONDA_PREFIX/lib/libcudart.so
```

or

```bash
ls -l $LD_LIBRARY_PATH
```

and check whether `libcudart.so` has a correct symlink set. Sometimes `nvcc` detects the correct CUDA version but `bitsandbytes` does not; in that case, make sure the symlink for `libcudart.so` points to the correct CUDA runtime library.
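
It can also help to compare the CUDA version that torch itself was built against with the `nvcc` output (a quick check):

```py
import torch

print(torch.version.cuda)         # CUDA version torch was compiled against
print(torch.cuda.is_available())  # False often points to a broken CUDA setup
```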

Here is an example of a badly configured CUDA installation:

`nvcc --version` gives:

![Screenshot 2022-08-15 at 15.12.23.png](https://s3.amazonaws.com/moonup/production/uploads/1660569220888-62441d1d9fdefb55a0b7d12c.png)

which means that the detected CUDA version is 11.3, but `bitsandbytes` outputs:

![image.png](https://s3.amazonaws.com/moonup/production/uploads/1660569284243-62441d1d9fdefb55a0b7d12c.png)
First check:

```bash
echo $LD_LIBRARY_PATH
```

If this contains multiple paths separated by `:`, then you have to make sure that the correct CUDA version is set, by running

```bash
ls -l $path/libcudart.so
```

on each path (`$path`) in the list.
If not, simply run

```bash
ls -l $LD_LIBRARY_PATH/libcudart.so
ls -l $CONDA_PREFIX/lib/libcudart.so
```

and you will see something like:

![Screenshot 2022-08-15 at 15.12.33.png](https://s3.amazonaws.com/moonup/production/uploads/1660569176504-62441d1d9fdefb55a0b7d12c.png)

If you see that the file is linked to the wrong CUDA version (here 10.2), find the correct location of `libcudart.so` (e.g. with `find / -name libcudart.so`) and point the `LD_LIBRARY_PATH` environment variable at the directory containing the correct `libcudart.so` file.

### Recurrent bugs

Sometimes you have to run a "dummy" inference pass when dealing with a multi-GPU setup. Check out the `test_multi_gpu_loading` and the `test_pipeline` functions.
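
Such a "dummy" pass can be as simple as the sketch below (assuming the `model` and `tokenizer` from the sanity-check script above):

```py
import torch

# a tiny generation warms up every GPU shard before the real test batches run
dummy_inputs = tokenizer("warmup", return_tensors="pt").to(0)
with torch.no_grad():
    _ = model.generate(**dummy_inputs, max_new_tokens=1)
```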