
Add Gemma Support #393

Merged: 5 commits into casper-hansen:main on Mar 11, 2024
Conversation

TechxGenus
Contributor

Add the latest Google Gemma model.

@casper-hansen
Owner

casper-hansen commented Mar 10, 2024

Hi @TechxGenus, great to see Gemma support. I tested your code and the quantization seems to work, although I have some issues measuring perplexity on the Gemma model series in general.

I am getting some odd sizes for the model once saved (6GB shard + 600MB shard):

-rw-rw-rw-  1 root root 6558499704 Mar 10 16:18 model-00001-of-00002.safetensors
-rw-rw-rw-  1 root root  614576896 Mar 10 16:18 model-00002-of-00002.safetensors

However, when I tested the fused modules, I got the following error:

Traceback (most recent call last):
  File "/workspace/AutoAWQ/examples/generate.py", line 29, in <module>
    generation_output = model.generate(
  File "/workspace/AutoAWQ/awq/models/base.py", line 111, in generate
    return self.model.generate(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1544, in generate
    return self.greedy_search(
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2404, in greedy_search
    outputs = self(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/gemma/modeling_gemma.py", line 1073, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/AutoAWQ/awq/modules/fused/model.py", line 119, in forward
    h, _, past_key_value = layer(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/AutoAWQ/awq/modules/fused/block.py", line 113, in forward
    attn_output, _, past_key_value = self.attn.forward(
  File "/workspace/AutoAWQ/awq/modules/fused/attn.py", line 198, in forward
    xqkv = xqkv.view((bsz, seqlen) + self.attention_shapes["xqkv_view"])
RuntimeError: shape '[1, 47, 48, 192]' is invalid for input of size 577536
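
For reference, the numbers in this error line up with Gemma-7B's published attention config (hidden_size=3072, num_heads=16, head_dim=256; these values come from the public config, not from this PR). A minimal sketch of the arithmetic, which suggests the fused view derived head_dim as hidden_size // num_heads instead of reading the configured value:

```python
# Sketch only: reproduce the size mismatch from the error message using
# Gemma-7B's public attention config. Not code from this PR.
bsz, seqlen = 1, 47                        # from the failing call
hidden_size, num_heads, head_dim = 3072, 16, 256

actual = bsz * seqlen * 3 * num_heads * head_dim        # fused q, k, v
print(actual)                                           # 577536, the input size

derived_head_dim = hidden_size // num_heads             # 192, not 256
print(bsz * seqlen * 3 * num_heads * derived_head_dim)  # 433152 = 1 * 47 * 48 * 192
```

Note that for gemma-2b the naive derivation happens to coincide with the configured value (2048 // 8 = 256), which would be consistent with the 2B model fusing correctly while the 7B model fails.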

@TechxGenus
Contributor Author

Yes, the quantized model file size is odd. This may be related to Google's design, as Gemma has a very large embedding layer.
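
For scale, a back-of-the-envelope estimate of that embedding layer, assuming Gemma's public configs (vocab_size 256000; hidden_size 2048 for 2B and 3072 for 7B) and fp16 embeddings, which AWQ leaves unquantized:

```python
# Rough fp16 size of Gemma's embedding table (AWQ quantizes the linear
# layers, not the embeddings). Config values are from the public Gemma configs.
vocab_size = 256_000
for name, hidden_size in [("gemma-2b", 2048), ("gemma-7b", 3072)]:
    size_gb = vocab_size * hidden_size * 2 / 1e9  # 2 bytes per fp16 param
    print(f"{name}: ~{size_gb:.2f} GB")  # ~1.05 GB and ~1.57 GB
```

An embedding of that size would account for much of the extra checkpoint weight noted above.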
I cannot seem to reproduce this error. I used the quantized gemma-2b-it model to run examples/generate.py and got the following results:

Replacing layers...: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 18/18 [00:01<00:00, 14.85it/s]
Fusing layers...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 18/18 [00:00<00:00, 106.18it/s]


The answer is in the center of the Earth.

The statement is a trick question, as the person is standing on the surface of the Earth and walking one mile south, one mile west and one mile north will not change their position.

It looks correct.

@TechxGenus
Contributor Author

I reproduced this error when running gemma-7b-it-AWQ, though gemma-2b-AWQ works well.

Additionally, I discovered that the latest transformers release seems to have changed the implementation of model.generate, so the existing fused layers needed modification to keep working. I tested TheBloke/Llama-2-7B-AWQ, and after the modification its output was consistent with the non-fused path (more testing needed; a sketch of the check is below).
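
A minimal sketch of that consistency check, assuming AutoAWQ's fuse_layers flag on from_quantized, a CUDA device, and greedy decoding so the two paths are directly comparable:

```python
# Sketch: with greedy decoding, fused and non-fused generation should match
# token-for-token. Assumes AutoAWQ's `fuse_layers` flag and a CUDA device.
import torch
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "TheBloke/Llama-2-7B-AWQ"
tokenizer = AutoTokenizer.from_pretrained(model_path)
input_ids = tokenizer("The capital of France is", return_tensors="pt").input_ids.cuda()

outputs = {}
for fused in (False, True):
    model = AutoAWQForCausalLM.from_quantized(model_path, fuse_layers=fused)
    out = model.generate(input_ids, max_new_tokens=32, do_sample=False)
    outputs[fused] = tokenizer.decode(out[0], skip_special_tokens=True)
    del model
    torch.cuda.empty_cache()  # free VRAM before loading the other variant

assert outputs[False] == outputs[True], "fused path diverged from non-fused"
```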

@TechxGenus
Contributor Author

I fixed the error; generation should work normally now.

@casper-hansen
Owner

Excellent work @TechxGenus. Thanks for your contribution.

@casper-hansen casper-hansen merged commit 94e73f0 into casper-hansen:main Mar 11, 2024