Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Model] Add PaliGemma #5189

Merged
merged 41 commits into from
Jul 7, 2024
Merged

[Model] Add PaliGemma #5189

merged 41 commits into from
Jul 7, 2024

Conversation

ywang96
Copy link
Member

@ywang96 ywang96 commented Jun 2, 2024

Add support for PaliGemma - Google's Cutting-Edge Open Vision Language Model.

FIX #4833

@ywang96
Copy link
Member Author

ywang96 commented Jun 2, 2024

Not ready for review yet but FYI @zhuohan123 @simon-mo since you asked me about this.

also cc @DarkLight1337 since I might need you to review this eventually

@ywang96 ywang96 marked this pull request as ready for review June 9, 2024 03:10
@ywang96
Copy link
Member Author

ywang96 commented Jun 10, 2024

Seeing some correctness issue with the implementation of Gemma itself - going to mark this PR back to draft for now.

@ywang96 ywang96 marked this pull request as draft June 10, 2024 07:04
input_id for input_id in input_ids
if input_id != image_token_id and input_id != tokenizer.bos_token_id
]
if hf_input_ids[-1] == 108:
Copy link
Member

@DarkLight1337 DarkLight1337 Jun 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe indicate what this 108 means.

@DarkLight1337
Copy link
Member

Design-wise I think the code is fine. However, there appear to be major discrepancies between the expected and actual output beyond what can be observed for the base Gemma model. For example, the caption es example is not working at all (the model returns English output instead of Spanish).

@WoosukKwon
Copy link
Collaborator

@ywang96 Did you have a chance to test the model again after #5913?

@ywang96
Copy link
Member Author

ywang96 commented Jul 5, 2024

@ywang96 Did you have a chance to test the model again after #5913?

@WoosukKwon I haven't, but I also need to update this PR because of the refactoring we just finished to make sure it works.

@ywang96
Copy link
Member Author

ywang96 commented Jul 6, 2024

This PR passed all tests locally and is ready for review. Please take a look @WoosukKwon!

@ywang96 ywang96 marked this pull request as ready for review July 6, 2024 03:36
Copy link
Member

@DarkLight1337 DarkLight1337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM but I'll check the warnings outputted in the VLM tests to see how consistent it is with the HF model.

vllm/model_executor/models/paligemma.py Outdated Show resolved Hide resolved
vllm/model_executor/models/paligemma.py Outdated Show resolved Hide resolved
@DarkLight1337
Copy link
Member

Seems that the vLLM output string is not following HF format:

https://buildkite.com/vllm/ci-aws/builds/4150#0190861d-cafe-41db-81ba-cb4c0d8fc2a6

Can you fix that?

@ywang96
Copy link
Member Author

ywang96 commented Jul 6, 2024

Seems that the vLLM output string is not following HF format:

https://buildkite.com/vllm/ci-aws/builds/4150#0190861d-cafe-41db-81ba-cb4c0d8fc2a6

Can you fix that?

Ah it's the eos_token. It's also weird that some of these tests actually don't give an output at all - I guess I will need to take a deeper look at it and see what went wrong.

Copy link
Collaborator

@WoosukKwon WoosukKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ywang96 Thanks for the PR! The PR looks good overall, but I think we can reduce the redundancy by reusing the code for Llava.

docs/source/models/supported_models.rst Outdated Show resolved Hide resolved
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this example? I'm wondering because it seems pretty similar to the llava example.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests and examples were made for each model previously because the image-related engine args were different from each model. This is indeed no longer needed after the refactoring we did, and I will open another PR for consolidate them if you're okay with that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation! Sounds good. Please open the PR!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this test is also a bit redundant with test_llava.py. Can we refactor test_llava.py to cover both models?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment above.

Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Copy link
Collaborator

@WoosukKwon WoosukKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ywang96 LGTM. Please feel free to open the PR for refactoring.

Also, please merge the PR after @DarkLight1337 also approves as he might be more knowledgeable about the models than me.

@DarkLight1337
Copy link
Member

Although the vLLM output for Spanish translation has some numerical difference and diverges from HF output after the first 2 tokens, I think the overall meaning is similar enough so it is fine. Let's merge this.

@DarkLight1337 DarkLight1337 merged commit 6206dcb into vllm-project:main Jul 7, 2024
70 checks passed
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 24, 2024
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[New Model]: Google's Paligemma family of models
3 participants