[Model] Expose Phi3v num_crops as a mm_processor_kwarg #8658

alex-jw-brooks · 2024-09-20T07:53:20Z

FIX #7861. This PR should be merged after #8657. It exposes num_crops as an mm_processor_kwarg for phi3v models (see last commit) and adds tests to ensure it is handled properly in each of the following:

The max token count
The dummy data
The input processor
The default input mapper (which wraps the default HF image processor)
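
To make the first item concrete, here is a rough sketch of how the upper bound on image tokens could grow with num_crops. The formula is an assumption recalled from the Phi-3-vision Hugging Face processing code (336-pixel crops, 144 tokens per crop plus one global view), and both function names are hypothetical helpers rather than vLLM APIs, so check the model's remote code before relying on the numbers.

```python
def estimate_image_tokens(crops_high: int, crops_wide: int) -> int:
    """Estimate image tokens for a crops_high x crops_wide grid of crops.

    Assumed formula (not taken from this PR): 144 tokens for each crop
    and for one global view, plus per-row extra tokens.
    """
    if crops_high < 1 or crops_wide < 1:
        raise ValueError("grid dimensions must be positive")
    return (crops_high * crops_wide + 1) * 144 + 1 + (crops_high + 1) * 12


def max_image_tokens(num_crops: int) -> int:
    """Largest estimate over every grid that uses at most num_crops crops."""
    return max(
        estimate_image_tokens(high, wide)
        for high in range(1, num_crops + 1)
        for wide in range(1, num_crops // high + 1)
    )


if __name__ == "__main__":
    for crops in (4, 16):
        print(crops, max_image_tokens(crops))
```

Under this assumption, a smaller num_crops lowers the worst-case number of image tokens, which is why the override has to reach the max token count and the dummy data as well as the processor itself.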

Examples

Offline batch inference

from vllm import LLM, SamplingParams
from vllm.assets.image import ImageAsset

question = "What is the content of this image?"
prompt = f"<|user|>\n<|image_1|>\n{question}<|end|>\n<|assistant|>\n"
image = ImageAsset("cherry_blossom").pil_image.convert("RGB")

llm = LLM(
    model="microsoft/Phi-3-vision-128k-instruct",
    trust_remote_code=True,
    max_num_seqs=5,
    mm_processor_kwargs={"num_crops": 4}
)

sampling_params = SamplingParams(temperature=0.2, max_tokens=64)

outputs = llm.generate(
    {
        "prompt": prompt,
        "multi_modal_data": {"image": image}
    }, 
    sampling_params=sampling_params
)

for o in outputs:
    generated_text = o.outputs[0].text
    print(generated_text)

Sample response: The image captures a serene scene of a tall, white tower with a golden dome, standing majestically against a clear blue sky. The tower is partially obscured by a tree adorned with pink cherry blossoms, adding a touch of nature's beauty to the urban landscape.
The phi3v offline inference example has also been updated to pass num_crops, in case users look there.

Through the server

python vllm/entrypoints/openai/api_server.py \
    --device cuda \
    --model microsoft/Phi-3-vision-128k-instruct \
    --tokenizer microsoft/Phi-3-vision-128k-instruct \
    --trust-remote-code \
    --api-key token-abc123 \
    --max_model_len 32000 \
    --disable-frontend-multiprocessing \
    --mm_processor_kwargs '{"num_crops": 4}' &
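
The flag value above is a JSON object passed as a single shell argument. If the command is assembled from a script, something like the following sketch could build it; `mm_processor_kwargs_flag` is a hypothetical helper for illustration, not part of vLLM.

```python
import json
import shlex


def mm_processor_kwargs_flag(**overrides: object) -> str:
    """Return a shell-safe `--mm_processor_kwargs <json>` fragment."""
    # Serialize the overrides as JSON, then quote them so the shell
    # passes the whole object as one argument.
    payload = json.dumps(overrides, sort_keys=True)
    return f"--mm_processor_kwargs {shlex.quote(payload)}"


if __name__ == "__main__":
    print(mm_processor_kwargs_flag(num_crops=4))
```

Quoting with `shlex.quote` avoids hand-escaping the braces and double quotes inside the JSON.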

Client:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="token-abc123")

completion = client.chat.completions.create(
  model="microsoft/Phi-3-vision-128k-instruct",
  messages=[
    {
        "role": "user", "content": [
          {"type": "image_url", "image_url": {"url": "https://github.com/haotian-liu/LLaVA/blob/1a91fc274d7c35a9b50b3cb29c4247ae5837ce39/images/llava_v1_5_radar.jpg?raw=true"}},
          {"type": "text", "text": "Describe this image. "},
        ]
    }
  ]
)

print(completion.choices[0].message)

ChatCompletionMessage(content=" The radar chart displays performance metrics for four different models across various evaluation datasets. Each axis represents a dataset, with metrics such as 'BLEU-4', 'MME', 'PPE', 'SMOG', 'TED-LIUM', 'VLEN', 'Vocabulary', and 'Word Error Rate' ranging from 0 to 100. The chart includes the following models: 'BLIP-2', 'InstructBLIP', 'Qwen-VL-Chat', and 'LLaVA-1.5'. Each model has a distinct line and color on the chart, with their performance in each dataset marked along the corresponding axis. Data points are annotated with their values. The chart is titled 'Performance Metrics of Translation Models on NLP Datasets' and includes a legend for model identification.", refusal=None, role='assistant', function_call=None, tool_calls=[])

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

DarkLight1337 · 2024-09-20T09:53:50Z

Marking this as draft to make it clear that #8657 should be merged first.

DarkLight1337 · 2024-09-23T12:35:46Z

examples/offline_inference_vision_language.py

@@ -87,6 +87,7 @@ def run_phi3v(question, modality):
        model="microsoft/Phi-3-vision-128k-instruct",
        trust_remote_code=True,
        max_num_seqs=5,
+        processor_kwargs={"num_crops": 16},


Can you link to the HF repo explaining how to use num_crops?

Please update the multi-image input example as well.

DarkLight1337 · 2024-09-23T12:39:33Z

Otherwise the PR looks good to me (the tests pass locally). You can mark the PR as ready once you have addressed the above comment.

alex-jw-brooks · 2024-09-23T23:46:58Z

Awesome, sounds good, thanks @DarkLight1337! Updated both examples with a short description about num_crops.

The values are based on the docs for Phi-3.5-vision-instruct. The README for Phi-3-vision-128k-instruct (which we use in the single-image example) doesn't state a value explicitly, but its config uses the value that Phi-3.5-vision-instruct recommends for single-frame input, so using the same recommended values seems reasonable to me.
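
As a rough sketch of that choice: the helpers below are hypothetical (they are not in the examples or in vLLM), and the values assume the Phi-3.5-vision-instruct recommendation of 16 crops for a single image and 4 for multiple images or frames.

```python
def pick_num_crops(num_images: int) -> int:
    """Hypothetical helper: choose a num_crops override from the image count."""
    if num_images < 1:
        raise ValueError("expected at least one image")
    # Assumed recommendation: 16 for single-frame input, 4 for multi-frame.
    return 16 if num_images == 1 else 4


def build_mm_processor_kwargs(num_images: int) -> dict:
    """Build the dict that would be passed as mm_processor_kwargs."""
    return {"num_crops": pick_num_crops(num_images)}


if __name__ == "__main__":
    print(build_mm_processor_kwargs(1))
    print(build_mm_processor_kwargs(3))
```

The resulting dict is what would be passed as `mm_processor_kwargs` when constructing the `LLM`.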

alex-jw-brooks added 21 commits September 20, 2024 03:17

Allow for processor kwarg overrides

550378b

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

Pass processor through to partial

190606f

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

Add default & processor kwarg override tests

b1ca041

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

Don't allow ctx or inputs as kwargs

195e31c

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

Add kwarg override for processor to dummy data factories

1472d04

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

Add kwarg override forr processor to max token calc

f10601f

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

Move kwarg only override func to utils

429097a

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

Force processor kwargs to be keyword-only

159cfc2

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

Pass unfiltered processor kwargs to default mapper

af91930

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

Add hack for mapper preprocessor kwargs

9adad10

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

Simplify dummy data processor kwarg & add tests

9f7aed8

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

Add tests for max multimodal token kwarg overrides

ff59e44

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

Format registry

6b26454

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

Fix default mapper comparison

0e2d53d

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

Move kwarg filtering into hf processor getter

5a3341b

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

Enable processor_kwargs in video processor

3e1fe54

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

Add tests for mapper processor_kwargs

feccfd7

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

Update mapper not on multimodal processor kwargs

3ada64d

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

processor kwarg test cleanup

58dcc63

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

Move context builder to test utils

1cee215

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

Use common context builder in processor kwarg tests

d5f9efa

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

This was referenced Sep 20, 2024

[Core][Frontend] Support Passing Multimodal Processor Kwargs #8657

Merged

[Frontend][Core] passing hf_config args through openai server #5836

Open

alex-jw-brooks force-pushed the phi3v_num_crops branch from 7f29a24 to adde776 Compare September 20, 2024 09:21

DarkLight1337 self-assigned this Sep 20, 2024

DarkLight1337 marked this pull request as draft September 20, 2024 09:53

alex-jw-brooks and others added 2 commits September 22, 2024 00:05

Update vllm/entrypoints/llm.py

b5d434b

Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

Update vllm/inputs/registry.py

a096301

Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

alex-jw-brooks and others added 5 commits September 22, 2024 00:06

Update vllm/inputs/registry.py

79962e0

Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

Update vllm/inputs/registry.py

2cb1f72

Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

Update vllm/inputs/registry.py

37eb532

Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

Update vllm/inputs/registry.py

a4c7c3d

Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

Fix formatting

36dd2cb

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

alex-jw-brooks force-pushed the phi3v_num_crops branch from adde776 to e3d1014 Compare September 22, 2024 06:15

alex-jw-brooks added 2 commits September 22, 2024 04:29

Rename processor kwargs to mm processor kwargs

f95c86f

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

alex-jw-brooks force-pushed the phi3v_num_crops branch from e3d1014 to 632dac1 Compare September 22, 2024 08:44

alex-jw-brooks changed the title ~~[Model] Expose Phi3v num_crops as a processor_kwarg~~ [Model] Expose Phi3v num_crops as a mm_processor_kwarg Sep 22, 2024

DarkLight1337 added 2 commits September 23, 2024 12:29

Merge branch 'main' into phi3v_num_crops

9eca61a

Merge branch 'main' into phi3v_num_crops

a3ab6cb

DarkLight1337 reviewed Sep 23, 2024


alex-jw-brooks marked this pull request as ready for review September 23, 2024 23:40

Update phi3v examples with num crops overrides

4a9ccae

alex-jw-brooks force-pushed the phi3v_num_crops branch from 049dbe6 to 4a9ccae Compare September 23, 2024 23:55

DarkLight1337 approved these changes Sep 24, 2024


DarkLight1337 enabled auto-merge (squash) September 24, 2024 02:36

github-actions bot added the ready label Sep 24, 2024

DarkLight1337 merged commit 8ff7ced into vllm-project:main Sep 24, 2024
66 checks passed

Isotr0py mentioned this pull request Sep 27, 2024

[Model] Initialize support for InternVL2 series models #6514

Merged

3 tasks
