[Core] Fix tracking of model forward time to the span traces in case of PP>1 #7440

sfc-gh-mkeralapura · 2024-08-12T20:42:07Z

This is a quick follow-up to #7089. That PR left model forward time tracking unsupported for PP>1; this PR fixes that.

github-actions · 2024-08-12T20:42:21Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which consists of a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build in the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).

To run full CI, you can do one of these:

Comment /ready on the PR
Add ready label to the PR
Enable auto-merge.

🚀

comaniac · 2024-08-12T21:05:39Z

cc @rkooo567

vllm/worker/model_runner.py

comaniac

LGTM. Leave to @rkooo567

rkooo567

QQ: is it possible to add unit tests for this?

rkooo567 · 2024-08-13T07:00:22Z

vllm/worker/model_runner.py

+                orig_model_forward_time = 0.0
+                if intermediate_tensors is not None:
+                    orig_model_forward_time = intermediate_tensors.tensors.get(
+                        "model_forward_time", torch.tensor(0.0)).item()


Why do we store this in a tensor? Is there any way to just use CPU data?

sfc-gh-mkeralapura · 2024-08-13

As far as I can tell, the only thing passed from the pipeline workers is the IntermediateTensors in serialized form, so I added it there. Is there a wrapper object of some form that holds these?

rkooo567 · 2024-08-14T16:20:47Z

Can you try a regular Python object here to see if it works?

sfc-gh-mkeralapura · 2024-08-14

Done. It looks like the worker serializes a Dict[str, Any], so it can serialize floats too.
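To illustrate the point about plain floats, here is a minimal sketch rather than vLLM code. It assumes the inter-stage payload behaves like a Dict[str, Any]; `pickle` only stands in for whatever serialization the workers actually use, and `read_forward_time` and the sample payload are hypothetical.

```python
import pickle
from typing import Any, Dict


def read_forward_time(payload: Dict[str, Any]) -> float:
    """Return the forward time carried in the payload, or 0.0 if absent."""
    return float(payload.get("model_forward_time", 0.0))


# Hypothetical payload that one pipeline stage hands to the next.
payload = {"hidden_states": [0.1, 0.2], "model_forward_time": 12.5}

# Round-trip through pickle as a stand-in for cross-worker serialization.
received = pickle.loads(pickle.dumps(payload))

# A plain float survives the round trip and needs no tensor wrapper.
prev_time = read_forward_time(received)
```

With a plain float the receiving stage can read the value directly, with no tensor wrapper and no `.item()` call.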

sfc-gh-mkeralapura · 2024-08-13T15:38:03Z

> QQ: is it possible to add unit tests for this?

Let me look into that. There are two parts: first, the reporting of these metrics in general, and second, the PP>1 case. I'll see how doable each is and circle back later in the day.

sfc-gh-mkeralapura · 2024-08-13T23:42:58Z

> QQ: is it possible to add unit tests for this?

I could not figure out how to write a unit test for this part of the worker. Instead, I added a test to the overall tracing test that checks this detailed trace data. It does not cover the PP>1 case, though.

Please take a look.

rkooo567 · 2024-08-14T16:21:10Z

tests/tracing/test_tracing.py

+    assert metrics.model_execute_time is None
+
+
+def test_traces_with_detailed_steps(trace_service):


Add the same test with pp=2?

sfc-gh-mkeralapura · 2024-08-14

It was a bit more involved, but done.
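As a rough illustration of what a pp=1 versus pp=2 check could assert, the sketch below simulates a pipeline instead of running vLLM. It is not the test added in this PR. `run_stage` and `total_forward_time` are hypothetical helpers, and the per-stage times are made-up values. The assumption is that each stage adds its own forward time to whatever the previous stage carried over, so the last stage reports the sum.

```python
from typing import Any, Dict, List, Optional


def run_stage(stage_time_ms: float,
              incoming: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Simulate one pipeline stage: add its time to the time carried in."""
    carried = 0.0
    if incoming is not None:
        carried = incoming.get("model_forward_time", 0.0)
    return {"model_forward_time": carried + stage_time_ms}


def total_forward_time(stage_times_ms: List[float]) -> float:
    """Chain the stages in order and return what the last stage reports."""
    state: Optional[Dict[str, Any]] = None
    for stage_time in stage_times_ms:
        state = run_stage(stage_time, state)
    return 0.0 if state is None else state["model_forward_time"]


# pp=1: a single stage reports only its own time.
single_stage_total = total_forward_time([5.0])

# pp=2: the last stage reports the sum over both stages.
two_stage_total = total_forward_time([5.0, 7.0])
```

A real test would have to read the value from the exported span attributes. This sketch only captures the arithmetic the reported value is expected to satisfy.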

sfc-gh-mkeralapura · 2024-08-15T20:27:46Z

@rkooo567 Are you comfortable with this PR on the whole?

sfc-gh-mkeralapura · 2024-08-15T21:06:35Z

/ready

Fix tracking of model forward time in PP>1

d43fe90

sfc-gh-mkeralapura force-pushed the main branch from 91f253a to d43fe90 August 13, 2024 00:02

comaniac reviewed Aug 13, 2024

sfc-gh-mkeralapura requested a review from comaniac August 13, 2024 04:20

comaniac approved these changes Aug 13, 2024

rkooo567 reviewed Aug 13, 2024

Add a test for detailed traces

ac3db8d

sfc-gh-mkeralapura force-pushed the main branch from 840e0b1 to ac3db8d August 13, 2024 23:41

sfc-gh-mkeralapura requested a review from rkooo567 August 13, 2024 23:43

rkooo567 reviewed Aug 14, 2024

sfc-gh-mkeralapura requested a review from rkooo567 August 14, 2024 20:35

rkooo567 approved these changes Aug 15, 2024

rkooo567 enabled auto-merge (squash) August 15, 2024 20:43

github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Aug 15, 2024

Update the number of gpus used for the tracing test on buildkite

125cd07

auto-merge was automatically disabled August 15, 2024 21:58
Head branch was pushed to by a user without write access

sfc-gh-mkeralapura force-pushed the main branch from bec7bc6 to 125cd07 August 15, 2024 21:58

Merge branch 'vllm-project:main' into main

dbcc9e0

sfc-gh-hazhang approved these changes Aug 16, 2024

zhisbug enabled auto-merge (squash) August 16, 2024 18:06

zhisbug disabled auto-merge August 16, 2024 20:35

youkaichao merged commit 93478b6 into vllm-project:main Aug 16, 2024
67 of 70 checks passed

kylesayrs pushed a commit to neuralmagic/vllm that referenced this pull request Aug 17, 2024

[Core] Fix tracking of model forward time in case of PP>1 (vllm-project#7440)

9a707fd

zifeitong pushed a commit to zifeitong/vllm that referenced this pull request Aug 20, 2024

[Core] Fix tracking of model forward time in case of PP>1 (vllm-project#7440)

395f984

fialhocoelho pushed a commit to opendatahub-io/vllm that referenced this pull request Aug 22, 2024

[Core] Fix tracking of model forward time in case of PP>1 (vllm-project#7440)

63c426d

omrishiv pushed a commit to omrishiv/vllm that referenced this pull request Aug 26, 2024

[Core] Fix tracking of model forward time in case of PP>1 (vllm-project#7440)

bfe9d14
