
[Core][1/N] Support PP PyNCCL Groups #4988

Merged: 4 commits into vllm-project:main on May 23, 2024

Conversation

andoorve
Contributor

This PR adds support for multiple PP groups. Will help #4412 support CUDAGraph.

Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
@andoorve
Contributor Author

@youkaichao @simon-mo Can you take a quick look at this PR?

@@ -151,6 +152,68 @@ def test_pynccl_with_cudagraph():
distributed_run(worker_fn_with_cudagraph, 2)


@worker_fn_wrapper
def pp_worker_fn():
Member

name it as send_recv worker? it is not testing pp though.

Member

you can also change that into allreduce worker.

Contributor Author

Done
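
For context, a sketch of what the renamed worker could look like, following the test structure visible in the diff. The names worker_fn_wrapper, distributed_run, and PyNcclCommunicator come from the surrounding test file; the constructor call, tensor shapes, and default send/recv targets are assumptions, not the exact merged code:

import torch

@worker_fn_wrapper
def send_recv_worker_fn():
    # Hypothetical sketch: rank 0 sends a tensor of ones to rank 1,
    # which receives it and checks the contents.
    pynccl_comm = PyNcclCommunicator()  # assumed test-fixture constructor
    tensor = torch.ones(16, 1024, 1024, dtype=torch.float32).cuda()
    if pynccl_comm.rank == 0:
        pynccl_comm.send(tensor)  # assumed to default to the next rank
    else:
        tensor.zero_()
        pynccl_comm.recv(tensor)  # assumed to default to the previous rank
    assert tensor.mean().cpu().item() == 1


def test_pynccl_send_recv():
    distributed_run(send_recv_worker_fn, 2)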

Comment on lines 137 to 138
# Hardcoded to send to the next rank for now for simplicity.
dst = (self.rank + 1) % self.world_size
Member

why hardcode here?

Contributor Author

It's hardcoded because we only need to send to the next rank and receive from the previous rank for now, which keeps it simple. We can change this if necessary though.

Member

please don't hardcode here. just expose the general form of send/recv. You can add a default arg.

Contributor Author

Changed it to expose the general form with a default arg of None, which falls back to the next rank for send and the previous rank for recv.

Contributor Author

Let me know if that's sufficient
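
A sketch of that generalized form with the None fallback. The class is a stand-in for vLLM's PyNcclCommunicator; the rank arithmetic follows the diff snippets above, while the method shapes are illustrative and the NCCL calls themselves are elided:

from typing import Optional

import torch


class PyNcclCommunicatorSketch:
    """Illustration only; stands in for vLLM's PyNcclCommunicator."""

    def __init__(self, rank: int, world_size: int):
        self.rank = rank
        self.world_size = world_size

    def send(self, tensor: torch.Tensor, dst: Optional[int] = None) -> None:
        if dst is None:
            # Default preserves the old behavior: send to the next rank.
            dst = (self.rank + 1) % self.world_size
        # ... ncclSend on the group's communicator, elided ...

    def recv(self, tensor: torch.Tensor, src: Optional[int] = None) -> None:
        if src is None:
            # Default preserves the old behavior: receive from the previous rank.
            src = (self.rank - 1) % self.world_size
        # ... ncclRecv on the group's communicator, elided ...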

Comment on lines 151 to 152
# Hardcoded to receive from the previous rank for now for simplicity.
src = (self.rank - 1) % self.world_size
Member

same as above

Comment on lines 219 to 223
_PP_PYNCCL_COMMUNICATOR = PyNcclCommunicator(
group=_PP_CPU_GROUP,
device=_LOCAL_RANK,
)

Member

creating communicators will add memory cost. please add a check here: if pp size == 1, skip the construction of PyNcclCommunicator.

Contributor Author

Changed both PP and TP.
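
A sketch of the resulting guard. The group and rank names mirror the construction shown above; pipeline_model_parallel_size is an assumed name for the PP group size, and the exact condition in the merged code may differ:

# Skip building the PyNCCL communicator when the PP group has a single
# rank: a size-1 group never communicates, so this avoids the extra
# NCCL memory cost the reviewer is pointing at.
if pipeline_model_parallel_size > 1:
    _PP_PYNCCL_COMMUNICATOR = PyNcclCommunicator(
        group=_PP_CPU_GROUP,
        device=_LOCAL_RANK,
    )
else:
    _PP_PYNCCL_COMMUNICATOR = None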

Contributor

@youkaichao Do you know how much extra memory the communicators would add? Is it a fixed cost, or does it scale with group size?

We are running into OOMs with CUDAGraph enabled that shouldn't be happening. Any pointers would be greatly appreciated!

Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
@youkaichao left a comment (Member)

looks good to me overall. thanks! let's see if it passes the tests.

@andoorve
Contributor Author

Hey @youkaichao, looks like it passed. Can this be merged?

@youkaichao youkaichao merged commit 5eda2ea into vllm-project:main May 23, 2024
63 checks passed
@andoorve andoorve deleted the pp-pynccl branch May 23, 2024 17:08
dtrifiro pushed a commit to opendatahub-io/vllm that referenced this pull request May 31, 2024
robertgshaw2-neuralmagic pushed a commit to neuralmagic/nm-vllm that referenced this pull request Jun 8, 2024
joerunde pushed a commit to joerunde/vllm that referenced this pull request Jun 17, 2024
robertgshaw2-neuralmagic pushed a commit to neuralmagic/nm-vllm that referenced this pull request Jul 14, 2024
Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024