Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relax NumPy requirement in UCX #3731

Merged
merged 15 commits into from
Apr 21, 2020
Merged

Conversation

jakirkham
Copy link
Member

This relaxes the NumPy requirement in UCX communication. We do this by leveraging things like struct.pack and struct.unpack for metadata about the frames, which gives us simple bytes objects to send over the wire. Also we create a host_array function that will use NumPy when available, but will fallback to things like bytearray if not.

While it works to have this be a single `int` (as it will be coerced to
a `tuple`), go ahead and make it a `tuple` for clarity and to match more
closely to the Numba case.
This is equivalent to using NumPy's `uint8`, but has the added benefit
of not requiring NumPy be imported to work.
Matches the variable name in the `send` case to make things easier to
follow.
As `struct.pack` and `struct.unpack` are able to build `bytes` objects
containing the frame metadata needed by UCX easily, just use these
functions instead of creating NumPy arrays each time. Helps soften the
NumPy requirement a bit.
Matches more closely to the name used by RMM and Numba.
To relax the NumPy requirement completely, add a function to allocate
arrays on host. If NumPy is not present, this falls back to just
allocating `bytearray` objects, which work just as well.
@quasiben
Copy link
Member

Is there any expected performance improvement by using struct vs numpy ?

cc @madsbk

@jakirkham
Copy link
Member Author

Mostly hoping this simpler implementation avoids red herrings while debugging issues.

@jakirkham
Copy link
Member Author

In terms of performance this does improve things a little bit, but it probably doesn't impact the overall send/receive process meaningfully (though I could be wrong about that). Again this isn't really the motivation for it.

In [1]: import struct                                                           

In [2]: import numpy                                                            

In [3]: l = 4 * [True, False]                                                   

In [4]: %timeit struct.pack(len(l) * "?", *l)                                   
389 ns ± 4.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [5]: %timeit numpy.array(l)                                                  
838 ns ± 2.67 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Copy link
Contributor

@madsbk madsbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Avoids multiple calls to `len(frames)`, is a bit easier to read, and
matches the receive code path more closely.
To send fewer and larger messages, pack both which frames are on device
and how large each frame is into one message.
@quasiben
Copy link
Member

quasiben commented Apr 21, 2020 via email

@jakirkham
Copy link
Member Author

jakirkham commented Apr 21, 2020

Went ahead and grabbed the header consolidation commit from PR ( #3732 ) and pushed it here. That doesn't really change how the frames are serialized, but it does allow us to send all of the per frame metadata in one message.

@quasiben
Copy link
Member

test_ucx.py passed locally with one exception:

distributed/comm/tests/test_ucx.py::test_ucx_deserialize FAILED

though this is probably due to hostname issues.

While I don't think this changes the overall scheme described in the notes, lines like:

struct.pack(nframes * "?" + nframes * "Q", *cuda_frames, *sizes)

will be hard to read tomorrow. Would you mind adding a note or two describe what packing is doing ?

@jakirkham jakirkham force-pushed the relax_numpy_req_ucx branch 2 times, most recently from 19aabd7 to 4cc672e Compare April 21, 2020 20:23
@quasiben
Copy link
Member

Thank you @jakirkham for taking the time to add those comments

@jakirkham
Copy link
Member Author

Thanks Ben! 😄

Yeah I've never gotten that test to run successfully. 😞 Probably we should figure out how the config issue should be fixed and talk to OPS so it can be integrated into their provisioning scripts.

Was just about to ask if those comments seemed reasonable. Happy to adjust if needed 🙂

Copy link
Member

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Feel free to merge when you're ready.

@jakirkham jakirkham merged commit 6db09f3 into dask:master Apr 21, 2020
@jakirkham jakirkham deleted the relax_numpy_req_ucx branch April 21, 2020 23:35
@jakirkham
Copy link
Member Author

Thanks all! 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants