-
-
Notifications
You must be signed in to change notification settings - Fork 718
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add UCX Comm #2591
Add UCX Comm #2591
Conversation
* Stubs for classes
…distributed into ucx+data-handling
OK, for now I've turned off compression on all cuda data. That should stop us from splitting up large frames. It looks like currently there is a limit in ucx-py that keeps us under 2**31 bytes. At first this seems to be limited by the I've also added logic to not call ensure_bytes and |
This now works-ish (at least when combined with some of the work in rapidsai/dask-cuda#46). I would like to get this to a point where we could merge it somewhat quickly. I don't mind things being a little rough if they are well isolated into files that aren't in the mainline code path (files like ucx.py, cuda.py, cudf.py and so on). There are a few TODO's left in the actual logic that I suspect are left by @TomAugspurger . Is this something that you can look into this week Tom to see if they are still necessary? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just gave a quick look through. I'm not sure if I'll have time this week to actually verify the TODOs yet; need to check on where we are for the next pandas release first.
General question: do we want users to provide the prefix ucx://
or ucp://
?
# Workaround for hanging test in | ||
# pytest distributed/comm/tests/test_ucx.py::test_comm_objs -vs --count=2 | ||
# on the second time through. | ||
ucp._libs.ucp_py.reader_added = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Akshay-Venkatesh do you recall if this was resolved? Is it the same as https://github.com/Akshay-Venkatesh/ucx-py/issues/69, or different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TomAugspurger I tested this yesterday and the issue hasn't been resolved. This still has to be fixed.
Thanks for the feedback @TomAugspurger . I think I can handle everything, I mostly wanted to get your thoughts on some of the comments. Thanks! |
I've removed the WIP label. Review appreciated. I think that this is safe to go in. |
OK, I'm merging this tomorrow if there are no further comments. |
I won't have time soon to look this over again, but based on my last review
I also think this is safe to go in.
…On Thu, May 30, 2019 at 9:25 AM Matthew Rocklin ***@***.***> wrote:
I've removed the WIP label. Review appreciated. I think that this is safe
to go in.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#2591?email_source=notifications&email_token=AAKAOISDTFT25GT7376J7STPX7PVBA5CNFSM4HC4HRZ2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODWSO7DA#issuecomment-497348492>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAKAOIXTD4ITBJEBKWUJ5LLPX7PVBANCNFSM4HC4HRZQ>
.
|
🎉 |
Thank you @TomAugspurger, @quasiben, and @Akshay-Venkatesh for working on this. I'm sure that there is still plenty more to do here, but it will be nice to have this in master. |
* upstream/master: (58 commits) Add unknown pytest markers (dask#2764) Delay lookup of allowed failures. (dask#2761) Change address -> worker in ColumnDataSource for nbytes plot (dask#2755) Remove module state in Prometheus Handlers (dask#2760) Add stress test for UCX (dask#2759) Add nanny logs (dask#2744) Move some of the adaptive logic into the scheduler (dask#2735) Add SpecCluster.new_worker_spec method (dask#2751) Worker dashboard fixes (dask#2747) Add async context managers to scheduler/worker classes (dask#2745) Fix the resource key representation before sending graphs (dask#2716) (dask#2733) Allow user to configure whether workers are daemon. (dask#2739) Pin pytest >=4 with pip in appveyor and python 3.5 (dask#2737) Add Experimental UCX Comm (dask#2591) Close nannies gracefully (dask#2731) add kwargs to progressbars (dask#2638) Add back LocalCluster.__repr__. (dask#2732) Move bokeh module to dashboard (dask#2724) Close clusters at exit (dask#2730) Add SchedulerPlugin TaskState example (dask#2622) ...
* upstream/master: (43 commits) Add unknown pytest markers (dask#2764) Delay lookup of allowed failures. (dask#2761) Change address -> worker in ColumnDataSource for nbytes plot (dask#2755) Remove module state in Prometheus Handlers (dask#2760) Add stress test for UCX (dask#2759) Add nanny logs (dask#2744) Move some of the adaptive logic into the scheduler (dask#2735) Add SpecCluster.new_worker_spec method (dask#2751) Worker dashboard fixes (dask#2747) Add async context managers to scheduler/worker classes (dask#2745) Fix the resource key representation before sending graphs (dask#2716) (dask#2733) Allow user to configure whether workers are daemon. (dask#2739) Pin pytest >=4 with pip in appveyor and python 3.5 (dask#2737) Add Experimental UCX Comm (dask#2591) Close nannies gracefully (dask#2731) add kwargs to progressbars (dask#2638) Add back LocalCluster.__repr__. (dask#2732) Move bokeh module to dashboard (dask#2724) Close clusters at exit (dask#2730) Add SchedulerPlugin TaskState example (dask#2622) ...
header = bytes(header) | ||
if header: | ||
header = msgpack.loads(header, use_list=False, **msgpack_opts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you recall why this was needed? Was this due to the if header
line? Did msgpack.dumps
need this? Or was it due to something else like potentially unusual types being passed in for header
(like maybe a NumPy array)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I guess this is explained here ( 44c1d5c ).
This PR brings in much of the work done by @TomAugspurger with ucx/ucx-py with @mrocklin's help I added a small change to cudf protocol handling and general cleanup.