Cannot create Dask Client to a dask-scheduler with UCX protocol #459

Closed
randerzander opened this issue Mar 18, 2020 · 6 comments
@randerzander

randerzander commented Mar 18, 2020

With latest nightlies:

(rapids) rgelhausen@rl-dgx-d19-u08-rapids-dgx102:~rapids-queries/q18$ conda list | grep ucx
ucx                       1.7.0+g9d06c3a       cuda10.1_0    rapidsai-nightly
ucx-proc                  1.0.0                       gpu    rapidsai-nightly
ucx-py                    0.13.0a200318+g9d06c3a         py37_76    rapidsai-nightly

I'm no longer able to create a Dask Client object to a scheduler running with UCX protocol:

# url - ucx://10.150.162.155
Traceback (most recent call last):
  File "/home/rgelhausen/tests/test_cluster.py", line 10, in <module>
    client = Client(url+':8786')
  File "/home/rgelhausen/conda/envs/rapids/lib/python3.7/site-packages/distributed/client.py", line 721, in __init__
    self.start(timeout=timeout)
  File "/home/rgelhausen/conda/envs/rapids/lib/python3.7/site-packages/distributed/client.py", line 894, in start
    sync(self.loop, self._start, **kwargs)
  File "/home/rgelhausen/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 348, in sync
    raise exc.with_traceback(tb)
  File "/home/rgelhausen/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 332, in f
    result[0] = yield future
  File "/home/rgelhausen/conda/envs/rapids/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
  File "/home/rgelhausen/conda/envs/rapids/lib/python3.7/site-packages/distributed/client.py", line 989, in _start
    await self._ensure_connected(timeout=timeout)
  File "/home/rgelhausen/conda/envs/rapids/lib/python3.7/site-packages/distributed/client.py", line 1073, in _ensure_connected
    assert msg[0]["op"] == "stream-start"
AssertionError
distributed.scheduler - INFO - Receive client connection: Client-fb038d70-6922-11ea-a737-d8c497cf5f77
distributed.scheduler - INFO - Close client connection: Client-fb038d70-6922-11ea-a737-d8c497cf5f77
distributed.core - ERROR - tuple indices must be integers or slices, not str
Traceback (most recent call last):
  File "/home/rgelhausen/conda/envs/rapids/lib/python3.7/site-packages/distributed/core.py", line 412, in handle_comm
    result = await result
  File "/home/rgelhausen/conda/envs/rapids/lib/python3.7/site-packages/distributed/scheduler.py", line 2502, in add_client
    versions,
  File "/home/rgelhausen/conda/envs/rapids/lib/python3.7/site-packages/distributed/versions.py", line 125, in error_message
    node_packages[node]["python"] = info["host"]["python"]
TypeError: tuple indices must be integers or slices, not str

The scheduler was started as:

DASK_UCX__RMM_POOL_SIZE=1GB DASK_UCX__ENABLE_INFINIBAND="False" DASK_UCX__ENABLE_NVLINK="True" dask-scheduler --protocol ucx

@quasiben
Member

I tried to reproduce this with a clean env today and was unable to. I believe @randerzander is assessing his conda env.

@mrocklin
Collaborator

mrocklin commented Mar 18, 2020 via email

@randerzander
Author

Resolved after rebuilding environments. Thanks!
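
For anyone hitting the same thing: before rebuilding, it can help to see which package versions differ between the client environment and the scheduler environment. Below is a minimal sketch using only the standard library. The helper names `installed_versions` and `mismatches` are made up for illustration and are not part of Dask, and the version strings in the usage lines are placeholders, not the versions involved in this issue.

```python
from importlib import metadata


def installed_versions(names):
    """Return {package name: installed version, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def mismatches(left, right):
    """Return {package name: (left version, right version)} where they differ."""
    return {
        name: (left.get(name), right.get(name))
        for name in sorted(set(left) | set(right))
        if left.get(name) != right.get(name)
    }


# Run this in each environment and compare the printed dictionaries.
print(installed_versions(["distributed", "dask", "ucx-py"]))

# Placeholder data showing how two collected dictionaries would be compared.
client_env = {"distributed": "A", "dask": "X"}
scheduler_env = {"distributed": "B", "dask": "X"}
print(mismatches(client_env, scheduler_env))
```

Only the packages that differ are reported, so a clean pair of environments prints an empty dictionary.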

@mrocklin
Collaborator

mrocklin commented Mar 18, 2020 via email

@jakirkham
Member

What change are you referring to Matt? 🙂

@jrbourbeau

We added the Python version to the list of version checks in dask/distributed#3567. I suspect the TypeError: tuple indices must be integers or slices, not str is coming from that PR changing the output of get_system_info to be a dictionary instead of a tuple. Glad to hear rebuilding envs fixed things. Do let me know if this pops up again.
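
To illustrate the suspected failure mode, here is a minimal sketch of what happens when code written for a dictionary-shaped payload receives a tuple-shaped one. The function `describe_python` and both payloads are hypothetical stand-ins, not the actual distributed code or its real data layout; only the `info["host"]["python"]` lookup is taken from the traceback above.

```python
def describe_python(info):
    """Mimic the lookup from the traceback; expects a dict-shaped payload."""
    return info["host"]["python"]


# Assumed shapes, for illustration only.
new_style = {"host": {"python": "3.7.6"}}  # dict, as the newer code expects
old_style = (("python", "3.7.6"),)         # tuple, as an older peer might send

print(describe_python(new_style))

try:
    describe_python(old_style)
except TypeError as exc:
    error_text = str(exc)
    print(error_text)
```

Indexing the tuple with the string "host" raises the same kind of TypeError seen in the scheduler log, which is consistent with one side of the connection running an older distributed build than the other.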
