How do I make all the imports within my service available and work with Dask #3566
@Sargababu could you provide a minimal example (see https://blog.dask.org/2018/02/28/minimal-bug-reports)? It's not clear to me what code you're running that's resulting in the traceback you posted. Looks like you're using …
@jrbourbeau Thanks for taking the time to look into this issue. Basically, I have my Dask scheduler running, and within my `main.py` I have many imports (I am just adding the relevant parts from my script):

```python
from src.utils.exceptions import *

client = Client(dask_scheduler_ip)
```

The error occurs because the Dask workers are not able to fetch my imports such as `src.utils.exceptions` and `src.utils.constants`. I found suggestions online to use `client.upload_file` to specify the files explicitly, but I am not sure how to use it in my case, or whether there is another way to make all my file imports available to my workers.
In general Dask assumes that the software environment is the same between your client and your workers. Dask itself can move around individual files with `upload_file`, but for more complex environments you will have to manage the software environment yourself. Typically on the cloud people do this with Docker images and something like Kubernetes.
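For a package with multiple modules, one common pattern is to bundle the package into a single zip and ship that with `upload_file`. A sketch follows; the fabricated `src` directory stands in for the real project, and the `Client` calls are commented out because they need a running scheduler (`dask_scheduler_ip` comes from the earlier comment):

```python
import os
import shutil
import zipfile

# For illustration, fabricate a tiny "src" package like the one in the
# question (in the real project this directory already exists).
os.makedirs("src/utils", exist_ok=True)
for path in ("src/__init__.py", "src/utils/__init__.py", "src/utils/exceptions.py"):
    open(path, "a").close()

# Bundle the package into one archive. Client.upload_file accepts .py,
# .zip, and .egg files; zip/egg files are added to sys.path on each
# worker, so `import src.utils.exceptions` can then resolve there.
archive = shutil.make_archive("src_pkg", "zip", root_dir=".", base_dir="src")
print(sorted(zipfile.ZipFile(archive).namelist()))

# With a live cluster you would then run (not executed here):
# from distributed import Client
# client = Client(dask_scheduler_ip)
# client.upload_file(archive)
```

Note that this only moves your own source files; third-party dependencies still need to be installed in each worker's environment, as described above.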
@mrocklin Thanks for the clarification. I had one more question around this: when we ask a Dask worker to do some job that requires a package, say `requests` or `opencv`, will it look in the client's library location or the worker's? Probably a dumb question.

I am also seeing this in the traceback:

```
File "/opt/conda/lib/python3.7/site-packages/distributed/utils.py", line 329, in f
```

Any idea why this could be happening?
A Dask worker is a Python process. If you ask it to use some library then that Python process will try to import it however Python imports things. The worker is unable to look at the client's process and see its software environment. Dask doesn't do any magic here; it is just a bunch of Python processes.

Regarding "unknown opcode": unfortunately no, I'm unfamiliar with that error message. Given your concern above about mismatched software environments, my first recommendation would be to ensure that all of your Python processes have the same versions. You might try the following command:

```python
client.get_versions(check=True)
```
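What `check=True` does, roughly, is compare the per-package version info gathered from the client, scheduler, and workers, and raise on mismatch. A stdlib-only sketch of that comparison logic (the `compare_versions` helper and the version dicts here are hypothetical, not distributed's actual implementation):

```python
def compare_versions(client_versions, worker_versions):
    """Return {package: (client_version, worker_version)} for every
    package whose versions differ, roughly mimicking the report that
    Client.get_versions(check=True) produces on mismatch."""
    mismatches = {}
    for pkg, client_v in client_versions.items():
        worker_v = worker_versions.get(pkg, "MISSING")
        if worker_v != client_v:
            mismatches[pkg] = (client_v, worker_v)
    return mismatches

# Illustrative version dicts: everything matches except the Python version.
client_side = {"python": "3.7.6", "dask": "2.12.0", "requests": "2.23.0"}
worker_side = {"python": "3.6.10", "dask": "2.12.0", "requests": "2.23.0"}

print(compare_versions(client_side, worker_side))
# → {'python': ('3.7.6', '3.6.10')}
```

A differing `python` entry is exactly the kind of mismatch discussed in the rest of this thread.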
@mrocklin yes, I did try that and have made all versions the same across client and workers. Could this be caused by a mismatch in Python versions?
Yes, you certainly want Python versions to also be the same.
So if there is a Python version mismatch, will `client.get_versions(check=True)` detect that?
Not currently, but #3567 adds Python to the list of packages that are checked with `client.get_versions(check=True)`.
@jrbourbeau OK, thanks for confirming that. I guess it could be my Python version that is causing this issue; let me get that confirmed. @mrocklin @jrbourbeau I really appreciate you taking the effort and time to help me here :)
@mrocklin @jrbourbeau Thanks a lot again. Changing the Python version to be the same across client and workers solved my issue.
Great! Glad to hear your issue has been resolved.
I have a lot of imports within my service, and when I try to make it work as expected by calling the Dask scheduler in my cluster, I am getting a ModuleNotFoundError as below.
```
    raise result
  File "/opt/conda/lib/python3.6/site-packages/distributed/worker.py", line 972, in upload_file
  File "/opt/conda/lib/python3.6/site-packages/distributed/utils.py", line 1055, in import_file
  File "/opt/conda/lib/python3.6/importlib/__init__.py", line 126, in import_module
  File "/worker-i7k5722z/main.py", line 12, in <module>
ModuleNotFoundError: No module named 'src'
[2020-03-11 14:24:06 -0700] [98249] [INFO] Worker exiting (pid: 98249)
[2020-03-11 14:24:06 -0700] [98250] [ERROR] Exception in worker process
```
My directory structure is as follows:
```
resources/
    input_schema.json
src/
    client/
        datalake_download.py
    utils/
        dice_calculator.py
main.py
```
Within main.py I call my client and gather the results, and I import all the files within main.py. What is the best approach in my case to make all my imports available across my Dask workers?
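The `ModuleNotFoundError: No module named 'src'` above ultimately means the worker process's `sys.path` has no entry containing the project root. A minimal sketch of that mechanism, using a temporary directory as a stand-in for the project root (all paths here are illustrative):

```python
import importlib.util
import os
import sys
import tempfile

# Build a stand-in project root containing a "src" package.
project_root = tempfile.mkdtemp()
os.makedirs(os.path.join(project_root, "src", "utils"))
for rel in ("src/__init__.py", "src/utils/__init__.py"):
    open(os.path.join(project_root, rel), "w").close()

# Before the project root is on sys.path, Python cannot locate "src";
# this is the situation inside a Dask worker started elsewhere.
spec_before = importlib.util.find_spec("src")

# Putting the project root on sys.path (which is what shipping the code
# to the worker, or launching workers with PYTHONPATH set, achieves)
# lets the import machinery find the package.
sys.path.insert(0, project_root)
spec_after = importlib.util.find_spec("src")

print(spec_before is None, spec_after is not None)
```

This is a general observation about Python's import machinery rather than a Dask-specific feature: starting `dask-worker` from the project root, or exporting `PYTHONPATH` to include it, makes the same imports resolve on the workers.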