Failed cluster_demo.py on EC2 #234

Open
brucewayne1248 opened this issue May 31, 2018 · 0 comments

I tried to run cluster_demo.py on EC2. The instance starts fine but is terminated shortly afterwards. I get the following traceback in stdout.log:

sync initiated
log sync initiated
Running in docker
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:126] Couldn't open CUDA library libcuda.so.1. LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: 83526cf8e682
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: Permission denied: could not open driver version path for reading: /proc/driver/nvidia/version
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1065] LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1066] failed to find libcuda.so on this system: Failed precondition: could not dlopen DSO: libcuda.so.1; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
using seed 1
2018-05-31 09:18:27.844271 UTC | Setting seed to 1
using seed 1
/opt/conda/envs/rllab3/lib/python3.5/site-packages/theano/tensor/signal/downsample.py:6: UserWarning: downsample module has been moved to the theano.tensor.signal.pool module.
"downsample module has been moved to the theano.tensor.signal.pool module.")
Traceback (most recent call last):
  File "/root/code/rllab/scripts/run_experiment_lite.py", line 137, in <module>
    run_experiment(sys.argv)
  File "/root/code/rllab/scripts/run_experiment_lite.py", line 120, in run_experiment
    method_call = cloudpickle.loads(base64.b64decode(args.args_data))
  File "/opt/conda/envs/rllab3/lib/python3.5/site-packages/cloudpickle/cloudpickle.py", line 800, in _make_skel_func
    closure = _reconstruct_closure(closures) if closures else None
  File "/opt/conda/envs/rllab3/lib/python3.5/site-packages/cloudpickle/cloudpickle.py", line 792, in _reconstruct_closure
    return tuple([_make_cell(v) for v in values])
TypeError: 'int' object is not iterable
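
The last two frames show `_reconstruct_closure` iterating over a value that turns out to be an `int`. A minimal sketch of that failure mode follows. The functions below are simplified stand-ins that only borrow the names from the traceback (they are not the cloudpickle source), and `try_reconstruct` is a hypothetical helper added for illustration. One possible explanation, which nothing in the log confirms, is that the cloudpickle version that serialized `args_data` on the launching machine differs from the one in the Docker image and encodes the closure as an integer instead of a sequence.

```python
def _make_cell(value):
    # Stand-in: build a closure cell that holds `value`.
    return (lambda: value).__closure__[0]


def _reconstruct_closure(values):
    # Same expression as the line reported in the traceback.
    return tuple([_make_cell(v) for v in values])


def try_reconstruct(closures):
    # Hypothetical helper: mirrors the guard in the traceback and
    # reports the error as a string instead of crashing.
    try:
        return _reconstruct_closure(closures) if closures else None
    except TypeError as exc:
        return "TypeError: {}".format(exc)


# A sequence of closure values works as the code expects.
cells = try_reconstruct([1, 2])
print([c.cell_contents for c in cells])

# An integer (assumed here to be what the pickled payload carried)
# reproduces the error message from the log.
print(try_reconstruct(2))
```

If this assumption holds, comparing `cloudpickle.__version__` on the launching machine with the version installed in the `rllab3` environment of the container would be a reasonable first check.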

Any help? If additional information is necessary, I am ready to provide it.
