Stuck on deployment #61

Closed
Ramaddan opened this issue Feb 4, 2024 · 8 comments

Comments

Ramaddan commented Feb 4, 2024

Hi,

I was trying to deploy Wolf, and everything seemed fine until I reached the Docker CLI or Compose instructions.

I get the following error and cannot continue:

docker: Error response from daemon: error gathering device information while adding custom device "/dev/nvidia-caps/nvidia-cap1": no such file or directory.
ERRO[0000] error waiting for container: context canceled

I have an NVIDIA Quadro RTX GPU with driver version 545.29.06.

The nvidia-caps devices do not exist on my system.

Running on Ubuntu 22.04
Kernel: Linux 5.15.0-92-lowlatency x86_64

Thanks

Author

Ramaddan commented Feb 5, 2024

I wonder if it has anything to do with this:

https://github.com/NVIDIA/nvidia-docker?tab=readme-ov-file

DEPRECATION NOTICE

This project has been superseded by the NVIDIA Container Toolkit.

Update (Feb 5, 2024): I installed the NVIDIA Container Toolkit, but it did not fix my problem. I am still stuck at the nvidia-caps devices.

What is that device anyway? Is it for capture?

@ABeltramo
Member

Since you now have the NVIDIA Container Toolkit, could you try running

sudo nvidia-container-cli --load-kmods info

and see if the devices "magically appear"?

You can also try running Wolf without those extra devices but I fear that it'll probably fail. Btw, which distro are you using?

Also, beware of #60 with that driver version. I'm sorry for all the trouble, but Nvidia is just plainly hostile on Linux.
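
To tell whether the device nodes have actually appeared after running that command, each path passed to Docker with --device can be checked for existence. This is a minimal sketch, not part of Wolf or the NVIDIA tooling; the helper name `missing_devices` is hypothetical, and the example paths are the ones reported in this thread.

```shell
# Print every path from the argument list that does not exist.
# `missing_devices` is a hypothetical helper name used for illustration only.
missing_devices() {
    for dev in "$@"; do
        # -e is true for any existing file type, including character devices
        if [ ! -e "$dev" ]; then
            printf '%s\n' "$dev"
        fi
    done
}

# Example (paths taken from the error message in this thread):
# missing_devices /dev/nvidia-caps/nvidia-cap1 /dev/nvidia-caps/nvidia-cap2
```

An empty output means all listed paths exist; any printed path is one that the docker run command would fail on.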

Author

Ramaddan commented Feb 6, 2024

Hi. Thanks for the help. No problem, I'm used to NVIDIA.

Here is what I get:

NVRM version: 545.29.06
CUDA version: 12.3

Device Index: 0
Device Minor: 0
Model: Quadro RTX 5000
Brand: QuadroRTX
GPU UUID: GPU-(some number)
Bus Location: 00000000:01:00.0
Architecture: 7.5

Author

Ramaddan commented Feb 6, 2024

Not sure if that was it, but I have the devices now :-)

Both show now:
nvidia-cap1 nvidia-cap2

But I think they only showed up after a restart.

There was also a note about NVIDIA Container Toolkit versions prior to 1.12 having issues:
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installing-with-yum-or-dnf

Thanks

Will try to continue now

Author

Ramaddan commented Feb 6, 2024

Getting stuck here now:

22:03:16.577852995 INFO | Gstreamer version: 1.22.0-0
22:03:16.578928224 INFO | Reading config file from: /wolf/cfg/config.toml
22:03:16.801477822 INFO | Selected H264 encoder: nvcodec
22:03:16.801576288 INFO | Selected HEVC encoder: nvcodec
22:03:16.803287280 INFO | RTSP server started on port: 48010
22:03:16.803337562 INFO | Control server started on port: 47999
22:03:16.803286092 INFO | HTTP server listening on port: 47989
22:03:16.803451998 WARN | [PULSE] Unable to connect, Access denied
22:03:16.803492654 INFO | Starting PulseAudio docker container
22:03:16.804202617 WARN | [DOCKER] Container WolfPulseAudio already present, removing first
22:03:16.805050646 INFO | HTTPS server listening on port: 47984
22:03:18.250209710 WARN | [PULSE] Unable to connect, Access denied

This was my Docker CLI command:

docker run --name wolf --network=host \
  -e XDG_RUNTIME_DIR=/tmp/sockets \
  -v /tmp/sockets:/tmp/sockets:rw \
  -e NVIDIA_DRIVER_VOLUME_NAME=nvidia-driver-vol \
  -v nvidia-driver-vol:/usr/nvidia:rw \
  -e HOST_APPS_STATE_FOLDER=/etc/wolf \
  -v /etc/wolf/wolf:/wolf/cfg \
  -v /var/run/docker.sock:/var/run/docker.sock:rw \
  --device-cgroup-rule "c 13:* rmw" \
  --device /dev/nvidia-uvm \
  --device /dev/nvidia-uvm-tools \
  --device /dev/dri/ \
  --device /dev/nvidia-caps/nvidia-cap1 \
  --device /dev/nvidia-caps/nvidia-cap2 \
  --device /dev/nvidiactl \
  --device /dev/nvidia0 \
  --device /dev/nvidia-modeset \
  --device /dev/uinput \
  -v /dev/shm:/dev/shm:rw \
  -v /dev/input:/dev/input:rw \
  -v /run/udev:/run/udev:rw \
  ghcr.io/games-on-whales/wolf:stable

I did not change anything from what was given on the Wolf website.

@ABeltramo
Member

It's not stuck, just waiting for a Moonlight client to connect!

As for the Pulse warning, it's fixed in the upcoming release; as a quick workaround, you can manually clean the /tmp/sockets folder before launching Wolf.
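
One reading of "clean the /tmp/sockets folder" is to delete its contents while keeping the directory itself, since it is bind-mounted into the container. The sketch below assumes that reading and GNU find (as shipped on Ubuntu); the helper name `clean_sockets_dir` is hypothetical. Files left behind by a container running as root may need sudo to remove.

```shell
# Remove everything inside the socket directory but keep the directory itself.
# `clean_sockets_dir` is a hypothetical helper; /tmp/sockets is the path
# mounted as XDG_RUNTIME_DIR in the docker command above.
clean_sockets_dir() {
    dir="${1:-/tmp/sockets}"
    # Nothing to do if the directory does not exist yet
    [ -d "$dir" ] || return 0
    # -mindepth 1 skips the directory itself; -delete removes depth-first
    find "$dir" -mindepth 1 -delete
}
```

Calling `clean_sockets_dir` with no argument before `docker run` targets /tmp/sockets; passing another path cleans that directory instead.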

Author

Ramaddan commented Feb 7, 2024

OK, still stuck.

Moonlight freezes.

First, I could not run Wolf as a daemon, because I need the terminal output to see the PIN link. Is there a way to see it without running it in a terminal?

Second, Moonlight connected and offered a list of apps to run, but then it froze when launching RetroArch.
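
On the daemon question, one possible approach is to start the container detached (docker run -d with the same flags) and read its output afterwards with docker logs. The sketch below is an illustration under assumptions: the helper names are hypothetical, the container is named wolf as in the command earlier in this thread, and the pairing line is assumed to contain the word "pin" (the exact wording of that log line is not shown here).

```shell
# Keep only log lines that contain "pin" as a whole word (case-insensitive).
# Assumption: the pairing link line mentions a PIN; the exact log format
# is not shown in this thread.
filter_pin_lines() {
    grep -iw 'pin'
}

# Read the logs of a detached container by name and show only the PIN lines.
# `docker logs` prints what the container wrote to stdout and stderr.
show_pin_lines() {
    docker logs "${1:-wolf}" 2>&1 | filter_pin_lines
}
```

After starting Wolf with `docker run -d --name wolf ...`, running `show_pin_lines wolf` prints matching lines once, and `docker logs -f wolf` follows the full output live.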

Here are the terminal messages:

[2024-02-07 21:10:32] [ /etc/cont-init.d/10-setup_user.sh: executing... ]
[2024-02-07 21:10:32] **** Configure default user ****
[2024-02-07 21:10:32] Container running as root. Nothing to do.
[2024-02-07 21:10:32] DONE
[2024-02-07 21:10:32]
[2024-02-07 21:10:32] [ /etc/cont-init.d/15-setup_devices.sh: executing... ]
[2024-02-07 21:10:32] **** Configure devices ****
[2024-02-07 21:10:32] Exec device groups
[2024-02-07 21:10:33] Adding user 'root' to groups: gow-gid-107,root
[2024-02-07 21:10:33] DONE
[2024-02-07 21:10:33]
[2024-02-07 21:10:33] [ /etc/cont-init.d/30-nvidia.sh: executing... ]
[2024-02-07 21:10:33] Nvidia driver detected
[2024-02-07 21:10:33] [nvidia] Add Vulkan ICD
[2024-02-07 21:10:33] [nvidia] Add EGL external platform
[2024-02-07 21:10:33] [nvidia] Add egl-vendor
[2024-02-07 21:10:33] [nvidia] Add gbm backend
[2024-02-07 21:10:33]
[2024-02-07 21:10:33]
[2024-02-07 21:10:33] [ /etc/cont-init.d/init-gamescope.sh: executing... ]
[2024-02-07 21:10:33] **** Setting up Gamescope ****
[2024-02-07 21:10:33] Launching the container's startup script as user 'root'
0:00:00.030193650 221 0x55b836e3d4e0 WARN vadisplay gstvadisplay.c:316:gst_va_display_initialize: vaInitialize: unknown libva error
libva info: VA-API version 1.17.0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_17
libva info: va_openDriver() returns 0
0:00:00.321987870 221 0x55b836e3d4e0 WARN default gstvaapi.c:231:plugin_init: Cannot create a VA display
0:00:00.329842668 221 0x55b836e3d4e0 WARN default ges-meta-container.c:236:_set_value:GESAsset@0x55b837d04470 Could not set value on item: format-version
0:00:00.329868543 221 0x55b836e3d4e0 WARN default ges-meta-container.c:236:_set_value:GESAsset@0x55b837d04d30 Could not set value on item: format-version
0:00:00.329889973 221 0x55b836e3d4e0 WARN default ges-meta-container.c:236:_set_value:GESAsset@0x55b837d05540 Could not set value on item: format-version
0:00:00.330187729 221 0x55b836e3d4e0 WARN structure gststructure.c:2334:priv_gst_structure_parse_fields: Failed to find delimiter, r=mimetype
0:00:00.339279852 221 0x55b836e3d4e0 WARN vadisplay gstvadisplay.c:316:gst_va_display_initialize: vaInitialize: unknown libva error
0:00:00.364229073 221 0x55b836e3d4e0 WARN GST_PLUGIN_LOADING gstplugin.c:534:gst_plugin_register_func: plugin "/usr/local/lib/x86_64-linux-gnu/gstreamer-1.0/validate/libgstvalidatessim.so" failed to initialise
0:00:00.366679137 221 0x55b836e3d4e0 WARN adaptivedemux2 gstadaptivedemuxelement.c:41:adaptivedemux2_base_element_init: Failed to load libsoup library
0:00:00.366825376 221 0x55b836e3d4e0 WARN adaptivedemux2 gstadaptivedemuxelement.c:41:adaptivedemux2_base_element_init: Failed to load libsoup library
0:00:00.366961813 221 0x55b836e3d4e0 WARN adaptivedemux2 gstadaptivedemuxelement.c:41:adaptivedemux2_base_element_init: Failed to load libsoup library
0:00:00.395852104 221 0x55b836e3d4e0 WARN GST_PLUGIN_LOADING gstplugin.c:534:gst_plugin_register_func: plugin "/usr/local/lib/x86_64-linux-gnu/gstreamer-1.0/validate/libgstvalidatessim.so" failed to initialise
21:10:33.750481303 INFO | Gstreamer version: 1.22.0-0
21:10:33.751558448 INFO | Reading config file from: /wolf/cfg/config.toml
21:10:33.995697704 INFO | Selected H264 encoder: nvcodec
21:10:33.995813322 INFO | Selected HEVC encoder: nvcodec
21:10:33.997481417 INFO | RTSP server started on port: 48010
21:10:33.997484979 INFO | HTTP server listening on port: 47989
21:10:33.997507926 INFO | Control server started on port: 47999
21:10:33.997713651 WARN | [PULSE] Unable to connect, Access denied
21:10:33.997752878 INFO | Starting PulseAudio docker container
21:10:33.999269304 INFO | HTTPS server listening on port: 47984
21:11:03.007751489 INFO | RTP server started on port: 48100
21:11:03.007884712 INFO | RTP server started on port: 48200
0:00:29.777981244 1 0x7ff954001060 WARN cudaconvertscale gstcudaconvertscale.c:1265:gst_cuda_base_convert_set_info: Can't calculate borders
21:11:03.317384145 INFO | Starting container: /WolfRetroarch_17305313217028394902
21:11:03.932710183 WARN | [INPUT] Unable to find controller 4
21:11:03.932810773 WARN | [INPUT] Unable to find controller 5
21:11:03.932950399 WARN | [INPUT] Unable to find controller 6
21:11:03.933056161 WARN | [INPUT] Unable to find controller 7
0:00:31.650606677 1 0x7ff9540015e0 WARN audioencoder gstaudioencoder.c:1014:gst_audio_encoder_finish_frame: Can't copy metadata because input buffer disappeared
0:01:13.030246307 1 0x7ff924002680 WARN audiosrc gstaudiosrc.c:227:audioringbuffer_thread_func: error reading data -1 (reason: Success), skipping segment
21:11:56.530056785 INFO | Stopped container: /WolfRetroarch_17305313217028394902
21:13:19.472040181 INFO | RTP server started on port: 48100
21:13:19.472177838 INFO | RTP server started on port: 48200
0:02:46.215739598 1 0x7ff9540015e0 WARN cudaconvertscale gstcudaconvertscale.c:1265:gst_cuda_base_convert_set_info: Can't calculate borders
21:13:19.737998689 INFO | Starting container: /WolfRetroarch_17305313217028394902
21:13:20.399408939 WARN | [INPUT] Unable to find controller 4
21:13:20.399510081 WARN | [INPUT] Unable to find controller 5
21:13:20.399664170 WARN | [INPUT] Unable to find controller 6
21:13:20.399784263 WARN | [INPUT] Unable to find controller 7
0:02:48.106770298 1 0x7ff954000da0 WARN audioencoder gstaudioencoder.c:1014:gst_audio_encoder_finish_frame: Can't copy metadata because input buffer disappeared
0:02:56.929274749 1 0x7ff934003d70 WARN audiosrc gstaudiosrc.c:227:audioringbuffer_thread_func: error reading data -1 (reason: Success), skipping segment

Author

Ramaddan commented Feb 8, 2024

  1. I tried to stream to an Android phone instead.

  2. It somewhat works, at least Firefox does, but I cannot seem to bring up any keyboard layout.

RetroArch opens, but the graphics are too poor to be usable, so the problem seems to still be there.

  3. The other thing I noticed is that it controls the mouse cursor on the host machine. Is that normal?

Shouldn't the container be isolated so that it can do nothing on the host except stream, even with multiple instances?

  4. I also noticed I have to run this command again after every reboot or shutdown:

sudo nvidia-container-cli --load-kmods info

so that it finds the /dev/nvidia-caps/nvidia-cap1 and /dev/nvidia-caps/nvidia-cap2 devices again.

Is there a way to make this more permanent?
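
One way this might be made persistent is a oneshot systemd unit that runs the same command at boot. This is a sketch under assumptions, not a confirmed fix from the maintainers: the unit name nvidia-caps.service and the helper name are made up, the binary is assumed to live at /usr/bin/nvidia-container-cli, and the Docker service is assumed to be named docker.service.

```shell
# Write a oneshot systemd unit that runs the kmod-loading command at boot.
# Assumptions: binary path /usr/bin/nvidia-container-cli, Docker unit named
# docker.service; the unit name nvidia-caps.service is hypothetical.
write_nvidia_caps_unit() {
    target="${1:-/etc/systemd/system/nvidia-caps.service}"
    cat > "$target" <<'EOF'
[Unit]
Description=Create NVIDIA capability device nodes at boot
Before=docker.service

[Service]
Type=oneshot
ExecStart=/usr/bin/nvidia-container-cli --load-kmods info

[Install]
WantedBy=multi-user.target
EOF
}
```

Run as root, `write_nvidia_caps_unit` followed by `systemctl daemon-reload` and `systemctl enable --now nvidia-caps.service` would install and start the unit; checking /dev/nvidia-caps after the next reboot shows whether it had the intended effect.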

Thanks
