EGL Backend: Fail to run with CUDA OpenGL interop #209

duongnb09 · 2022-09-29T07:32:49Z

Tried to run VirtualGL using the EGL backend with the official CUDA OpenGL interop sample code, but it keeps failing with the following error messages:

CUDA error at main.cpp:175 code=999(cudaErrorUnknown) "cudaGraphicsGLRegisterBuffer(pbo_resource, *pbo, cudaGraphicsMapFlagsNone)"
X Error of failed request: 0
Major opcode of failed request: 150 (GLX)
Minor opcode of failed request: 4 (X_GLXDestroyContext)
Serial number of failed request: 93
Current serial number in output stream: 93
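
For anyone collecting these failures across machines or back ends, here is a minimal sketch of pulling the structured fields out of the first line of that output. It assumes the line keeps the shape shown above (source file, line number, numeric code, enum name, quoted call). The names `parse_cuda_failure` and `CudaFailure` are made up for this illustration and are not part of CUDA or VirtualGL.

```python
import re
from typing import NamedTuple, Optional

# Assumed line shape, taken from the report above:
#   CUDA error at <file>:<line> code=<int>(<enum name>) "<failing call>"
_CUDA_ERROR_RE = re.compile(
    r'CUDA error at (?P<file>[^:\s]+):(?P<line>\d+)\s+'
    r'code=(?P<code>\d+)\((?P<name>\w+)\)\s+'
    r'"(?P<call>.*)"'
)


class CudaFailure(NamedTuple):
    """Hypothetical record type holding the fields of one failure line."""
    file: str
    line: int
    code: int
    name: str
    call: str


def parse_cuda_failure(text: str) -> Optional[CudaFailure]:
    """Return the first CUDA failure found in text, or None if there is none."""
    m = _CUDA_ERROR_RE.search(text)
    if m is None:
        return None
    return CudaFailure(
        file=m["file"],
        line=int(m["line"]),
        code=int(m["code"]),
        name=m["name"],
        call=m["call"],
    )


if __name__ == "__main__":
    sample = (
        'CUDA error at main.cpp:175 code=999(cudaErrorUnknown) '
        '"cudaGraphicsGLRegisterBuffer(pbo_resource, *pbo, cudaGraphicsMapFlagsNone)"'
    )
    print(parse_cuda_failure(sample))
```

Lines that do not match, such as the X error lines that follow, simply return `None`, so the function can be run over a whole log.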

dcommander · 2022-09-30T21:48:50Z

I can reproduce the error, but unfortunately I cannot readily figure out why it is occurring, because nVidia's libraries are closed-source. This may be similar to the issues with their Vulkan drivers (https://forums.developer.nvidia.com/t/headless-vulkan-with-multiple-gpus/222832/15), in that the CUDA driver may make some assumptions regarding the X display that are not valid in a remote display environment.

dcommander · 2022-09-30T21:59:11Z

Note that OpenCL/OpenGL interop also doesn't work with the EGL back end, because nVidia doesn't support the CL_EGL_DISPLAY_KHR property in their implementation of clCreateContext(). It seems that, despite supporting device-based EGL with OpenGL, some of nVidia's other APIs are tied to X11 in some way.

dcommander · 2024-03-01T19:06:36Z

This is still an issue, unfortunately, and it doesn't appear as if it's something that can be fixed in VirtualGL. I can only guess that CUDA is somehow complaining about the fact that VirtualGL is sneaking in an EGL context behind the scenes when CUDA expects a GLX context. (Perhaps CUDA OpenGL interop is tied to the NV-GLX extension in some way?) Anyhow, in order to fully diagnose it, I will need to create a minimal reproducible test case (i.e. to demonstrate how to reproduce the issue without VGL) and forward the issue to nVidia. Since CUDA is a proprietary API, I don't have any particular desire to do any of that work for free, so I have tagged this issue as "funding needed."

mp3guy · 2024-08-11T09:08:18Z

I'm also hitting this issue where cudaGraphicsGLRegisterImage fails with error 304, cudaErrorOperatingSystem. However, I can confirm that this works fine without VirtualGL in situations where you use an EGL context with no GLX, both "headless" and windowed. So it seems unlikely it's specifically EGL related.

I guess one workaround would be for the application to open its own EGL context, separate from the VirtualGL one, since that works for interop, and then share it with the VirtualGL context for presentation/rendering?

I can confirm that this issue persists when the host application creates a GL context through either EGL or GLX; the only difference is that with GLX the error is cudaErrorUnknown rather than cudaErrorOperatingSystem.

dcommander · 2024-08-12T15:06:08Z

Here's what I observe on my Rocky Linux 8.5 machine with CUDA Toolkit 12.6, nVidia 550.90.07, and a Quadro P620:

simpleCUDA2GL with USE_TEXSUBIMAGE2D defined (which causes the program to call cudaGraphicsGLRegisterBuffer()) works fine with the GLX back end, but cudaGraphicsGLRegisterBuffer() fails with cudaErrorUnknown when using the EGL back end.
simpleCUDA2GL with USE_TEXSUBIMAGE2D undefined (which causes the program to call cudaGraphicsGLRegisterImage()) works fine with the GLX back end, but cudaGraphicsGLRegisterImage() fails with cudaErrorUnknown when using the EGL back end.

Unfortunately, since CUDA is closed-source, I am completely clueless as to how to diagnose the issue. I used APITrace to obtain a trace of simpleCUDA2GL, but it doesn't show any OpenGL calls being made from within CUDA. It shows only the calls being made by the simpleCUDA2GL program itself.

dcommander · 2024-08-12T15:12:00Z

NOTE: Since the resources being passed to CUDA are OpenGL resources, not GLX or EGL resources, it shouldn't really matter how the context was created (but apparently it does, which is the fundamental mystery behind this issue.)

dcommander added the bug and help wanted labels Oct 20, 2022

dcommander added the funding needed label Mar 1, 2024

dcommander mentioned this issue Aug 22, 2024

vgl 3.0.90-20221122 crashing torch on ubuntu 22.04 #227

Open
