EGL Backend: Fail to run with CUDA OpenGL interop #209

Open
duongnb09 opened this issue Sep 29, 2022 · 6 comments

@duongnb09

I tried to run the official CUDA OpenGL interop sample code with VirtualGL using the EGL back end, but it keeps failing with the following error messages:

CUDA error at main.cpp:175 code=999(cudaErrorUnknown) "cudaGraphicsGLRegisterBuffer(pbo_resource, *pbo, cudaGraphicsMapFlagsNone)"
X Error of failed request: 0
Major opcode of failed request: 150 (GLX)
Minor opcode of failed request: 4 (X_GLXDestroyContext)
Serial number of failed request: 93
Current serial number in output stream: 93

@dcommander
Member

I can reproduce the error, but unfortunately I cannot readily figure out why it is occurring, due to the fact that nVidia's libraries are closed-source. This may be a similar issue to the issues with their Vulkan drivers (https://forums.developer.nvidia.com/t/headless-vulkan-with-multiple-gpus/222832/15), in that the CUDA driver may make some assumptions regarding the X display that are not valid in a remote display environment.

@dcommander
Member

Note that OpenCL/OpenGL interop also doesn't work with the EGL back end, because nVidia doesn't support the CL_EGL_DISPLAY_KHR property in their implementation of clCreateContext(). It seems that, despite supporting device-based EGL with OpenGL, some of nVidia's other APIs are tied to X11 in some way.

@dcommander
Member

This is still an issue, unfortunately, and it doesn't appear as if it's something that can be fixed in VirtualGL. I can only guess that CUDA is somehow complaining about the fact that VirtualGL is sneaking in an EGL context behind the scenes when CUDA expects a GLX context. (Perhaps CUDA OpenGL interop is tied to the NV-GLX extension in some way?) Anyhow, in order to fully diagnose it, I will need to create a minimally reproducible test case (i.e. one that demonstrates how to reproduce the issue without VGL) and forward the issue to nVidia. Since CUDA is a proprietary API, I don't have any particular desire to do any of that work for free, so I have tagged this issue as "funding needed."

@mp3guy

mp3guy commented Aug 11, 2024

I'm also hitting this issue: cudaGraphicsGLRegisterImage fails with error 304 (cudaErrorOperatingSystem). However, I can confirm that this works fine without VirtualGL when you use an EGL context with no GLX, both "headless" and windowed. So it seems unlikely that the problem is specifically EGL-related.

I guess one workaround would be for the application to open its own EGL context, separate from the VirtualGL one, since that works for interop, and then share it with the VirtualGL context for presentation/rendering?

I can confirm this issue persists whether the host application creates its GL context through EGL or GLX; the only difference is that, with GLX, the error is cudaErrorUnknown.

@dcommander
Member

Here's what I observe on my Rocky Linux 8.5 machine with CUDA Toolkit 12.6, nVidia 550.90.07, and a Quadro P620:

  • simpleCUDA2GL with USE_TEXSUBIMAGE2D defined (which causes the program to call cudaGraphicsGLRegisterBuffer()) works fine with the GLX back end, but cudaGraphicsGLRegisterBuffer() fails with cudaErrorUnknown when using the EGL back end.
  • simpleCUDA2GL with USE_TEXSUBIMAGE2D undefined (which causes the program to call cudaGraphicsGLRegisterImage()) works fine with the GLX back end, but cudaGraphicsGLRegisterImage() fails with cudaErrorUnknown when using the EGL back end.

Unfortunately, since CUDA is closed-source, I am completely clueless as to how to diagnose the issue. I used APITrace to obtain a trace of simpleCUDA2GL, but it doesn't show any OpenGL calls being made from within CUDA. It shows only the calls being made by the simpleCUDA2GL program itself.

@dcommander
Member

NOTE: Since the resources being passed to CUDA are OpenGL resources, not GLX or EGL resources, it shouldn't really matter how the context was created (but apparently it does, which is the fundamental mystery behind this issue).
