ash: Repeatedly call enumeration functions when `VK_INCOMPLETE` is returned #465

MarijnS95 · 2021-08-07T08:57:31Z

Fixes #464

ash/src/device.rs

MarijnS95 · 2021-08-07T09:03:46Z

I think this PR can go in with a patch release as no API is modified. However, IMO we should perform a minor (breaking) release afterwards to get rid of explicit *_len() functions and make them use the same helper instead:

get_image_sparse_memory_requirements2_len()
get_physical_device_queue_family_properties2_len()
get_physical_device_sparse_image_format_properties2_len()
enumerate_physical_device_groups_len()

The problem here is that their non-_len counterpart receives a mutable slice but never updates the resulting size (impossible with slice references anyway, unless having a mutable reference to the mutable slice reference), in case count changed between the _len and "fill" call (not sure if that's possible for these functions though, they can at least return VK_INCOMPLETE).

However, maybe some (embedded) applications perform "optimizations" and keep this in static arrays for which they want to have the &mut [T] function available?

MarijnS95 · 2021-08-07T09:07:01Z

ash/src/prelude.rs

+/// the result-vec is not large enough.
+///
+/// [`vkEnumerateInstanceExtensionProperties]: https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/vkEnumerateInstanceExtensionProperties.html
+pub(crate) unsafe fn read_into_vector<T>(


Naming suggestions are welcome, btw 😉

ash/src/extensions/khr/pipeline_executable_properties.rs

ash/src/prelude.rs

Ralith

However, IMO we should perform a minor (breaking) release afterwards to get rid of explicit *_len() functions and make them use the same helper instead

These variants were introduced for the case where an extension output structure may be needed, in which case each element's pNext must be initialized to address a unique output structure. It'd be nice to develop a high-level, abstract interface for that case, but the helper introduced here isn't sufficient; it'll have to be significantly more complex (maybe a HList of vecs?).

ash/src/prelude.rs

ash/src/device.rs

ash/src/extensions/khr/pipeline_executable_properties.rs

ash/src/prelude.rs

MarijnS95 · 2021-08-16T22:09:55Z

However, IMO we should perform a minor (breaking) release afterwards to get rid of explicit *_len() functions and make them use the same helper instead

These variants were introduced for the case where an extension output structure may be needed, in which case each element's pNext must be initialized to address a unique output structure.

Conveniently enough only enumerate_physical_device_groups_len out of the four listed functions actually has a return type, there's no possibility for the others to return INCOMPLETE.

We can think about helpers for initializing pNext later; I think you'll want to apply that to the functions in pipeline_executable_properties.rs later as well, which need the same initialization but currently do not allow pNext to be filled beforehand?

Ralith · 2021-08-16T22:28:15Z

We can think about helpers for initializing pNext later; I think you'll want to apply that to the functions in pipeline_executable_properties.rs later as well, which need the same initialization but currently do not allow pNext to be filled beforehand?

Yeah, I think there's a whole bunch of cases where we haven't exposed pNext for output just because nobody's wanted it yet. Agreed that we should leave it to followup work, let's just not take away the existing workarounds until then.

ash/src/device.rs

kvark

Is this really needed for all these functions? I'd think that the scenario when VK_INCOMPLETE shows up is very specific, since something must be able to change behind ones back. It doesn't mean that all of the vectored queries have to do it.

MarijnS95 · 2021-08-23T10:04:37Z

@kvark The documentation for every function that now uses read_into_*_vector mentions that it can possibly return VK_INCOMPLETE, even though most are likely only because of not providing enough space in the first place instead of being a change in the background.

Either way I don't want to be inconsistent in our API and only handle VK_INCOMPLETE in specific places, especially if Ash is the one querying for and allocating the vector. Unless the Vulkan docs are very clear on where a change behind ones back can possibly happen, I think it's better to be safe than sorry. An extra compare+jump shouldn't hurt that much either, in favour of having simplified code too.

In any case, if we really want to make a distinction between those kinds of functions (again, if the Vulkan API clearly specifies this), the non-looping queries would get an assertion/panic instead as they're not allowed to change the count after initially querying it.

kvark · 2021-08-23T16:34:31Z

Specification says VK_INCOMPLETE is possible for a simple reason that your count is too low. I don't think we should be careful "just in case" here. Ash is expected to understand the Vulkan spec in its implementation.

Ralith · 2021-08-23T16:46:23Z

Specification says VK_INCOMPLETE is possible for a simple reason that your count is too low

Yes, that's the definition of the error. What it doesn't say is that the required capacity is guaranteed never to change. We should not rely on assumptions that implementations are under no obligation to comply with, particularly when there's clear precedent for getters that may change at any time.

filnet · 2021-08-23T16:49:50Z

The exact phrasing in the specs is:

VK_INCOMPLETE A return array was too small for the result

But it does not say when or how this array can be too small.

MarijnS95 · 2021-08-23T17:48:25Z

Specification says VK_INCOMPLETE is possible for a simple reason that your count is too low.

This is extremely unlikely (dare I say impossible bar cosmic radiation) when ash just queried the count in the previous call. You yourself suggested to run this in a loop, and since vkEnumerateDeviceExtensionProperties does not explicitly state whether or not a change at runtime is possible, we assume this is the case for every function unless proven otherwise.

If you want to get to the bottom of this I recommend checking if anything on the system of the original reporter is out of the ordinary, and otherwise insert some debug logging to see if the count indeed changes (and if possible, run this in some tight loop that could perhaps diff the list before and after). This is also my first time encountering VK_INCOMPLETE.

I'd still like to keep the read_into_*_vector with lambda if we decide against this loop, it cleans up duplicate count/result handling nicely.

kvark · 2021-08-23T20:13:16Z

ok, I don't think my point got through, but I also don't want to block this fix that would clearly help us. So please land this :)

MarijnS95 · 2021-08-24T08:27:08Z

@kvark It's important to get on the same page first as this fix is solving an issue reported by you - I have never seen it before and don't rely on this PR.

If you think there is a specific environment in which a change in count occurs, shouldn't it be up to the application to detect and handle this case? At least I think that's what you're suggesting above? Otherwise, don't we all agree that the spec is rather vague in this aspect, and doesn't confirm nor rule out a change of count between queries?

kvark · 2021-08-24T17:37:24Z

The problem originates from this clause:

Because the list of available layers may change externally between calls to vkEnumerateInstanceExtensionProperties, two calls may retrieve different results if a pLayerName is available in one call but not in another. The extensions supported by a layer may also change between two calls, e.g. if the layer implementation is replaced by a different version between those calls.

This is specifically about vkEnumerateInstanceExtensionProperties. If it applied to other functions, we'd expect to see similar statements there. So the spec doesn't appear to be vague here.

MarijnS95 · 2021-08-24T19:47:49Z

@kvark Now that brings something new to the table. Up to this point it looks like everyone failed to notice that bit in the spec but you.

Are you aware of any other function that has the same or a similar mention in the spec?

I'll update this PR ASAP to deal with this.

kvark · 2021-08-24T19:52:54Z

It's not new, it's the starting point - #464 (comment)
Searching the spec reveals that this is the only function with that kind of obscure behavior. We can cover more later if we discover anything - the changes are backwards compatible anyway.

filnet · 2021-08-24T19:53:08Z

VK_INCOMPLETE can happen :

1/ if the user provides an array that is too small.
But that can't happen with the proposed change.

2/ when using a Vulkan function that explicitly mentions that the list can change between two calls.
Only vkEnumerateInstanceExtensionProperties mentions it but I find it a bit dangerous to assume that no other functions has this behavior. And Vulkan might add more such functions or change the behavior of existing functions.

3/ with bad user code ?
Would it be possible for user code to issue a concurrent function call in between the "two calls" which would cause the list size to change ? I doubt that this is possible since Vulkan is mostly stateless but who knows.

MarijnS95 · 2021-08-24T19:57:51Z

It's not new, it's the starting point - #464 (comment)

New in the sense that I hadn't noticed / forgotten this was mentioned in the issue.

Searching the spec reveals that this is the only function with that kind of obscure behavior. We can cover more later if we discover anything - the changes are backwards compatible anyway.

Great, only one function to deal with then.

VK_INCOMPLETE can happen :

1/ if the user provides an array that is too small.
But that can't happen with the proposed change.

Couldn't happen before either, because Ash was querying the size and directly reusing this value in every instance.

2/ when using a Vulkan function that explicitly mentions that the list can change between two calls.
Only vkEnumerateInstanceExtensionProperties mentions but I find it a bit dangerous to assume that no other functions has this behavior. And Vulkan might add more such functions or change the behavior of existing functions.

We'll do this just for vkEnumerateInstanceExtensionProperties now. No-one has ever seen this before anyway, and it's no surprise that the only occurrence of a changing value happens on a function that has it clearly documented. Nothing dangerous IMO.

3/ with bad user code ?
Would it be possible for user code to issue a concurrent function call in between the "two calls" which would cause the list size to change ? I doubt that this is possible since Vulkan is mostly stateless but who knows.

Strongly doubt this. Maybe available layers can change at any point since that's an external factor?

filnet · 2021-08-24T20:06:17Z

What about functions related to hardware ?
vkGetPhysicalDeviceDisplayPropertiesKHR for example. What if a display device gets connected/disconnected between the two calls.

MarijnS95 · 2021-08-24T20:18:07Z

What about functions related to hardware ?
vkGetPhysicalDeviceDisplayPropertiesKHR for example. What if a display device gets connected/disconnected between the two calls.

Same thoughts here; quick testing shows this doesn't change until restarting the application. Maybe that is the case for similar functions, too.

Ralith · 2021-08-24T20:29:01Z

We'll do this just for vkEnumerateInstanceExtensionProperties now.

We should do this for every function that may return VK_INCOMPLETE unless the spec explicitly guarantees that the size is fixed. "It worked on my machine" is not an excuse for UB. It's not like there's a drawback to being defensive here.

MarijnS95 · 2021-08-24T20:50:58Z

We should do this for every function that may return VK_INCOMPLETE unless the spec explicitly guarantees that the size is fixed. "It worked on my machine" is not an excuse for UB. It's not like there's a drawback to being defensive here.

Rather, there's two camps now: those who want to spend one extra cycle for the compare, and those who don't. I'm all for the defensive route, feel free to merge.

…erties The type and next-pointer of output structures must always be initialized. Fixes: ac4d046 ("Added `VK_EXT_tooling_info` extension support (ash-rs#310)")

kvark · 2021-08-24T21:44:52Z

I don't care about extra cycles for this. I care about Ash knowing what API it abstracts over. So far, Vulkan spec does not explicitly spell out any guarantees about what the count is between different invocations, except for vkEnumerateInstanceExtensionProperties. If it's also the case for other functions (like vkGetPhysicalDeviceDisplayPropertiesKHR), then I'd say this would be a spec bug. But Ash doesn't need to be smarter than the spec.

Also, I'm no longer sure that the loop is the right solution here. What's the point of ensuring that you got exactly all of the elements at some point of the program execution? If I got 3 layers reported, that doesn't mean that the next second the configuration isn't going to change, anyway.

Did we consider just doing the following:

do the regular call for getting the count, and then the call to fill the data
if the returned count is less, issue a log::warn and continue
if the returned error is VK_INCOMPLETE, issue log::warn and continue

This seems marginally better than having a loop. And in a scenario where it happens - your configuration changed between two invocations - there isn't any guarantee that it's not going to change right after the call, anyway.

This is all super bike-sheddy, and I understand if you folks just don't consider it worth improving. Landing the patch as is would be fine by me.

filnet · 2021-08-24T22:08:12Z

There's also vkEnumerateInstanceLayerProperties ;)

ash/src/extensions/ext/tooling_info.rs

Ralith · 2021-08-25T00:05:24Z

So far, Vulkan spec does not explicitly spell out any guarantees about what the count is between different invocations, except for vkEnumerateInstanceExtensionProperties. If it's also the case for other functions (like vkGetPhysicalDeviceDisplayPropertiesKHR), then I'd say this would be a spec bug.

If the spec guarantees a thing in one case and doesn't guarantee that thing in other cases, that's not a bug, it's the absence of a guarantee. When behavior isn't guaranteed, we cannot assume a single possibility.

What's the point of ensuring that you got exactly all of the elements at some point of the program execution? If I got 3 layers reported, that doesn't mean that the next second the configuration isn't going to change, anyway.

Good question. Unfortunately, it is necessary, because there's no guarantee which results are omitted from an incomplete set. A user can reasonably be expected to tolerate having fractionally out of date information regarding a change in configuration/whatever that just happened, but they should not expect to lose information that hasn't ever changed just because another datum in the same category changed.

log::warn and continue

IMO adding log to the dependencies and public interface of the crate is strongly contrary to the spirit of ash; user-facing diagnostics are not the responsibility of a Vulkan binding.

Ralith · 2021-08-25T00:16:15Z

Merging as-is since this is a clear improvement over the status quo, and a code size reduction besides. We can discuss in a new issue if anyone feels further refinement is necessary.

MarijnS95 · 2021-08-25T20:36:47Z

then I'd say this would be a spec bug. But Ash doesn't need to be smarter than the spec.

(In addition to what @Ralith said, which I totally agree with) I say we do, for the sake of being friendly to crate users instead of leaving them open to this under-specified part of the spec (as @Ralith said, there's no explicit guarantee that this won't happen).

What's the point of ensuring that you got exactly all of the elements at some point of the program execution? If I got 3 layers reported, that doesn't mean that the next second the configuration isn't going to change, anyway.

If we get back less, that's fine. If we get back more, that's a VK_INCOMPLETE that usually propagates up through an application as Err()?. We could do a couple things here:

Print some logs. I'm not a fan of this either, it's unexpected from core crates and very easy to miss. There are better solutions:
Explicitly encode this artifact in Ash's public API, like for example fn acquire_next_image(). On VK_INCOMPLETE, we still return the Vec<_> and leave it up to the caller to decide to take the result or call this function again.

And if it's less, I don't think the caller is interested in that at all. Unless they're doing sub-microsecond poking of the Vulkan API to figure out when exactly this count changes (higher or lower) - I'll redirect them to use the raw pointer functions if that's their intended usecase 😬

MarijnS95 · 2021-08-26T19:58:25Z

0.33.2 is now released with this change.

MarijnS95 requested review from Ralith and kvark August 7, 2021 08:57

MarijnS95 commented Aug 7, 2021

View reviewed changes

ash/src/device.rs Outdated Show resolved Hide resolved

ash/src/device.rs Show resolved Hide resolved

MarijnS95 commented Aug 7, 2021

View reviewed changes

Ralith requested changes Aug 7, 2021

View reviewed changes

ash/src/prelude.rs Show resolved Hide resolved

ash/src/device.rs Outdated Show resolved Hide resolved

ash/src/device.rs Show resolved Hide resolved

ash/src/extensions/khr/pipeline_executable_properties.rs Show resolved Hide resolved

ash/src/prelude.rs Outdated Show resolved Hide resolved

MarijnS95 force-pushed the enumerate-incomplete branch 2 times, most recently from 9a82158 to 32f7c48 Compare August 16, 2021 22:10

MarijnS95 requested a review from Ralith August 17, 2021 08:31

Ralith approved these changes Aug 21, 2021

View reviewed changes

ash/src/device.rs Show resolved Hide resolved

kvark reviewed Aug 23, 2021

View reviewed changes

MarijnS95 marked this pull request as draft August 24, 2021 20:52

MarijnS95 added 2 commits August 24, 2021 23:00

ash: Repeatedly call enumeration functions when INCOMPLETE is returned

ce9d6bf

tooling_info: Initialize sType/pNext in get_physical_device_tool_prop…

92611e6

…erties The type and next-pointer of output structures must always be initialized. Fixes: ac4d046 ("Added `VK_EXT_tooling_info` extension support (ash-rs#310)")

MarijnS95 force-pushed the enumerate-incomplete branch from ae3a715 to 92611e6 Compare August 24, 2021 21:01

MarijnS95 marked this pull request as ready for review August 24, 2021 21:01

filnet reviewed Aug 24, 2021

View reviewed changes

ash/src/extensions/ext/tooling_info.rs Show resolved Hide resolved

Ralith merged commit 00abdfc into ash-rs:master Aug 25, 2021

MarijnS95 deleted the enumerate-incomplete branch August 25, 2021 20:07

teoxoy mentioned this pull request Nov 18, 2023

I can only render things after reinstalling NVIDIA drivers and before rebooting my system gfx-rs/wgpu#3881

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ash: Repeatedly call enumeration functions when `VK_INCOMPLETE` is returned #465

ash: Repeatedly call enumeration functions when `VK_INCOMPLETE` is returned #465

MarijnS95 commented Aug 7, 2021

MarijnS95 commented Aug 7, 2021 •

edited

Loading

MarijnS95 Aug 7, 2021

Ralith left a comment

MarijnS95 commented Aug 16, 2021

Ralith commented Aug 16, 2021

kvark left a comment

MarijnS95 commented Aug 23, 2021

kvark commented Aug 23, 2021

Ralith commented Aug 23, 2021

filnet commented Aug 23, 2021

MarijnS95 commented Aug 23, 2021

kvark commented Aug 23, 2021

MarijnS95 commented Aug 24, 2021

kvark commented Aug 24, 2021

MarijnS95 commented Aug 24, 2021

kvark commented Aug 24, 2021

filnet commented Aug 24, 2021 •

edited

Loading

MarijnS95 commented Aug 24, 2021

filnet commented Aug 24, 2021

MarijnS95 commented Aug 24, 2021

Ralith commented Aug 24, 2021 •

edited

Loading

MarijnS95 commented Aug 24, 2021

kvark commented Aug 24, 2021

filnet commented Aug 24, 2021

Ralith commented Aug 25, 2021 •

edited

Loading

Ralith commented Aug 25, 2021

MarijnS95 commented Aug 25, 2021

MarijnS95 commented Aug 26, 2021

ash: Repeatedly call enumeration functions when VK_INCOMPLETE is returned #465

ash: Repeatedly call enumeration functions when VK_INCOMPLETE is returned #465

Conversation

MarijnS95 commented Aug 7, 2021

MarijnS95 commented Aug 7, 2021 • edited Loading

MarijnS95 Aug 7, 2021

Choose a reason for hiding this comment

Ralith left a comment

Choose a reason for hiding this comment

MarijnS95 commented Aug 16, 2021

Ralith commented Aug 16, 2021

kvark left a comment

Choose a reason for hiding this comment

MarijnS95 commented Aug 23, 2021

kvark commented Aug 23, 2021

Ralith commented Aug 23, 2021

filnet commented Aug 23, 2021

MarijnS95 commented Aug 23, 2021

kvark commented Aug 23, 2021

MarijnS95 commented Aug 24, 2021

kvark commented Aug 24, 2021

MarijnS95 commented Aug 24, 2021

kvark commented Aug 24, 2021

filnet commented Aug 24, 2021 • edited Loading

MarijnS95 commented Aug 24, 2021

filnet commented Aug 24, 2021

MarijnS95 commented Aug 24, 2021

Ralith commented Aug 24, 2021 • edited Loading

MarijnS95 commented Aug 24, 2021

kvark commented Aug 24, 2021

filnet commented Aug 24, 2021

Ralith commented Aug 25, 2021 • edited Loading

Ralith commented Aug 25, 2021

MarijnS95 commented Aug 25, 2021

MarijnS95 commented Aug 26, 2021

ash: Repeatedly call enumeration functions when `VK_INCOMPLETE` is returned #465

ash: Repeatedly call enumeration functions when `VK_INCOMPLETE` is returned #465

MarijnS95 commented Aug 7, 2021 •

edited

Loading

filnet commented Aug 24, 2021 •

edited

Loading

Ralith commented Aug 24, 2021 •

edited

Loading

Ralith commented Aug 25, 2021 •

edited

Loading