Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ash: Repeatedly call enumeration functions when VK_INCOMPLETE is returned #465

Merged
merged 2 commits into from
Aug 25, 2021

Conversation

MarijnS95
Copy link
Collaborator

Fixes #464

@MarijnS95 MarijnS95 requested review from Ralith and kvark August 7, 2021 08:57
ash/src/device.rs Outdated Show resolved Hide resolved
ash/src/device.rs Show resolved Hide resolved
@MarijnS95
Copy link
Collaborator Author

MarijnS95 commented Aug 7, 2021

I think this PR can go in with a patch release as no API is modified. However, IMO we should perform a minor (breaking) release afterwards to get rid of explicit *_len() functions and make them use the same helper instead:

get_image_sparse_memory_requirements2_len()
get_physical_device_queue_family_properties2_len()
get_physical_device_sparse_image_format_properties2_len()
enumerate_physical_device_groups_len()

The problem here is that their non-_len counterpart receives a mutable slice but never updates the resulting size (impossible with slice references anyway, unless having a mutable reference to the mutable slice reference), in case count changed between the _len and "fill" call (not sure if that's possible for these functions though, they can at least return VK_INCOMPLETE).

However, maybe some (embedded) applications perform "optimizations" and keep this in static arrays for which they want to have the &mut [T] function available?

/// the result-vec is not large enough.
///
/// [`vkEnumerateInstanceExtensionProperties]: https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/vkEnumerateInstanceExtensionProperties.html
pub(crate) unsafe fn read_into_vector<T>(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming suggestions are welcome, btw 😉

ash/src/prelude.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@Ralith Ralith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, IMO we should perform a minor (breaking) release afterwards to get rid of explicit *_len() functions and make them use the same helper instead

These variants were introduced for the case where an extension output structure may be needed, in which case each element's pNext must be initialized to address a unique output structure. It'd be nice to develop a high-level, abstract interface for that case, but the helper introduced here isn't sufficient; it'll have to be significantly more complex (maybe a HList of vecs?).

ash/src/prelude.rs Show resolved Hide resolved
ash/src/device.rs Outdated Show resolved Hide resolved
ash/src/device.rs Show resolved Hide resolved
ash/src/prelude.rs Outdated Show resolved Hide resolved
@MarijnS95
Copy link
Collaborator Author

However, IMO we should perform a minor (breaking) release afterwards to get rid of explicit *_len() functions and make them use the same helper instead

These variants were introduced for the case where an extension output structure may be needed, in which case each element's pNext must be initialized to address a unique output structure.

Conveniently enough only enumerate_physical_device_groups_len out of the four listed functions actually has a return type, there's no possibility for the others to return INCOMPLETE.

We can think about helpers for initializing pNext later; I think you'll want to apply that to the functions in pipeline_executable_properties.rs later as well, which need the same initialization but currently do not allow pNext to be filled beforehand?

@MarijnS95 MarijnS95 force-pushed the enumerate-incomplete branch 2 times, most recently from 9a82158 to 32f7c48 Compare August 16, 2021 22:10
@Ralith
Copy link
Collaborator

Ralith commented Aug 16, 2021

We can think about helpers for initializing pNext later; I think you'll want to apply that to the functions in pipeline_executable_properties.rs later as well, which need the same initialization but currently do not allow pNext to be filled beforehand?

Yeah, I think there's a whole bunch of cases where we haven't exposed pNext for output just because nobody's wanted it yet. Agreed that we should leave it to followup work, let's just not take away the existing workarounds until then.

@MarijnS95 MarijnS95 requested a review from Ralith August 17, 2021 08:31
ash/src/device.rs Show resolved Hide resolved
Copy link
Collaborator

@kvark kvark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really needed for all these functions? I'd think that the scenario when VK_INCOMPLETE shows up is very specific, since something must be able to change behind ones back. It doesn't mean that all of the vectored queries have to do it.

@MarijnS95
Copy link
Collaborator Author

@kvark The documentation for every function that now uses read_into_*_vector mentions that it can possibly return VK_INCOMPLETE, even though most are likely only because of not providing enough space in the first place instead of being a change in the background.

Either way I don't want to be inconsistent in our API and only handle VK_INCOMPLETE in specific places, especially if Ash is the one querying for and allocating the vector. Unless the Vulkan docs are very clear on where a change behind ones back can possibly happen, I think it's better to be safe than sorry. An extra compare+jump shouldn't hurt that much either, in favour of having simplified code too.

In any case, if we really want to make a distinction between those kinds of functions (again, if the Vulkan API clearly specifies this), the non-looping queries would get an assertion/panic instead as they're not allowed to change the count after initially querying it.

@kvark
Copy link
Collaborator

kvark commented Aug 23, 2021

Specification says VK_INCOMPLETE is possible for a simple reason that your count is too low. I don't think we should be careful "just in case" here. Ash is expected to understand the Vulkan spec in its implementation.

@Ralith
Copy link
Collaborator

Ralith commented Aug 23, 2021

Specification says VK_INCOMPLETE is possible for a simple reason that your count is too low

Yes, that's the definition of the error. What it doesn't say is that the required capacity is guaranteed never to change. We should not rely on assumptions that implementations are under no obligation to comply with, particularly when there's clear precedent for getters that may change at any time.

@filnet
Copy link
Contributor

filnet commented Aug 23, 2021

The exact phrasing in the specs is:

VK_INCOMPLETE A return array was too small for the result

But it does not say when or how this array can be too small.

@MarijnS95
Copy link
Collaborator Author

Specification says VK_INCOMPLETE is possible for a simple reason that your count is too low.

This is extremely unlikely (dare I say impossible bar cosmic radiation) when ash just queried the count in the previous call. You yourself suggested to run this in a loop, and since vkEnumerateDeviceExtensionProperties does not explicitly state whether or not a change at runtime is possible, we assume this is the case for every function unless proven otherwise.

If you want to get to the bottom of this I recommend checking if anything on the system of the original reporter is out of the ordinary, and otherwise insert some debug logging to see if the count indeed changes (and if possible, run this in some tight loop that could perhaps diff the list before and after). This is also my first time encountering VK_INCOMPLETE.

I'd still like to keep the read_into_*_vector with lambda if we decide against this loop, it cleans up duplicate count/result handling nicely.

@kvark
Copy link
Collaborator

kvark commented Aug 23, 2021

ok, I don't think my point got through, but I also don't want to block this fix that would clearly help us. So please land this :)

@MarijnS95
Copy link
Collaborator Author

@kvark It's important to get on the same page first as this fix is solving an issue reported by you - I have never seen it before and don't rely on this PR.

If you think there is a specific environment in which a change in count occurs, shouldn't it be up to the application to detect and handle this case? At least I think that's what you're suggesting above? Otherwise, don't we all agree that the spec is rather vague in this aspect, and doesn't confirm nor rule out a change of count between queries?

@kvark
Copy link
Collaborator

kvark commented Aug 24, 2021

The problem originates from this clause:

Because the list of available layers may change externally between calls to vkEnumerateInstanceExtensionProperties, two calls may retrieve different results if a pLayerName is available in one call but not in another. The extensions supported by a layer may also change between two calls, e.g. if the layer implementation is replaced by a different version between those calls.

This is specifically about vkEnumerateInstanceExtensionProperties. If it applied to other functions, we'd expect to see similar statements there. So the spec doesn't appear to be vague here.

@MarijnS95
Copy link
Collaborator Author

@kvark Now that brings something new to the table. Up to this point it looks like everyone failed to notice that bit in the spec but you.

Are you aware of any other function that has the same or a similar mention in the spec?

I'll update this PR ASAP to deal with this.

@kvark
Copy link
Collaborator

kvark commented Aug 24, 2021

It's not new, it's the starting point - #464 (comment)
Searching the spec reveals that this is the only function with that kind of obscure behavior. We can cover more later if we discover anything - the changes are backwards compatible anyway.

@filnet
Copy link
Contributor

filnet commented Aug 24, 2021

VK_INCOMPLETE can happen :

1/ if the user provides an array that is too small.
But that can't happen with the proposed change.

2/ when using a Vulkan function that explicitly mentions that the list can change between two calls.
Only vkEnumerateInstanceExtensionProperties mentions it but I find it a bit dangerous to assume that no other functions has this behavior. And Vulkan might add more such functions or change the behavior of existing functions.

3/ with bad user code ?
Would it be possible for user code to issue a concurrent function call in between the "two calls" which would cause the list size to change ? I doubt that this is possible since Vulkan is mostly stateless but who knows.

@MarijnS95
Copy link
Collaborator Author

It's not new, it's the starting point - #464 (comment)

New in the sense that I hadn't noticed / forgotten this was mentioned in the issue.

Searching the spec reveals that this is the only function with that kind of obscure behavior. We can cover more later if we discover anything - the changes are backwards compatible anyway.

Great, only one function to deal with then.

VK_INCOMPLETE can happen :

1/ if the user provides an array that is too small.
But that can't happen with the proposed change.

Couldn't happen before either, because Ash was querying the size and directly reusing this value in every instance.

2/ when using a Vulkan function that explicitly mentions that the list can change between two calls.
Only vkEnumerateInstanceExtensionProperties mentions but I find it a bit dangerous to assume that no other functions has this behavior. And Vulkan might add more such functions or change the behavior of existing functions.

We'll do this just for vkEnumerateInstanceExtensionProperties now. No-one has ever seen this before anyway, and it's no surprise that the only occurrence of a changing value happens on a function that has it clearly documented. Nothing dangerous IMO.

3/ with bad user code ?
Would it be possible for user code to issue a concurrent function call in between the "two calls" which would cause the list size to change ? I doubt that this is possible since Vulkan is mostly stateless but who knows.

Strongly doubt this. Maybe available layers can change at any point since that's an external factor?

@filnet
Copy link
Contributor

filnet commented Aug 24, 2021

What about functions related to hardware ?
vkGetPhysicalDeviceDisplayPropertiesKHR for example. What if a display device gets connected/disconnected between the two calls.

@MarijnS95
Copy link
Collaborator Author

What about functions related to hardware ?
vkGetPhysicalDeviceDisplayPropertiesKHR for example. What if a display device gets connected/disconnected between the two calls.

Same thoughts here; quick testing shows this doesn't change until restarting the application. Maybe that is the case for similar functions, too.

@Ralith
Copy link
Collaborator

Ralith commented Aug 24, 2021

We'll do this just for vkEnumerateInstanceExtensionProperties now.

We should do this for every function that may return VK_INCOMPLETE unless the spec explicitly guarantees that the size is fixed. "It worked on my machine" is not an excuse for UB. It's not like there's a drawback to being defensive here.

@MarijnS95
Copy link
Collaborator Author

We should do this for every function that may return VK_INCOMPLETE unless the spec explicitly guarantees that the size is fixed. "It worked on my machine" is not an excuse for UB. It's not like there's a drawback to being defensive here.

Rather, there's two camps now: those who want to spend one extra cycle for the compare, and those who don't. I'm all for the defensive route, feel free to merge.

@MarijnS95 MarijnS95 marked this pull request as draft August 24, 2021 20:52
…erties

The type and next-pointer of output structures must always be
initialized.

Fixes: ac4d046 ("Added `VK_EXT_tooling_info` extension support (ash-rs#310)")
@MarijnS95 MarijnS95 marked this pull request as ready for review August 24, 2021 21:01
@kvark
Copy link
Collaborator

kvark commented Aug 24, 2021

I don't care about extra cycles for this. I care about Ash knowing what API it abstracts over. So far, Vulkan spec does not explicitly spell out any guarantees about what the count is between different invocations, except for vkEnumerateInstanceExtensionProperties. If it's also the case for other functions (like vkGetPhysicalDeviceDisplayPropertiesKHR), then I'd say this would be a spec bug. But Ash doesn't need to be smarter than the spec.

Also, I'm no longer sure that the loop is the right solution here. What's the point of ensuring that you got exactly all of the elements at some point of the program execution? If I got 3 layers reported, that doesn't mean that the next second the configuration isn't going to change, anyway.

Did we consider just doing the following:

  1. do the regular call for getting the count, and then the call to fill the data
  2. if the returned count is less, issue a log::warn and continue
  3. if the returned error is VK_INCOMPLETE, issue log::warn and continue

This seems marginally better than having a loop. And in a scenario where it happens - your configuration changed between two invocations - there isn't any guarantee that it's not going to change right after the call, anyway.


This is all super bike-sheddy, and I understand if you folks just don't consider it worth improving. Landing the patch as is would be fine by me.

@filnet
Copy link
Contributor

filnet commented Aug 24, 2021

There's also vkEnumerateInstanceLayerProperties ;)

@Ralith
Copy link
Collaborator

Ralith commented Aug 25, 2021

So far, Vulkan spec does not explicitly spell out any guarantees about what the count is between different invocations, except for vkEnumerateInstanceExtensionProperties. If it's also the case for other functions (like vkGetPhysicalDeviceDisplayPropertiesKHR), then I'd say this would be a spec bug.

If the spec guarantees a thing in one case and doesn't guarantee that thing in other cases, that's not a bug, it's the absence of a guarantee. When behavior isn't guaranteed, we cannot assume a single possibility.

What's the point of ensuring that you got exactly all of the elements at some point of the program execution? If I got 3 layers reported, that doesn't mean that the next second the configuration isn't going to change, anyway.

Good question. Unfortunately, it is necessary, because there's no guarantee which results are omitted from an incomplete set. A user can reasonably be expected to tolerate having fractionally out of date information regarding a change in configuration/whatever that just happened, but they should not expect to lose information that hasn't ever changed just because another datum in the same category changed.

log::warn and continue

IMO adding log to the dependencies and public interface of the crate is strongly contrary to the spirit of ash; user-facing diagnostics are not the responsibility of a Vulkan binding.

@Ralith
Copy link
Collaborator

Ralith commented Aug 25, 2021

Merging as-is since this is a clear improvement over the status quo, and a code size reduction besides. We can discuss in a new issue if anyone feels further refinement is necessary.

@Ralith Ralith merged commit 00abdfc into ash-rs:master Aug 25, 2021
@MarijnS95 MarijnS95 deleted the enumerate-incomplete branch August 25, 2021 20:07
@MarijnS95
Copy link
Collaborator Author

then I'd say this would be a spec bug. But Ash doesn't need to be smarter than the spec.

(In addition to what @Ralith said, which I totally agree with) I say we do, for the sake of being friendly to crate users instead of leaving them open to this under-specified part of the spec (as @Ralith said, there's no explicit guarantee that this won't happen).

What's the point of ensuring that you got exactly all of the elements at some point of the program execution? If I got 3 layers reported, that doesn't mean that the next second the configuration isn't going to change, anyway.

If we get back less, that's fine. If we get back more, that's a VK_INCOMPLETE that usually propagates up through an application as Err()?. We could do a couple things here:

  1. Print some logs. I'm not a fan of this either, it's unexpected from core crates and very easy to miss. There are better solutions:
  2. Explicitly encode this artifact in Ash's public API, like for example fn acquire_next_image(). On VK_INCOMPLETE, we still return the Vec<_> and leave it up to the caller to decide to take the result or call this function again.

And if it's less, I don't think the caller is interested in that at all. Unless they're doing sub-microsecond poking of the Vulkan API to figure out when exactly this count changes (higher or lower) - I'll redirect them to use the raw pointer functions if that's their intended usecase 😬

@MarijnS95
Copy link
Collaborator Author

0.33.2 is now released with this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

VK_INCOMPLETE returned by enumerate_instance_extension_properties
4 participants