Optimize chunk transform streaming #55

Open
2 of 3 tasks
Ralith opened this issue Apr 10, 2020 · 2 comments
Assignees
Labels
enhancement New feature or request performance Something's slower than it should be

Comments

@Ralith
Owner

Ralith commented Apr 10, 2020

Currently, every frame, we invoke vkCmdUpdateBuffer once per chunk to upload the transform from that chunk to the local node. In a valley, this can add up to hundreds of kilobytes per frame. This is a bit of an abuse of vkCmdUpdateBuffer, which is meant for small inline updates, and may explain the large CPU time spent preparing to render chunks. There are a number of improvements to be made:

  • Write the transforms into a mapped staging buffer and upload them with a transfer command. This should mitigate driver overhead, and may improve performance substantially all on its own.
  • Because the underlying honeycomb is regular, we can drastically reduce the bandwidth used. Store a precomputed table of transforms to the origin node from each chunk surrounding the origin node, out to the maximum view distance, and maintain a buffer of indices mapping the neighborhood of the player to the analogous chunks surrounding the origin. The index buffer would be 1/32 the size of the current transform buffer. It would need to be rewritten in full every time the player moves between nodes, but small incremental writes would suffice otherwise. This also saves us from doing a bunch of matrix multiplication as we traverse the graph, which might improve traversal performance significantly (currently 2-4ms/frame).
  • As of #53 (Smuggle voxel chunk ID through indirect buffer), chunk transform information (of whatever nature) can be passed through an instance buffer rather than looked up in a storage buffer, simplifying and perhaps slightly optimizing the vertex shader.
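
The table-plus-indices idea from the second item could be sketched on the CPU side roughly as follows. This is only an illustration, not code from the project: `TransformTable`, `ChunkIndices`, and `take_upload` are hypothetical names, and it assumes 4x4 `f32` matrices (64 bytes) and 16-bit indices (2 bytes), which is one way to arrive at the 1/32 figure. The dirty range stands in for the "small incremental writes" case; a full `rewrite` corresponds to the player moving between nodes.

```rust
use std::mem::size_of;
use std::ops::Range;

/// One 4x4 f32 matrix (64 bytes). Assumed layout of a transform entry.
pub type Mat4 = [f32; 16];

pub const IDENTITY: Mat4 = [
    1.0, 0.0, 0.0, 0.0,
    0.0, 1.0, 0.0, 0.0,
    0.0, 0.0, 1.0, 0.0,
    0.0, 0.0, 0.0, 1.0,
];

/// Hypothetical precomputed table: one transform-to-origin per chunk
/// position within view distance of the origin node. Uploaded once.
pub struct TransformTable {
    transforms: Vec<Mat4>,
}

impl TransformTable {
    pub fn new(transforms: Vec<Mat4>) -> Self {
        // Every entry must be addressable by a 16-bit index.
        assert!(transforms.len() <= usize::from(u16::MAX) + 1);
        Self { transforms }
    }

    pub fn get(&self, index: u16) -> &Mat4 {
        &self.transforms[usize::from(index)]
    }
}

/// Hypothetical per-neighborhood data: one table index per chunk slot,
/// plus the range of slots changed since the last upload.
pub struct ChunkIndices {
    indices: Vec<u16>,
    dirty: Option<Range<usize>>,
}

impl ChunkIndices {
    pub fn new(slots: usize) -> Self {
        Self { indices: vec![0; slots], dirty: Some(0..slots) }
    }

    /// Player moved between nodes: replace everything, mark all dirty.
    pub fn rewrite(&mut self, indices: &[u16]) {
        self.indices.clear();
        self.indices.extend_from_slice(indices);
        self.dirty = Some(0..self.indices.len());
    }

    /// Incremental change to a single slot; grows the dirty range.
    pub fn set(&mut self, slot: usize, index: u16) {
        if self.indices[slot] == index {
            return;
        }
        self.indices[slot] = index;
        self.dirty = Some(match self.dirty.take() {
            Some(r) => r.start.min(slot)..r.end.max(slot + 1),
            None => slot..slot + 1,
        });
    }

    /// Returns the byte offset and bytes to copy into the GPU-visible
    /// buffer, or None if nothing changed. Clears the dirty range.
    pub fn take_upload(&mut self) -> Option<(usize, Vec<u8>)> {
        let range = self.dirty.take()?;
        let bytes: Vec<u8> = self.indices[range.clone()]
            .iter()
            .flat_map(|i| i.to_ne_bytes())
            .collect();
        Some((range.start * size_of::<u16>(), bytes))
    }
}
```

In a real renderer the bytes returned by `take_upload` would be written into a mapped staging buffer and copied with a transfer command, as in the first item; that part is omitted here to keep the sketch free of Vulkan dependencies.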
Ralith added the enhancement and performance labels Apr 10, 2020
Ralith self-assigned this Apr 10, 2020
@Ralith
Owner Author

Ralith commented Apr 11, 2020

Partially fixed by #63. CPU use during graph traversal remains significant, but performance is much improved overall.

@Ralith
Owner Author

Ralith commented Jul 19, 2020

The precomputed transform table could also potentially form a foundation for removing per-chunk draw calls, in favor of multi-draw-indirect with frustum culling performed in a compute shader.
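
To make the shape of that idea concrete, here is a CPU-side sketch of what such a culling pass might compute: one indirect draw command per chunk that survives the frustum test. It is an illustration under several assumptions, not the project's design. `ChunkBounds` and `cull` are hypothetical names, the bounding volumes are plain Euclidean spheres (the hyperbolic setting would need its own visibility test), and the pass runs on the CPU here whereas the comment envisions a compute shader. `DrawIndirectCommand` mirrors the field order of Vulkan's `VkDrawIndirectCommand`.

```rust
/// Same field order as Vulkan's VkDrawIndirectCommand.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DrawIndirectCommand {
    pub vertex_count: u32,
    pub instance_count: u32,
    pub first_vertex: u32,
    pub first_instance: u32,
}

/// Hypothetical per-chunk culling input: a bounding sphere in view
/// space plus the vertex range holding the chunk's geometry.
pub struct ChunkBounds {
    pub center: [f32; 3],
    pub radius: f32,
    pub vertex_count: u32,
    pub first_vertex: u32,
}

/// Emits one draw command per chunk whose sphere touches the frustum.
/// Each plane is (nx, ny, nz, d) with the inside where n.p + d >= 0.
/// `first_instance` carries the chunk's slot so the vertex shader can
/// locate that chunk's transform.
pub fn cull(chunks: &[ChunkBounds], planes: &[[f32; 4]]) -> Vec<DrawIndirectCommand> {
    chunks
        .iter()
        .enumerate()
        .filter(|(_, c)| {
            planes.iter().all(|p| {
                let dist = p[0] * c.center[0] + p[1] * c.center[1] + p[2] * c.center[2] + p[3];
                // Keep the chunk unless it lies entirely outside this plane.
                dist >= -c.radius
            })
        })
        .map(|(slot, c)| DrawIndirectCommand {
            vertex_count: c.vertex_count,
            instance_count: 1,
            first_vertex: c.first_vertex,
            first_instance: slot as u32,
        })
        .collect()
}
```

On the GPU, the resulting commands would be written to a buffer consumed by an indirect draw, so the CPU would no longer issue one draw call per chunk.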
