Extremely poor performance?? #2796
We are currently rewriting the renderer. This will result in a huge increase in rendering performance. This is the […]
Sure. Just pulled that branch and ran that benchmark. It seems around 5.5x faster, so a relatively decent improvement, but still a far cry from what I'd expect given what I've been able to get out of Unity's DOTS and even my own JavaScript ECS experiments. Is this normal? Is there something I can run to see a breakdown of timings per frame to isolate the bottleneck?

[Edited to add] Actually, it may not be 5x faster. It's hard to tell, but the non-pipelined branch is rendering exactly 5 different color variants of the sprites, whereas the pipelined one is only rendering 1. That may be a coincidence, but at the least it's an apples-to-oranges comparison right now.
Batching is not yet implemented. There is a PR for sprite batching: #2642. As for benchmarking, I know there is a tracing infrastructure to get a breakdown per system, but I haven't used it myself, so I don't know exactly how to use it. I believe it is a feature flag, and then a profile gets written to a file in the Chrome profiler format.
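For reference, this is roughly what enabling that tracing looks like (a sketch; the feature names `trace` and `trace_chrome` are taken from Bevy's docs and may differ by version):

```toml
# Hypothetical Cargo.toml fragment; adjust the version to match your project.
[dependencies]
bevy = { version = "0.5", features = ["trace", "trace_chrome"] }
```

With `trace_chrome` enabled, running the app should write a JSON trace file that can be loaded into Chrome's chrome://tracing viewer for a per-system breakdown.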
Just as a heads up: that spawner example is still using the old renderer, even if you are on the pipelined-rendering branch.
Another datapoint. I just tried bevymark out of curiosity on the […] branch. I'm also testing on Windows 10 with a GeForce RTX 3090. The only major difference is my CPU: a Ryzen 9 5900X. Release profile with some extra optimizations in my Cargo config:

```toml
[target.x86_64-pc-windows-msvc]
linker = "rust-lld.exe"
rustflags = ["-C", "target-cpu=native"]
```

Oh, and I'm using a nightly compiler.
@benoneal with your computer's specs you should be getting way higher performance, especially using the new renderer. Are you absolutely certain you are running bevymark_pipelined in release mode? My computer is older / weaker than yours and I'm getting ~67,000 sprites (and ~8,000 sprites in the old renderer). The performance you have been reporting across the board is in line with debug-build performance, which makes me question the results a bit. What is the full command you are running on the command line?
Not bragging or gloating; just for comparison, I'll post the Unity DOTS-related video here with a timecode, as it may be useful: https://youtu.be/ILfUuBLfzGI?t=1245

Update: the specs in the video are shown on the Task Manager dashboard.
It's important to note that the video is using the previous version of the DOTS "hybrid renderer". That version used GPU instancing, not batching. A proper implementation of a game using GPU instancing can easily render millions of sprites without breaking a sweat, even in Unity without DOTS. However, there are a lot of other issues that come along with GPU instancing, namely compatibility with older hardware and the lack of per-instance overrides. The current version of the hybrid renderer (v2) does use batching and does support per-instance material overrides. My expectation is that it would still greatly outperform Bevy rendering right now, as it has been worked on by a team of talented engineers for years at this point. So maybe not a fair comparison.
Even without batching, the new Bevy renderer currently matches the Unity DOTS performance from the YouTube video on my machine (over 100k bunnies at ~41 fps). My hardware is slightly better, but it's still in the same category (just about a year older).
With batching (for which there is an open PR), I'm hitting 164k bunnies at ~41 fps. I promise the new renderer will have competitive performance.
@sarkahn […]
The "millions" I was referring to was from static sprites being animated or moved on the GPU. However, a couple of years ago when I first started learning DOTS, I and a bunch of other people were seeing how far we could push it with instancing: https://forum.unity.com/threads/200k-dynamic-animated-sprites-at-80fps.695809/ This was not using the hybrid renderer at all, just raw […].

Anyways, I was curious, so I tried to make a Unity version of bevymark using the most up-to-date versions of all the DOTS packages. I tried to make it as equivalent to bevymark as I could, given my knowledge of DOTS and Bevy. It started to spike below 60fps around 120k. For comparison's sake, I can get to around 40k birds in bevymark using […].
Thanks for putting that together! I have a couple of questions to help with the comparison:
You were pretty clear about this, but just to be doubly sure: these were still "statically" positioned sprites (iirc DrawMeshInstancedIndirect allows for reusing things like positions across frames) that were then fed per-sprite-entity animation indices to select an animation frame per sprite per frame. So the amount of data transferred per frame was 1 integer (u8? u32?) per sprite? If so, that lines up pretty well with my expectations for performance.
Can you also try the batched rendering branch: #2642? |
I reworked it a bit to ensure they're spread out more; I was spawning a lot in the same spot. Same results: it's mostly a solid 60fps at around 120k. Any more than that and it starts to spike a lot.
Intel Core i7-8700 @ 3.2 GHz
They were static, yeah. From what I remember (it's been a while), I had set up separate buffers for transforms, UV data, and color data. I think I could have reused them per frame, but I wasn't, because I couldn't figure out a nice way to do it; it was easier to just re-create the buffer every frame and re-fill it, and filling it was just a memcpy from a native array. Then all the buffers would get pushed to the material every frame: https://github.com/sarkahn/SpriteSheetRenderer/blob/Rewrite/Systems/SpriteRenderSystem.cs Knowing what I know now, I'm sure I could have been a lot smarter about it, but I was learning at the time. So yeah, it was transform data (a 4x4 matrix), colors, and UV data being pushed to the material every frame.
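As a back-of-the-envelope check on the two approaches discussed above, here is a hypothetical sketch comparing per-frame upload sizes. The byte layouts (RGBA color and a UV rect as four f32s each) are assumptions for illustration, not measurements:

```rust
// Illustrative per-sprite upload sizes; these layouts are assumptions.
const MAT4_BYTES: u64 = 4 * 4 * 4; // 4x4 matrix of f32 transforms
const COLOR_BYTES: u64 = 4 * 4;    // RGBA color as four f32s (assumed)
const UV_BYTES: u64 = 4 * 4;       // UV rect as four f32s (assumed)

/// Bytes per frame if every sprite re-uploads transform + color + UVs.
fn full_upload(sprites: u64) -> u64 {
    sprites * (MAT4_BYTES + COLOR_BYTES + UV_BYTES)
}

/// Bytes per frame if only a u32 animation index changes per sprite.
fn index_only_upload(sprites: u64) -> u64 {
    sprites * 4
}

fn main() {
    let n = 200_000;
    // Re-uploading everything is 24x the bandwidth of an index-only update.
    println!("full:  {} bytes/frame", full_upload(n));
    println!("index: {} bytes/frame", index_only_upload(n));
}
```

At 60 fps, 200k sprites with full re-uploads come to roughly 1.15 GB/s of buffer traffic versus about 48 MB/s for index-only updates, which is one reason the index-only scheme scales so well.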
From […]
Hi, I am evaluating Bevy for use in a visualization tool we are working on, and I also got some weird results. I'm currently testing Kiss3D to do the same, and getting a stable 60 fps with a 10,000-box test as in […].
If you are looking at the percentage indicators in Windows Task Manager, it will only show 100% CPU utilization when all CPU cores are completely busy. So if it is showing around 10% utilization, you probably have something like 8 or 12 cores, and only one of them is doing work. |
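A quick sanity check of that arithmetic (a hypothetical helper, just to illustrate the point):

```rust
// One saturated core on an otherwise idle machine shows up in Task Manager
// as roughly 100/N percent overall utilization, where N is the core count.
fn overall_utilization_pct(busy_cores: u32, total_cores: u32) -> f32 {
    100.0 * busy_cores as f32 / total_cores as f32
}

fn main() {
    // On a 12-thread CPU, one busy thread reads as about 8.3% overall.
    println!("{:.1}%", overall_utilization_pct(1, 12));
}
```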
Thanks, but it's more or less 10% per core (seen in htop). I'm on Linux (Arch), if that helps?
What I'm getting at is that the […] (see line 30 in b9e0241).
This is just a guess; I haven't done any profiling. The ECS should be able to parallelize updates by pushing iteration into it with simpler queries, but it will only be able to parallelize queries with shared borrows. Also, I would only expect to see a single CPU core at 100% if core affinity were being used; the scheduler can (and should) schedule threads onto different cores for each time slice, which keeps single-thread performance high by keeping core temperatures low.
I'd like to point out that any efforts to zero in on why the current renderer is slow are relatively pointless. We already know it is slow, and most of the reasons why. It is getting retired in the next Bevy release. We've already done investigations into this, which have informed the design of the new renderer, which is shaping up quite nicely.
We automatically parallelize system execution based on query access. Individual queries within a system can be accessed in a parallel context and you can do parallel iteration over any query in a system. |
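To illustrate the idea (this is not Bevy's API, just a standalone sketch of why shared, read-only borrows make iteration safe to fan out across threads):

```rust
use std::thread;

/// Sum a slice by splitting it into chunks and processing each chunk on its
/// own thread. Shared (read-only) borrows make this safe to parallelize,
/// which is the same property that lets an ECS fan out query iteration.
fn parallel_sum(values: &[f32]) -> f32 {
    if values.is_empty() {
        return 0.0;
    }
    let n_threads = 4;
    let chunk_size = (values.len() + n_threads - 1) / n_threads;
    thread::scope(|s| {
        let handles: Vec<_> = values
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<f32>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let data: Vec<f32> = (0..1000).map(|i| i as f32).collect();
    println!("sum = {}", parallel_sum(&data)); // 0 + 1 + ... + 999 = 499500
}
```

In an ECS the same shape applies per query: read-only component access can be split into chunks freely, while mutable access requires the scheduler to prove the borrows are disjoint first.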
And to be clear, the […]
Thanks for the clarification @cart!
I just opened a PR that adds sprite batching to the new renderer. I'm getting ~130,000 sprites at 60fps on […].
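For intuition about what sprite batching buys, here is a hypothetical sketch (not the actual PR's code): consecutive sprites sharing a texture are merged into one batch, so draw calls scale with the number of texture switches rather than the number of sprites:

```rust
/// A sprite to be drawn with a given texture (illustrative types only).
#[derive(Clone, Copy)]
struct Sprite {
    texture_id: u32,
    x: f32,
    y: f32,
}

/// One batch = one draw call: a texture plus all instance positions.
struct Batch {
    texture_id: u32,
    instances: Vec<[f32; 2]>,
}

/// Merge consecutive sprites that share a texture into a single batch.
fn batch_sprites(sprites: &[Sprite]) -> Vec<Batch> {
    let mut batches: Vec<Batch> = Vec::new();
    for s in sprites {
        match batches.last_mut() {
            // Extend the current batch while the texture stays the same.
            Some(b) if b.texture_id == s.texture_id => b.instances.push([s.x, s.y]),
            _ => batches.push(Batch {
                texture_id: s.texture_id,
                instances: vec![[s.x, s.y]],
            }),
        }
    }
    batches
}

fn main() {
    // 6 sprites but only 2 distinct texture runs -> 2 draw calls, not 6.
    let sprites: Vec<Sprite> = (0..6)
        .map(|i| Sprite { texture_id: (i / 3) as u32, x: i as f32, y: 0.0 })
        .collect();
    println!("draw calls: {}", batch_sprites(&sprites).len()); // 2
}
```

Real batchers typically also sort sprites by texture (within a depth layer) before this pass, so that sprites sharing a texture end up adjacent and merge into as few batches as possible.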
For anyone testing performance on their machines: as cart noted, only the new pipelined-rendering branch is relevant at this point, and only the examples that have been updated to use the pipelined renderer, which are, as far as I am aware, all named […].
[…] for me on a Ryzen 5 5600X and Radeon RX 570 machine. FPS began to drop to […]
I'm closing this out, as we have merged the new renderer, which resolves the performance issues generally. Obviously there's always more work to do, but we're now competitive with other projects, and it's only up from here. Feel free to open more specific issues as we encounter them, such as this Mac-specific issue: #3052
Bevy 0.5.0, Windows 10

What you did

[…]

What you expected to happen

Blisteringly high frame rates.

What actually happened

[…]

Additional information

[…]