Speed-up Rendering #270

Closed
11 tasks done
p0nce opened this issue Jun 25, 2018 · 6 comments
Labels
Performance This issue is about performance enhancement.

Comments

@p0nce
Collaborator

p0nce commented Jun 25, 2018

  • Measure all passes with Couture/Panagement/Graillon
  • Speed-up component swap with intel-intrinsics (EDIT => OK)
  • Speed-up look-up table with intel-intrinsics (EDIT => failed to do so)
  • Try to optimize raw copy (EDIT: failure to do so, byte copy loop is faster in isolation, but not in context)
  • Consider not waking up threads when there are fewer tasks than threads in the thread pool => OK
  • Consider using fewer threads (Voxengo advised 3 on KVR => https://www.kvraudio.com/forum/viewtopic.php?f=33&t=495960&p=7039726&hilit=three+threads#p7039726) (EDIT: using 2 threads max starting with v8.0.7)
  • Consider splitting large Raw widgets into parts if not parallel (no, because that means more threading, which will bring other problems)
  • Speed-up PBRCompositor with intel-intrinsics (EDIT: rarely called now, postponed)
  • Consider using better condvars on Windows (currently: emulated for XP) => no speed-up! It slows things down.
  • Consider fewer/wider patches for PBR compositor => would be great so that small PBR updates don't trigger 2 threads! => done, patch size is 128x128 max
  • Investigate SetDIBitsToDevice vs alternatives => it seems we can get a region made of several smaller rectangles, which should increase performance => implemented for Windows (v8.0.8), most useful when widgets far from each other are repainted. This already seems to be the case for Cocoa.
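
To illustrate the last item, here is a minimal sketch of why presenting several small rectangles can beat presenting their bounding box when the repainted widgets are far apart. The `Rect` layout, the function names and the widget coordinates are all hypothetical; this is not Dplug's code, only a pixel count comparison.

```python
from typing import List, Tuple

# (left, top, right, bottom), with right and bottom exclusive. Hypothetical layout.
Rect = Tuple[int, int, int, int]


def area(r: Rect) -> int:
    """Number of pixels covered by a rectangle (0 if it is empty)."""
    return max(0, r[2] - r[0]) * max(0, r[3] - r[1])


def bounding_box(rects: List[Rect]) -> Rect:
    """Smallest rectangle that contains every rectangle in the list."""
    return (
        min(r[0] for r in rects),
        min(r[1] for r in rects),
        max(r[2] for r in rects),
        max(r[3] for r in rects),
    )


def pixels_to_present(rects: List[Rect], merge: bool) -> int:
    """Pixels sent to the window, either as one merged box or as separate rects.

    Assumes the input rectangles do not overlap each other.
    """
    if merge:
        return area(bounding_box(rects))
    return sum(area(r) for r in rects)


# Two made-up widgets at opposite corners of a UI.
knob = (10, 10, 60, 60)        # 50x50 pixels
meter = (700, 400, 740, 500)   # 40x100 pixels
dirty = [knob, meter]

print(pixels_to_present(dirty, merge=True))   # one bounding box
print(pixels_to_present(dirty, merge=False))  # two separate rectangles
```

With these made-up coordinates the merged box covers far more pixels than the two widgets together, which matches the remark that the gain is largest when distant widgets are repainted.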
p0nce added the Performance label Jun 25, 2018
p0nce changed the title from "Speed-up PBR again" to "Speed-up Rendering" Oct 7, 2018
@p0nce
Collaborator Author

p0nce commented Oct 11, 2018

With Couture 32-bit as of v8.0.2: (EDIT: it shouldn't draw in PBR layer, what is happening?)

  • Draw PBR: 0.19ms (EDIT: was amortized to zero when not moving PBR widgets)
  • Mipmap: 0.05ms (EDIT: was amortized to zero when not moving PBR widgets)
  • PBR Compositing: 0.51ms (EDIT: was amortized to zero when not moving PBR widgets)
  • Copy to Raw: 0.18ms
  • Draw RAW: 0.72ms (around 0.25ms of color-correction)
  • Reorder: 0.26ms (EDIT: down to 0.22 with intel-intrinsics) (EDIT: down to 0.19 with LLVM shufflevector)
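
For readers unfamiliar with the Reorder pass, a minimal sketch of a component swap on packed 4-byte pixels follows. It assumes the pass swaps the first and third channel (for example RGBA to BGRA); the thread does not state the exact channel order, and the function name is hypothetical. The slice assignment only stands in for the SIMD shuffle mentioned above. It is not the intel-intrinsics or shufflevector code.

```python
def swap_red_blue(pixels: bytearray) -> None:
    """Swap channels 0 and 2 of every 4-byte pixel, in place.

    Assumption: pixels are tightly packed, 4 bytes each, with no row padding.
    """
    if len(pixels) % 4 != 0:
        raise ValueError("buffer length must be a multiple of 4 bytes")
    # The right-hand side is copied first, so the two strided slices
    # can be exchanged safely in a single statement.
    pixels[0::4], pixels[2::4] = pixels[2::4], pixels[0::4]


buf = bytearray([1, 2, 3, 4, 10, 20, 30, 40])  # two pixels
swap_red_blue(buf)
print(list(buf))
```

A SIMD version does the same thing on 16 bytes (four pixels) per shuffle instruction, which is presumably where the measured gain comes from.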

@p0nce
Collaborator Author

p0nce commented Oct 13, 2018

  • Measure Couture UI threaded vs non-threaded

Threaded, baseline:

  • Draw PBR 0.19ms
  • Mipmap 0.04ms
  • Compositing 0.49ms
  • Copy to Raw 0.22ms
  • Draw Raw 0.76ms
  • Reorder 0.19ms

Non-threaded, thread-pool modified:

  • Draw PBR 0.01ms
  • Mipmap 0.00ms
  • Compositing 0.33ms
  • Copy to Raw 0.21ms (same, wasn't threaded)
  • Draw Raw 0.74ms (TODO: optimize color lookup)
  • Reorder 0.18ms (same, wasn't threaded)

  • Understand what takes time despite no PBR layer update => the problem was threads waking up and then finding they had no task to do
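
The fix for that finding can be sketched as follows: wake only as many workers as there are tasks, and run on the calling thread when there is at most one task. This is a simplified illustration using Python's standard `concurrent.futures`, with hypothetical function names. Dplug's thread pool is persistent and written in D, whereas this sketch creates a short-lived pool on each call. Running a single task inline is also an assumption, not something stated in the thread.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")


def workers_needed(num_tasks: int, pool_size: int) -> int:
    """How many worker threads to wake for a batch of tasks.

    Returns 0 when the calling thread can do the work alone.
    """
    if num_tasks <= 1:
        return 0
    return min(num_tasks, pool_size)


def parallel_for(num_tasks: int, task: Callable[[int], T], pool_size: int = 2) -> List[T]:
    """Run task(0) .. task(num_tasks - 1), using threads only when useful."""
    n = workers_needed(num_tasks, pool_size)
    if n == 0:
        # No thread is woken up: zero or one task runs inline.
        return [task(i) for i in range(num_tasks)]
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(task, range(num_tasks)))


print(workers_needed(1, 4))
print(workers_needed(3, 4))
print(parallel_for(5, lambda i: i * i))
```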

@p0nce
Collaborator Author

p0nce commented Oct 17, 2018

Based on Dplug v8.0.5:

Graillon bench in steady state:

  • Draw PBR 0.01ms
  • Draw Mipmap 0.00ms
  • Compositing 0.22ms
  • Copy to RAW 0.13ms
  • Draw RAW 0.50ms
  • Reorder 0.12ms

Panagement steady state (only goniometer updates):

  • Draw PBR 0.00ms
  • Draw Mipmap 0.00ms
  • Compositing 0.04ms
  • Copy to RAW 0.12ms
  • Draw RAW 0.39ms
  • Reorder 0.12ms

All pretty reasonable compared to Couture.
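
Per-pass figures like the ones above can be gathered with a small accumulating timer. The sketch below is only an illustration of the measuring approach; the `PassTimer` class is hypothetical, it is not Dplug's profiling code, and the workload inside the loop is a placeholder.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PassTimer:
    """Accumulates wall-clock time per named rendering pass."""

    def __init__(self):
        self.total = defaultdict(float)  # seconds spent per pass
        self.count = defaultdict(int)    # number of measurements per pass

    @contextmanager
    def measure(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.total[name] += time.perf_counter() - start
            self.count[name] += 1

    def average_ms(self, name):
        """Mean duration of a pass in milliseconds."""
        return 1000.0 * self.total[name] / self.count[name]


timer = PassTimer()
for _ in range(100):                 # 100 simulated frames
    with timer.measure("Reorder"):
        sum(range(1000))             # placeholder for the real pass

print(f"Reorder: {timer.average_ms('Reorder'):.2f}ms")
```

Averaging over many frames is what makes a steady-state number meaningful, since a pass that is skipped on most frames is amortized towards zero.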

@p0nce
Collaborator Author

p0nce commented Oct 17, 2018

Testing with thread pools of 2, 3 and 4 threads:

  • With 2 threads, the plugin can be inserted with less disruption during playback
  • 2 threads cause slightly fewer audio clicks during operation
  • Pure rendering time: fewer widgets are Raw-drawn concurrently; in Couture that makes a 0.5ms task take 0.7ms
    => slight performance hit when a plugin is alone, but probably worth it for stability

[Image: absolute-values]

@p0nce
Copy link
Collaborator Author

p0nce commented Oct 17, 2018

PBR compositor max patch size extended to 128x128. Often this yields only one tile, so no threads are launched. Also adopted 2 threads.
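
A minimal sketch of the tiling idea: split the area to composite into patches no larger than a maximum size, so that a small update yields a single patch and needs no worker threads. The function name and rectangle layout are hypothetical, and the 32-pixel size used for comparison is made up, since the previous patch size is not given in this thread.

```python
from typing import List, Tuple

# (left, top, right, bottom), with right and bottom exclusive. Hypothetical layout.
Rect = Tuple[int, int, int, int]


def split_into_patches(rect: Rect, max_size: int = 128) -> List[Rect]:
    """Cut a rectangle into patches of at most max_size x max_size pixels."""
    left, top, right, bottom = rect
    patches = []
    for y in range(top, bottom, max_size):
        for x in range(left, right, max_size):
            patches.append((x, y, min(x + max_size, right), min(y + max_size, bottom)))
    return patches


small_update = (0, 0, 100, 60)  # e.g. a single small widget being redrawn
print(len(split_into_patches(small_update)))               # with 128x128 max
print(len(split_into_patches(small_update, max_size=32)))  # with a smaller, made-up size
```

When the result holds a single patch, the compositor can process it on the calling thread instead of dispatching to the pool.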

@p0nce
Copy link
Collaborator Author

p0nce commented Oct 18, 2018

Future improvements to be found:

  • Getting a faster RGBA software renderer for line segments etc.
  • Optimizing the PBR compositor eventually (though it could also be made slower/more expensive now)

@p0nce p0nce closed this as completed Oct 18, 2018