Profile and optimize frontend (from CGF submission to serialized graph) #108

Closed
PeterTh opened this issue Mar 14, 2022 · 1 comment

PeterTh commented Mar 14, 2022

#107 has frontend benchmarks that do not instantiate the entire runtime.

Low-hanging fruit:

  • Currently, 13% of the time is spent in iostreams creating the CDAG debug labels. This should probably happen during graph printing rather than during graph generation (there is already a PoC implementation for that).
  • malloc/free pairs have a significant impact deep in dependency checking. We can potentially eliminate many convenience allocations (`get_accessed_buffers` et al.).
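A minimal sketch of both low-hanging-fruit ideas, assuming a simplified command type. All names here (`command`, `buffer_access`, `format_debug_label`, `for_each_access`, `writes_buffer`) are hypothetical and not Celerity's actual API. Generation only records structured data; the iostream formatting runs only when a printer asks for a label. A visitor replaces a getter in the style of `get_accessed_buffers` that would return a freshly allocated container on every call.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified command representation (not Celerity's real types).
enum class access_mode { read, write, read_write };

struct buffer_access {
    std::uint64_t bid;  // buffer id
    access_mode mode;
};

struct command {
    std::uint64_t cid;                    // command id
    std::uint64_t tid;                    // id of the task this command was generated from
    std::vector<buffer_access> accesses;  // recorded once during generation
};

const char* mode_suffix(access_mode m) {
    switch (m) {
        case access_mode::read: return ":R";
        case access_mode::write: return ":W";
        case access_mode::read_write: return ":RW";
    }
    return ":?";
}

// Called only by the graph printer, so the iostream cost is paid only when a
// graph is actually printed, not for every generated command.
std::string format_debug_label(const command& cmd) {
    std::ostringstream oss;
    oss << "C" << cmd.cid << " (T" << cmd.tid << ")";
    for (const auto& a : cmd.accesses) { oss << " B" << a.bid << mode_suffix(a.mode); }
    return oss.str();
}

// Allocation-free alternative to a getter that returns a new std::vector or
// std::set of buffer ids: the caller passes a callback that is invoked in place.
template <typename F>
void for_each_access(const command& cmd, F&& visit) {
    for (const auto& a : cmd.accesses) { visit(a); }
}

// Example hot-path query built on the visitor; it performs no heap allocation.
bool writes_buffer(const command& cmd, std::uint64_t bid) {
    bool found = false;
    for_each_access(cmd, [&](const buffer_access& a) {
        if (a.bid == bid && a.mode != access_mode::read) { found = true; }
    });
    return found;
}
```

For instance, a command `{3, 1, {{0, read}, {2, read_write}}}` would only produce the label `C3 (T1) B0:R B2:RW` if the printer requests it, while `writes_buffer` can be queried during dependency checking without touching the allocator.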

Requires further investigation:

  • Use a bump allocator for intrusive graphs to improve locality, e.g. a heuristically sized pool per horizon plus a fallback as needed (maybe also for the dependency vectors?)
  • Command/task map rehashing is also a noticeable factor, but memory is abundant, so `reserve()` the maps up front to avoid stalls
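Both ideas under investigation could look roughly like the sketch below. It assumes a hypothetical `bump_arena` (one arena per horizon, freed as a whole via `reset()` once the horizon is applied) and a hypothetical `graph_node` type; neither is Celerity's actual implementation. Allocation is a pointer bump inside the current block, and a new block is allocated as the fallback when the current one is exhausted. The standard library's `std::pmr::monotonic_buffer_resource` implements a similar strategy and may be a reasonable off-the-shelf alternative. The map helper simply calls `std::unordered_map::reserve` so that inserting the expected number of commands never triggers a rehash.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical bump (arena) allocator, e.g. one instance per horizon.
// Destructors are NOT run on reset(), so it is only suitable for trivially
// destructible node types (or nodes destroyed manually beforehand).
class bump_arena {
  public:
    explicit bump_arena(std::size_t block_size) : m_block_size(block_size) {}

    // Returns `size` bytes aligned to `align` (a power of two).
    void* allocate(std::size_t size, std::size_t align) {
        if (void* p = try_bump(size, align)) { return p; }
        // Fallback: no block yet or current block exhausted; start a new block
        // large enough for this request including worst-case alignment padding.
        const std::size_t cap = std::max(m_block_size, size + align);
        m_blocks.push_back(std::make_unique<std::byte[]>(cap));
        m_cur = m_blocks.back().get();
        m_space = cap;
        void* p = try_bump(size, align);
        assert(p != nullptr);
        return p;
    }

    template <typename T, typename... Args>
    T* create(Args&&... args) {
        return new (allocate(sizeof(T), alignof(T))) T(std::forward<Args>(args)...);
    }

    // Releases all nodes at once, e.g. when the owning horizon is applied.
    void reset() {
        m_blocks.clear();
        m_cur = nullptr;
        m_space = 0;
    }

    std::size_t block_count() const { return m_blocks.size(); }

  private:
    void* try_bump(std::size_t size, std::size_t align) {
        if (m_cur == nullptr) { return nullptr; }
        void* p = m_cur;
        std::size_t space = m_space;
        // std::align adjusts p to the next aligned address and subtracts the padding from space.
        if (std::align(align, size, p, space) == nullptr) { return nullptr; }
        m_cur = static_cast<std::byte*>(p) + size;
        m_space = space - size;
        return p;
    }

    std::size_t m_block_size;
    std::vector<std::unique_ptr<std::byte[]>> m_blocks;
    std::byte* m_cur = nullptr;
    std::size_t m_space = 0;
};

// Hypothetical intrusive graph node; neighbors live in the same arena, which improves locality.
struct graph_node {
    std::uint64_t id;
    graph_node* first_dependency = nullptr;
    explicit graph_node(std::uint64_t i) : id(i) {}
};

// Reserving up front gives enough buckets for `expected` elements, so inserting
// that many entries does not rehash (and thus does not stall) during generation.
std::unordered_map<std::uint64_t, graph_node*> make_command_map(std::size_t expected) {
    std::unordered_map<std::uint64_t, graph_node*> map;
    map.reserve(expected);
    return map;
}
```

The block size plays the role of the "heuristically sized pool": if it is chosen from the node count observed in previous horizons, most horizons would fit in a single block and the fallback path would rarely be taken.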

PeterTh commented Sep 8, 2023

I think this is actually (reasonably) complete at this point. The low-hanging fruit has been addressed by various changes, and I don't think we are substantially limited by "frontend" performance in real-world benchmarks right now.

PeterTh closed this as completed Sep 8, 2023