Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

data-store graph walk restricted #5475

Merged
merged 1 commit into from
Apr 26, 2023

Conversation

dwsutherland
Copy link
Member

@dwsutherland dwsutherland commented Apr 20, 2023

closes #5435

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
  • Covered by existing tests.
  • CHANGES.md entry included if this is a change that can affect users
  • Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

@dwsutherland dwsutherland added the efficiency For notable efficiency improvements label Apr 20, 2023
@dwsutherland dwsutherland self-assigned this Apr 20, 2023
@oliver-sanders
Copy link
Member

Have repeated the reported result, this reduces the number of calls of increment_graph_window from 49021002/7001 to 14002/7001 for the example in #5435 🎉. Doubtless the single biggest complexity reduction for cylc-flow in a single change ever! There are 7001 tasks in the workflow causing the 7001 original calls, each makes two recursive calls giving the 14002 which makes sense.

The revised algorithm seems to be producing the same results, these screenshots come from the same workflow, installed & run under upstream/8.1.x and dwsutherland/graph-walk-efficiency with the same operations performed on both. The graph view displays the same results:

Screenshot from 2023-04-20 11-49-25
Screenshot from 2023-04-20 11-49-38

@dwsutherland
Copy link
Member Author

dwsutherland commented Apr 20, 2023

The only down side is; increasing the window size will no longer flesh out the entire workflow graph, it will only show the the direct ancestors/descendants of active nodes.

consider:

a => b => d
c => d

if b is the only active task, expanding the n-window about it will no longer show c
(the walk no longer travels down to d then up to c, only down or only up)

But obviously, no down sides with n=1 .. And all the tests passed, so nothing missing there.

If we're happy with this trade-off, perhaps we should get this one in and work on the no-rewalk change separately.

@oliver-sanders
Copy link
Member

Ah, right, this has removed the ability to generate the n>=1 window?

@dwsutherland
Copy link
Member Author

dwsutherland commented Apr 20, 2023

Ah, right, this has removed the ability to generate the n>=1 window?

Nope, n>=1 is still there, but it will no longer travel downstream then upstream, or upstream then downstream.

(you won't see the cousins)

Will update the above comment

@oliver-sanders
Copy link
Member

From a bit of experimentation with this pattern:

a => b<N> => c

The number of times increment_graph_window is called (recursive) scales according to:

Before After
2N^2 + 3N + 1 5N + 1

This change removes the N^2 term which is the nasty one.

The factor of 5 in the new solution is a little odd, not sure where that's coming from. Would be great to reduce it if possible, but linear scaling is the important bit for now.

The lack of cousins in the graph window isn't great, however, this doesn't affect the N=1 window (default) and we don't currently make it easy for users to switch the n-window in the GUI so this shouldn't have any impact right now.

Copy link
Member

@oliver-sanders oliver-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the performance is well worth the caveats, we can always improve later, thanks @dwsutherland!

@oliver-sanders oliver-sanders added this to the cylc-8.1.3 milestone Apr 24, 2023
@oliver-sanders
Copy link
Member

(can drop in a changelog later)

@hjoliver
Copy link
Member

we don't currently make it easy for users to switch the n-window in the GUI so this shouldn't have any impact right now.

Ha, note cylc/cylc-ui#1281

@hjoliver
Copy link
Member

hjoliver commented Apr 25, 2023

I think the performance is well worth the caveats, we can always improve later, thanks @dwsutherland!

Agreed, fantastic.

The only down side is; increasing the window size will no longer flesh out the entire workflow graph, it will only show the the direct ancestors/descendants of active nodes.

This may be OK for the n-window. For n=cycle we shouldn't need to walk out from active tasks.

@hjoliver
Copy link
Member

On my laptop, it took 15s to spawn 7000 tasks on this branch vs 400s on master 🎉

Copy link
Member

@hjoliver hjoliver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, merging this now to get it in 8.1.3

@dwsutherland - as a small follow-up could you just add some code comments to explain the current functionality and the "cousins problem"?

@hjoliver hjoliver merged commit 7ed1c5f into cylc:8.1.x Apr 26, 2023
@hjoliver hjoliver mentioned this pull request Apr 26, 2023
8 tasks
@hjoliver
Copy link
Member

(I'll push a hot-fix change log entry for this)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
efficiency For notable efficiency improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

efficiency: increment_graph_window
3 participants