Fix snark pool long async cycles #13409

nholland94 · 2023-06-15T00:12:21Z

As part of investigating #13324, it was discovered that the primary underlying issue was that the snark pool was leaking. The leak was caused by the snark pool refcount frontier extension not properly decrementing entries when the frontier root transitions. Additionally, when the extension computed the removed entries, it only counted the number of scan states that caused entries to be removed, which negatively impacted the garbage collection logic in the snark pool, as the implementation relied on this count.

Fixing the snark pool refcount leak on its own would have exposed another issue: the snark pool garbage collection logic, which is now triggered properly with the fix, was wildly inefficient. The worst-case number of works in the snark pool is about 2*720*128*0.75 = 138240 (fork_factor*k*works_per_block*f), which is large enough to warrant addressing this.
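For reference, the worst-case estimate can be reproduced with a few lines of OCaml. This only restates the arithmetic quoted in the description; the binding names mirror the labels in the formula and are not identifiers from the Mina codebase.

```ocaml
(* Worst-case number of works in the snark pool, using the figures
   quoted in the PR description. Names follow the formula's labels. *)
let fork_factor = 2
let k = 720
let works_per_block = 128
let f = 0.75

let worst_case_works =
  Float.to_int (Float.of_int (fork_factor * k * works_per_block) *. f)

let () = Printf.printf "worst-case works: %d\n" worst_case_works
```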

The following changes are included in this PR:

snark pool persistence was removed
fixed bug in snark pool refcount table where old root scan state works were not decremented
optimized snark pool garbage collection by changing snark pool refcount to send a negative diff when works are invalidated
optimized snark pool refcount view broadcast logic to not send irrelevant diffs
removed unnecessary references to snark pool refcount internal tables within the snark pool
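The refcount changes listed above can be illustrated with a minimal sketch. It is not the code from this PR: works are modeled as plain strings, the table uses the standard library `Map`, and `add_scan_state` / `remove_scan_state` are hypothetical names. The point it shows is that removing a scan state returns the works whose count dropped to zero (the "negative diff"), so a consumer can delete exactly those works instead of rescanning the whole pool.

```ocaml
module Work_map = Map.Make (String)

(* Refcount table: work -> number of scan states that reference it.
   Works are plain strings here purely for illustration. *)
type t = int Work_map.t

let empty : t = Work_map.empty

(* Add one reference for every work in a scan state. *)
let add_scan_state (table : t) (works : string list) : t =
  List.fold_left
    (fun acc work ->
      Work_map.update work
        (function None -> Some 1 | Some n -> Some (n + 1))
        acc )
    table works

(* Drop one reference per work. Returns the updated table together
   with the works whose count reached zero. *)
let remove_scan_state (table : t) (works : string list) : t * string list =
  List.fold_left
    (fun (acc, removed) work ->
      match Work_map.find_opt work acc with
      | None -> (acc, removed)
      | Some 1 -> (Work_map.remove work acc, work :: removed)
      | Some n -> (Work_map.add work (n - 1) acc, removed) )
    (table, []) works

(* Two scan states share work "b"; dropping the first one only
   invalidates "a". *)
let table, removed =
  let t = add_scan_state empty [ "a"; "b" ] in
  let t = add_scan_state t [ "b"; "c" ] in
  remove_scan_state t [ "a"; "b" ]
```

In this sketch the old root's scan state would be passed to `remove_scan_state` on a root transition, which is the decrement the description says was previously missing.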

Closes #13324.

nholland94 · 2023-06-15T00:18:38Z

!ci-build-me

deepthiskumar · 2023-06-20T19:52:50Z

src/lib/transition_frontier/full_frontier/full_frontier.ml

@@ -328,6 +328,7 @@ let calculate_root_transition_diff t heir =
    (Root_transitioned
       { new_root= new_root_data
       ; garbage= Full garbage_nodes
+       ; old_root_scan_state= Full (Breadcrumb.staged_ledger root |> Staged_ledger.scan_state)


can we not include the old root in garbage?

nholland94 · 2023-06-20

That would require more changes in all the other extensions and places we consume these diffs. There's a comment that mentions we explicitly leave it out. I don't recall why, but there was a reason, and we would need to update logic that was not expecting the old root to be in garbage to now handle that.

deepthiskumar · 2023-06-20

maybe add a comment when defining this? (also, where's the comment you mentioned? couldn't find it in this file)
It's only in the Ledger_table extension, in catchup, and in the frontier's apply-diff, and it is not clear what is expected in these places. Should we look into this further? If so, could you please open an issue for this?

Also, where does the old root get removed if it is not part of garbage?

nholland94 · 2023-06-26

Here is the comment I was referring to. It's where we define the diff type itself, and describes its semantics. https://github.com/MinaProtocol/mina/blob/develop/src/lib/transition_frontier/frontier_base/diff.mli#L107

The old root is removed when we process the root transition diff in the full frontier as a special case, and doesn't get treated the same way as garbage does. https://github.com/MinaProtocol/mina/blob/develop/src/lib/transition_frontier/full_frontier/full_frontier.ml#L350

deepthiskumar · 2023-06-20T19:56:14Z

src/lib/network_pool/snark_pool.ml

+          ~metadata:[ ("num_removed", `Int (List.length removed_work)) ] ;
+        List.iter removed_work ~f:(fun work ->
+            Hashtbl.remove t.snark_tables.rebroadcastable work ;
+            Hashtbl.remove t.snark_tables.all work ) ;


Shouldn't these be kept until the work is not referenced in the frontier?

nholland94 · 2023-06-20

The work we receive here is work that was removed from the refcount table. Work is only removed from the refcount table if there are no remaining references in the frontier.
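A minimal sketch of the consumer side of that exchange, assuming the refcount extension hands over only works with no remaining references in the frontier. It uses the standard library `Hashtbl` rather than the labeled interface seen in the diff; the `snark_tables` record, its integer payload, and `handle_removed_work` are simplified stand-ins, not the actual definitions.

```ocaml
(* Simplified stand-in for the snark pool tables: work -> fee. *)
type snark_tables =
  { all : (string, int) Hashtbl.t
  ; rebroadcastable : (string, int) Hashtbl.t }

(* [removed_work] is assumed to contain only works with no remaining
   references in the frontier, so they can be dropped unconditionally. *)
let handle_removed_work (tables : snark_tables) (removed_work : string list) =
  List.iter
    (fun work ->
      Hashtbl.remove tables.rebroadcastable work ;
      Hashtbl.remove tables.all work )
    removed_work

let tables =
  let all = Hashtbl.create 16 and rebroadcastable = Hashtbl.create 16 in
  Hashtbl.replace all "a" 10 ;
  Hashtbl.replace all "b" 20 ;
  Hashtbl.replace rebroadcastable "a" 10 ;
  { all; rebroadcastable }

(* Only "a" was reported as removed, so "b" stays in the pool. *)
let () = handle_removed_work tables [ "a" ]
```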

nholland94 · 2023-07-10T17:15:40Z

!ci-build-me

nholland94 · 2023-07-11T16:14:36Z

!ci-build-me

nholland94 requested a review from a team as a code owner June 15, 2023 00:12

mrmr1993 approved these changes Jun 15, 2023


deepthiskumar reviewed Jun 20, 2023


georgeee mentioned this pull request Jun 22, 2023

Snark works are not rebroadcast periodically #13451

Closed

georgeee mentioned this pull request Jul 10, 2023

Fix snark pool long async jobs #13550

Closed

7 tasks

nholland94 changed the base branch from release/1.4.0 to compatible July 10, 2023 17:14

nholland94 added 3 commits July 10, 2023 11:15

Fix snark pool leak + optimize snark pool garbage collection

ab81256

Remove snark pool persistence

3bfc796

Map based snark pool tables

3080c5e

nholland94 force-pushed the fix/snark-pool-long-async-cycles branch from fa7893b to 3080c5e Compare July 10, 2023 17:15

This was referenced Jul 10, 2023

Fix snark pool long async cycles (merge into develop) #13555

Closed

Fix snark pool long async cycles (merge into berkeley) #13556

Merged

Use Map for snark pool refcount

c0644fd

nholland94 merged commit cec59e3 into compatible Jul 12, 2023
1 check passed

nholland94 deleted the fix/snark-pool-long-async-cycles branch July 12, 2023 17:52

tizoc mentioned this pull request Jul 19, 2023

Optimization: Make hashing over transactions snarks partial (ledger hashes) #12521

Closed

6 tasks
