Introduce ledger-tool simulate-block-production #2733

Merged (52 commits into anza-xyz:master) on Sep 11, 2024

Conversation

@ryoqun (Member) commented Aug 25, 2024

Problem

Even though solana-labs#29196 landed and has been enabled for a while, there's no code to actually simulate block production.

Summary of Changes

Finally, introduce a new subcommand, agave-ledger-tool simulate-block-production, with the following flags:

$ agave-ledger-tool simulate-block-production --help
...
OPTIONS:
...
        --first-simulated-slot <SLOT>
            Start simulation at the given slot
        --no-block-cost-limits
            Disable block cost limits effectively by setting them to the max
        --block-production-method <METHOD>
            Switch transaction scheduling method for producing ledger entries [default: central-scheduler] [possible
            values: thread-local-multi-iterator, central-scheduler]

On top of that, this PR also makes it possible to replay the simulated blocks later by persisting the shreds into the blockstore and adjusting the replay code-path a bit. Namely, the following flags are added:

$ agave-ledger-tool verify --help
...
OPTIONS:
...
        --abort-on-invalid-block
            Exits with failed status early as soon as any bad block is detected
        --no-block-cost-limits
            Disable block cost limits effectively by setting them to the max
        --enable-hash-overrides
            Enable override of blockhashes and bank hashes from banking trace event files to correctly verify blocks
            produced by the simulate-block-production subcommand
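
For illustration, the intended round trip looks something like the following (the ledger path and slot here are placeholders, and the banking trace event files are assumed to already exist in the ledger directory):

$ agave-ledger-tool -l /path/to/ledger simulate-block-production \
      --first-simulated-slot 282254384
$ agave-ledger-tool -l /path/to/ledger verify \
      --enable-hash-overrides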

(bike-shedding is welcome, btw...)

This PR is extracted from #2325.

sample output (with the mainnet-beta ledger)

notice that (bank) hashes and last_blockhashes are overridden while account_delta (hashes) are different.

While tx counts largely differ from the actual log, both simulation runs exhibit similar numbers, indicating that the simulation itself is fairly stable.

The difference between simulation and reality is probably due to other general system load during the actual run.

simulated log (1):

[2024-08-25T14:58:26.916751963Z INFO  solana_runtime::bank] bank frozen: 282254384 hash: DVkbZUJLMGHVat1GNdT4NT6qebSPH3uxiGa5hU7QSvZU accounts_delta: GajSceKAeDuAhCnRyXbv7qUsiEwdseJ1ebWDF7wyhWSg signature_count: 579 last_blockhash: A1FHFc8grbbS2zyc7a1byuU6VXUQHqsRpdydBhQM5ACx capitalization: 581802336581619111, stats: BankHashStats { num_updated_accounts: 2579, num_removed_accounts: 14, num_lamports_stored: 343411018630348, total_data_len: 9421212, num_executable_accounts: 0 }
[2024-08-25T14:58:27.270576266Z INFO  solana_runtime::bank] bank frozen: 282254385 hash: 4SpPMwTNNAfR6quW73NKqx7CRfbSoGyQMTeARez2Hftw accounts_delta: 4QfWEXTJUFr1kV6ET3tnYENppQv2BeAGNA1grDRpfPv4 signature_count: 1750 last_blockhash: 8FHhFcs4pgXEHvKohswtNPcx9KHvr8k6VEsunsRT9M6t capitalization: 581802336574886442, stats: BankHashStats { num_updated_accounts: 4964, num_removed_accounts: 2, num_lamports_stored: 516037578135707, total_data_len: 17010726, num_executable_accounts: 0 }
[2024-08-25T14:58:27.623465240Z INFO  solana_runtime::bank] bank frozen: 282254386 hash: HMtfZUB9Wgdw12c4adk17cGce13dSeaJgUsTc3mGpDKL accounts_delta: 74GUrNzB2M4ufASPBJdB65ZKPimA84AhPR8u1sjSAA94 signature_count: 1369 last_blockhash: 7xUVmhDbeVPxA4Gu5TKLd2qWU6L47MzRGxfMVwAWSQMg capitalization: 581802336570831246, stats: BankHashStats { num_updated_accounts: 3991, num_removed_accounts: 7, num_lamports_stored: 1485447148098157, total_data_len: 7028676, num_executable_accounts: 0 }
[2024-08-25T14:58:27.978300836Z INFO  solana_runtime::bank] bank frozen: 282254387 hash: 5UgDevAtVTUJZD9cjrAbpUPzdwtFPxxhiQM2sQ5L6EeW accounts_delta: D5eLRBxyYYNy6nGB3TrDtwmx6BkKjvy2diUBvMbM9zfi signature_count: 754 last_blockhash: En8erpgtuHcK3rEQA7i1JFcJB5AUjPpz57gHh5vHUgmr capitalization: 581802336567526926, stats: BankHashStats { num_updated_accounts: 2766, num_removed_accounts: 2, num_lamports_stored: 210579970838403, total_data_len: 9649857, num_executable_accounts: 0 }

simulated log (2):

[2024-08-26T06:21:39.116746518Z INFO  solana_runtime::bank] bank frozen: 282254384 hash: DVkbZUJLMGHVat1GNdT4NT6qebSPH3uxiGa5hU7QSvZU accounts_delta: 3WdvevVPam6vYSY9MRDY25ThdkryjLkmxYi2HLxtPPDT signature_count: 690 last_blockhash: A1FHFc8grbbS2zyc7a1byuU6VXUQHqsRpdydBhQM5ACx capitalization: 581802336580834700, stats: BankHashStats { num_updated_accounts: 2878, num_removed_accounts: 22, num_lamports_stored: 1463388803383965, total_data_len: 15984345, num_executable_accounts: 0 }
[2024-08-26T06:21:39.472078453Z INFO  solana_runtime::bank] bank frozen: 282254385 hash: 4SpPMwTNNAfR6quW73NKqx7CRfbSoGyQMTeARez2Hftw accounts_delta: GVDVyi9BdVjJiXdDUCgrYpBkAJ5ZZSXoKi89z6a8No2w signature_count: 1783 last_blockhash: 8FHhFcs4pgXEHvKohswtNPcx9KHvr8k6VEsunsRT9M6t capitalization: 581802336574005189, stats: BankHashStats { num_updated_accounts: 4915, num_removed_accounts: 1, num_lamports_stored: 1051960783828027, total_data_len: 15620129, num_executable_accounts: 0 }
[2024-08-26T06:21:39.827448111Z INFO  solana_runtime::bank] bank frozen: 282254386 hash: HMtfZUB9Wgdw12c4adk17cGce13dSeaJgUsTc3mGpDKL accounts_delta: JCJB95BFo69Rk3VZRF9GL4P6JYqRWoFKbT4ngmUVWPKj signature_count: 1352 last_blockhash: 7xUVmhDbeVPxA4Gu5TKLd2qWU6L47MzRGxfMVwAWSQMg capitalization: 581802336570362548, stats: BankHashStats { num_updated_accounts: 3872, num_removed_accounts: 2, num_lamports_stored: 319553718238402, total_data_len: 5976282, num_executable_accounts: 0 }
[2024-08-26T06:21:40.178714212Z INFO  solana_runtime::bank] bank frozen: 282254387 hash: CsCFYPXXcc9h6BwCeAcsgbpsbBp9kH7RLovFzGapzkZS accounts_delta: 7tM6MrmVEGMyMtxuycSgdnJDpoVGJfgiRkvcKJRGxzpF signature_count: 721 last_blockhash: B1NvabLf172iriwSxC6pt3fDxRVMgGZYTqWqqA37BAtT capitalization: 581802336566863144, stats: BankHashStats { num_updated_accounts: 2759, num_removed_accounts: 2, num_lamports_stored: 226979096844990, total_data_len: 7714591, num_executable_accounts: 0 }

actual log:

[2024-08-08T05:03:37.833306644Z INFO  solana_runtime::bank] bank frozen: 282254384 hash: DVkbZUJLMGHVat1GNdT4NT6qebSPH3uxiGa5hU7QSvZU accounts_delta: 4eR6tY1RbhjxW56qNnjbfyWgXUR78JggjH5WVo6cBt3v signature_count: 1797 last_blockhash: A1FHFc8grbbS2zyc7a1byuU6VXUQHqsRpdydBhQM5ACx capitalization: 581802336557110608, stats: BankHashStats { num_updated_accounts: 5197, num_removed_accounts: 40, num_lamports_stored: 888031815473795, total_data_len: 16030729, num_executable_accounts: 0 }
[2024-08-08T05:03:38.225493056Z INFO  solana_runtime::bank] bank frozen: 282254385 hash: 4SpPMwTNNAfR6quW73NKqx7CRfbSoGyQMTeARez2Hftw accounts_delta: EhXCEyiWG8RV7zQv2nsxm5dFimKMY4TYuA64y1GgxUcb signature_count: 1664 last_blockhash: 8FHhFcs4pgXEHvKohswtNPcx9KHvr8k6VEsunsRT9M6t capitalization: 581802336533852687, stats: BankHashStats { num_updated_accounts: 5102, num_removed_accounts: 33, num_lamports_stored: 2826520371573347, total_data_len: 23738722, num_executable_accounts: 0 }
[2024-08-08T05:03:38.620202820Z INFO  solana_runtime::bank] bank frozen: 282254386 hash: HMtfZUB9Wgdw12c4adk17cGce13dSeaJgUsTc3mGpDKL accounts_delta: rEFqYUdJvPLeNUF7nDjpxpP7jBJ6SjHmp5XAKq7vpTf signature_count: 1366 last_blockhash: 7xUVmhDbeVPxA4Gu5TKLd2qWU6L47MzRGxfMVwAWSQMg capitalization: 581802336525660471, stats: BankHashStats { num_updated_accounts: 4631, num_removed_accounts: 16, num_lamports_stored: 3357971643149105, total_data_len: 27580121, num_executable_accounts: 1 }
[2024-08-08T05:03:39.028377595Z INFO  solana_runtime::bank] bank frozen: 282254387 hash: ET4eF1A1hQQgGdC2ZKddxLSeSFdMPFqSrFKn5Ly2kLep accounts_delta: BNJuXeiw2AyefF8dc8guaV9FEuxyddUWRzvw8zV7GDj2 signature_count: 1163 last_blockhash: GZE8MzgEgyekrTVrKExNbh2xnoMjM7Bi7PNSb92MXqsF capitalization: 581802336510318235, stats: BankHashStats { num_updated_accounts: 4300, num_removed_accounts: 25, num_lamports_stored: 2674603072772107, total_data_len: 28823937, num_executable_accounts: 0 }

@ryoqun force-pushed the simulate-block-production branch 8 times, most recently from f1dbbd9 to f965399, on August 26, 2024 06:05
@ryoqun requested a review from apfitzge on August 26, 2024 06:27
@ryoqun marked this pull request as ready for review on August 26, 2024 06:28
Comment on lines 694 to 698
// `self.keypair.read().unwrap().pubkey()` is more straight-forward to use here.
// However, self.keypair could be dummy in some very odd situation
// (i.e. ledger-tool's simulate-leader-production). So, use `self.my_contact_info` here.
// Other than the edge case, both are equivalent.
*self.my_contact_info.read().unwrap().pubkey()
@ryoqun (Member Author), Aug 26, 2024:

@behzadnouri please let me know if you disagree with this change in gossip/src/cluster_info.rs as justified by the source code comment.

@behzadnouri, Aug 27, 2024:

I pretty much would rather avoid introducing scenarios where cluster_info.keypair.pubkey() != contact_info.pubkey.

Can you please provide more context why we need a ClusterInfo with a contact_info which we do not own the keypair? To me this seems pretty error-prone and I would much prefer we try an alternative.

@apfitzge:

Agree with @behzadnouri here. It seems the reason we are adding this new inconsistency is because BankingStage takes it as an argument.

It'd be better for us to refactor BankingStage to not use ClusterInfo imo.

We use ClusterInfo for 2 things:

  1. Creating the Forwarder
    • could pass an Option<Forwarder> as arg to BankingStage instead
    • ^ would need some changes to rip out mandatory forwarding in tlmi / voting threads
    • alternatively could make forwarder an enum w/ disabled variant or even a trait instead
  2. Getting validator id for checking if we are leader
    • easily can pass Pubkey instead

@ryoqun (Member Author):

hmm, my dcou-based hack is unpopular.. ;) so I took on a bit of hassle. how about this?: 2b33131

Can you please provide more context why we need a ClusterInfo with a contact_info which we do not own the keypair?

It seems the reason we are adding this new inconsistency is because BankingStage takes it as an argument.

@apfitzge's understanding is correct. note that such a broken ClusterInfo is only ever created under the dcou code-path, though.

Getting validator id for checking if we are leader

* easily can pass `Pubkey` instead

Sadly, this isn't easy because the identity Pubkey can be hot-swapped inside ClusterInfo. That led me to take this trait direction...:

  1. alternatively could make forwarder ... a trait instead

Also, note that BroadcastStageType needs ClusterInfo too. Fortunately, it seems the new hacky LikeClusterInfo trait plumbing isn't needed for it.

That said, I wonder whether this additional production code is worth maintaining only to support a rather obscure development ledger-tool subcommand. Anyway, I'm not that opinionated; I just want to merge this PR.
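
roughly, the shape of the trait direction is something like this (just an illustrative sketch, not the exact trait as committed):

use {
    solana_gossip::cluster_info::ClusterInfo,
    solana_sdk::pubkey::Pubkey,
    std::sync::Arc,
};

// Sketch: BankingStage becomes generic over "something that can report the
// current validator identity", so the validator keeps passing the real
// ClusterInfo while ledger-tool's simulator can pass a dummy implementation.
pub trait LikeClusterInfo {
    fn id(&self) -> Pubkey;
}

impl LikeClusterInfo for Arc<ClusterInfo> {
    fn id(&self) -> Pubkey {
        // Delegates to the real ClusterInfo, which reads the (hot-swappable)
        // identity keypair internally.
        ClusterInfo::id(self)
    }
}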

@ryoqun (Member Author) commented Aug 26, 2024:

fyi, not included in this PR, but I now have some fancy charts (salvaged from solana-labs#28119) on the development branch.

namely, we can now display the individual tx timings for each scheduler

(quick legend: x axis is walltime; y axis is one line per thread; green arced arrows are read-lock dependencies, pink arced arrows are write-lock dependencies)

thread-local-multi-iterator

image

each banking thread is working as hard as an animal. you can indirectly see the batch boundaries.

central scheduler

image

image

much like thread-local-multi-iterator, batched transactions show almost no gap (no overhead). while the chart is clipped, an overall much less chaotic dep graph is observed.

lastly, because of the stickiness of a write lock to a particular thread, the 2nd batch is rather large while the other threads are idle (see the 2nd pic)

unified scheduler

image

read locks are well parallelized. each task execution incurs a large overhead, but dep graph resolution is rather timely.

note that the unified scheduler is wired into block production as well in #2325. That's why I have charts from all 3 impls...

Comment on lines 500 to 515
info!(
"jitter(parent_slot: {}): {}{:?} (sim: {:?} event: {:?})",
old_slot,
if elapsed_simulation_time > elapsed_event_time {
"+"
} else {
"-"
},
if elapsed_simulation_time > elapsed_event_time {
elapsed_simulation_time - elapsed_event_time
} else {
elapsed_event_time - elapsed_simulation_time
},
elapsed_simulation_time,
elapsed_event_time,
);
@ryoqun (Member Author):

this particular log output looks like this:

$ grep -E 'jitter' simulate-mb.2024y08m26d10h44m43s933116486ns
[2024-08-26T10:58:57.989324285Z INFO  solana_core::banking_simulation] jitter(parent_slot: 282254383): +360.27µs (sim: 12.00036027s event: 12s)
[2024-08-26T10:58:58.344810251Z INFO  solana_core::banking_simulation] jitter(parent_slot: 282254384): -19.615829ms (sim: 12.355846357s event: 12.375462186s)
[2024-08-26T10:58:58.695797971Z INFO  solana_core::banking_simulation] jitter(parent_slot: 282254385): -71.903503ms (sim: 12.706834067s event: 12.77873757s)
[2024-08-26T10:58:59.047995512Z INFO  solana_core::banking_simulation] jitter(parent_slot: 282254386): -135.656223ms (sim: 13.059031477s event: 13.1946877s)

in short, poh in the sim is rather more timely than the actual traced poh recordings. maybe this is due to the much-reduced sysload.

@apfitzge left a comment:

Looks good for the most part.
Tried to fix up some grammar in the documenting comments, and left some suggestions to split up some of the larger functions so it's easier for me to read.

if let Some(event_time) =
self.banking_trace_events.freeze_time_by_slot.get(&old_slot)
{
if log_enabled!(log::Level::Info) {

@apfitzge:

the logging functionality here should really get separated out; it is quite long and distracting from the behavior of the loop.
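
For example, pulling it into a small helper would keep the loop body focused (a sketch only; the helper name is made up):

use {log::info, std::time::Duration};

// Hypothetical helper: logs the signed difference between the simulated and
// the traced (event) elapsed times, matching the existing log format.
fn log_jitter(parent_slot: u64, elapsed_simulation_time: Duration, elapsed_event_time: Duration) {
    let (sign, jitter) = if elapsed_simulation_time > elapsed_event_time {
        ("+", elapsed_simulation_time - elapsed_event_time)
    } else {
        ("-", elapsed_event_time - elapsed_simulation_time)
    };
    info!(
        "jitter(parent_slot: {parent_slot}): {sign}{jitter:?} (sim: {elapsed_simulation_time:?} event: {elapsed_event_time:?})"
    );
}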

"{} isn't leader anymore at slot {}; new leader: {}",
simulated_leader, new_slot, new_leader
);
break;

@apfitzge:

So when we're no longer leader the process will end.

In the future, could we extend this capability so that we "fast-forward" through our non-leader periods?

That probably adds considerable complexity, but I think it would make simming significantly more useful.
AFAICT, as-is we must load from a snapshot every time we want to do a sim of 4 slots - which will be very time consuming if I have hundred(s) of leader slots in my trace data that I'd like to simulate.

@ryoqun (Member Author):

In the future, could we extend this capability so that we "fast-forward" through our non-leader periods?

That probably adds considerable complexity

indeed, it's possible. but with considerable complexity. Note that such "fast-forward"-ing needs to reload from snapshot... Just doing it without snapshot reloading would make most txes fail, invalidating the simulation itself.

I have hundred(s) of leader slots in my trace data that I'd like to simulate.

i know this is ideal. but a dozen simulation ledgers, each with a single round of the 4 leader slots, is good enough..

@apfitzge:

Think there's some complexity in how we'd need to "fast-forward" through non-leader periods, but don't think it'd require loading from a snapshot if done correctly.

Probably would add even more complexity, but could we not treat the simmed blocks as some sort of duplicate block (or a fork)?
After each 4-slot sim, we drop the simmed blocks (duplicates/fork) for the actual blocks, which we then replay as normal until we get close to the next leader period.

That's what I had in mind for the fast-forward, since I definitely agree that we can't just continue on from the simmed blocks and act like things will just work from there haha.
If we were to do this would probably want to collaborate with ashwin or stevecz on how we could handle the "sim-swap" to remove simmed blocks and insert real blocks.

@ryoqun (Member Author):

oh, that idea sounds nice.

@ryoqun (Member Author):

how we could handle the "sim-swap" to remove simmed blocks and insert real blocks.

i think this can just be done with a read-write side blockstore.

@apfitzge:

oh, that idea sounds nice.

Cool! I think that'd really improve the usability, but we should definitely leave it for a follow-up PR, this one is big enough as is 😄

ChannelLabel::GossipVote => &gossip_vote_sender,
ChannelLabel::Dummy => unreachable!(),
};
sender.send(batches_with_stats.clone()).unwrap();

@apfitzge:

why clone instead of letting the sender thread take ownership of these batches?

just reviewing on github, so can't see the type - is this Arc-ed, or just a clone of actual packet batch?

@ryoqun (Member Author):

the useful BTreeMap::range() doesn't allow that, because there's no range_into() or anything like that.

However, I noticed that I can use BTreeMap::split_off(): 3a3131e

just reviewing on github, so can't see the type - is this Arc-ed, or just a clone of actual packet batch?

fyi, this is Arc-ed.
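
for reference, the split_off() trick looks roughly like this (generic sketch; in the PR the map's value type is the (ChannelLabel, BankingPacketBatch) pair):

use std::{collections::BTreeMap, time::SystemTime};

// Sketch: split_off(&at) leaves keys < `at` in `events` and returns the rest
// by value, so the returned map can be consumed with into_iter() instead of
// cloning each Arc-ed batch out of a shared map.
fn take_events_from<V>(
    events: &mut BTreeMap<SystemTime, V>,
    at: SystemTime,
) -> BTreeMap<SystemTime, V> {
    events.split_off(&at)
}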

@ryoqun (Member Author):

However, I noticed that I can use BTreeMap::split_off(): 3a3131e

related to the above, i further improved simulation jitter: 5e77bd7

}

pub struct BankingTraceEvents {
packet_batches_by_time: BTreeMap<SystemTime, (ChannelLabel, BankingPacketBatch)>,

@apfitzge:

Why not use a VecDeque here? These are, afaik, always in order from the files (assuming we read the files in the correct order, which is easy enough to do).

@ryoqun (Member Author):

added comments: d3bf0d9

@ryoqun (Member Author):

btw, related to this a bit, I noticed we should rather stop using BTreeMap::into_iter(): 5e77bd7

@@ -504,6 +504,8 @@ impl PartialEq for Bank {
if std::ptr::eq(self, other) {
return true;
}
// Suppress rustfmt until https://github.com/rust-lang/rustfmt/issues/5920 is fixed ...

@apfitzge commented:

@ryoqun The figures in this comment #2733 (comment) raised a few questions for me

  1. How do these schedulers compare if we give an equal number of worker threads for all impls? The unified scheduler seems to have more than double the threads; what if we also give it 4?
  2. It definitely seems the unified scheduler does a better job of parallelism due to the lack of batching, as well as its aggressive approach towards parallelism. I'm curious how this might affect fee collection done by the leader. Certainly we can process many non-contentious transactions, but those will use our blockspace more quickly and potentially use up blockspace that more valuable (greedy leader maximizing per-cu rewards) transactions, which are currently blocked, could have taken.

I think these questions are outside the scope of this PR - so maybe we do not focus on them here. Instead, I would like to ask about inspection of the simulated blocks:

Are simulated blocks saved to blockstore? Is it possible for us to save them to a "separate" blockstore or parameterize it in some way? Ideally we could run simulation for several scheduler implementations, configurations, etc, and then run some analysis on the blocks produced after the fact so that we can compare them all.

Basically it'd be really nice to add some block analysis stuff in ledger-tool, and then run that command in a bash loop to get some metrics about block "quality":

  • CU fullness
  • CU max depth
  • rewards
  • parallelism

core/Cargo.toml Outdated
@@ -104,7 +104,7 @@ solana-ledger = { workspace = true, features = ["dev-context-only-utils"] }
solana-logger = { workspace = true }
solana-poh = { workspace = true, features = ["dev-context-only-utils"] }
solana-program-runtime = { workspace = true }
solana-runtime = { workspace = true, features = ["dev-context-only-utils"] }
solana-runtime = { workspace = true }
@ryoqun (Member Author):

remove this line completely...

@ryoqun (Member Author):

done: 6d012c2


impl LikeClusterInfo for DummyClusterInfo {
    fn id(&self) -> Pubkey {
        self.id
    }
}
@ryoqun (Member Author):

well, should we intentionally wrap this with a RwLock to mimic the real ClusterInfo?

@ryoqun (Member Author):

done: 535f4da
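
for the record, the wrapped version looks roughly like this (a sketch, not necessarily the exact code in 535f4da):

use {
    solana_sdk::pubkey::Pubkey,
    std::sync::{Arc, RwLock},
};

// Sketch: the dummy keeps its identity behind a RwLock, mirroring how the real
// ClusterInfo lets the identity keypair be hot-swapped at runtime.
pub struct DummyClusterInfo {
    id: Arc<RwLock<Pubkey>>,
}

impl LikeClusterInfo for DummyClusterInfo {
    fn id(&self) -> Pubkey {
        *self.id.read().unwrap()
    }
}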

@ryoqun (Member Author) commented Aug 28, 2024:

@ryoqun The figures in this comment #2733 (comment) raised a few questions for me

  • How do these schedulers compare if we give an equal number of worker threads for all impls? The unified scheduler seems to have more than double the threads; what if we also give it 4?

yeah, i forgot to align the thread counts... I'll do an in-depth comparison later. the unified scheduler takes longer to clear the buffer when the thread count is very low, like 4. that said, it scales well to saturate all of the worker threads. here's a sample:

image

also, now that the unified scheduler is enabled for block verification, i think we should increase the banking thread count to something like 12-16.

  • It definitely seems the unified scheduler does a better job of parallelism due to the lack of batching, as well as its aggressive approach towards parallelism. I'm curious how this might affect fee collection done by the leader. Certainly we can process many non-contentious transactions, but those will use our blockspace more quickly and potentially use up blockspace that more valuable (greedy leader maximizing per-cu rewards) transactions, which are currently blocked, could have taken.

indeed, the unified scheduler can't get rid of the curse of unbatched overhead, but i think it can be optimized for the greedy leader maximizing per-cu rewards. Currently, all non-contentious transactions are directly buffered into crossbeam channels with unbounded buffer depth in the unified scheduler, as you know. but I'm planning to place a priority queue for the freely-reorderable transactions in front of them at the scheduler thread side, while maintaining a max of 1.5 * handler_thread_count tasks buffered in the crossbeam channels. In this way, the reprioritization latency for a higher-paying task is about 1.5 * the avg execution time of a single transaction (see the sketch below).
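
to sketch the buffering idea (illustrative only, not the unified scheduler's actual code):

use std::collections::BinaryHeap;

// Illustrative sketch: keep only ~1.5x handler_thread_count tasks in flight in
// the channel to the handler threads, and hold everything else in a priority
// queue on the scheduler side, so a newly arrived higher-paying task waits
// behind at most a few already-dispatched ones.
struct ReorderableQueue<T: Ord> {
    queue: BinaryHeap<T>,
    max_in_flight: usize,
}

impl<T: Ord> ReorderableQueue<T> {
    fn new(handler_thread_count: usize) -> Self {
        Self {
            queue: BinaryHeap::new(),
            max_in_flight: handler_thread_count * 3 / 2,
        }
    }

    fn push(&mut self, task: T) {
        self.queue.push(task);
    }

    // Pop the highest-priority task only while the channel buffer is shallow.
    fn next_to_dispatch(&mut self, currently_in_flight: usize) -> Option<T> {
        if currently_in_flight < self.max_in_flight {
            self.queue.pop()
        } else {
            None
        }
    }
}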

I think these questions are outside the scope of this PR

👍 anyway, i put some thought above.

  • so maybe we do not focus on them here. Instead, I would like to ask about inspection of the simulated blocks:

Are simulated blocks saved to blockstore?

yes.

Is it possible for us to save them to a "separate" blockstore or parameterize it in some way? Ideally we could run simulation for several scheduler implementations, configurations, etc, and then run some analysis on the blocks produced after the fact so that we can compare them all.

Basically it'd be really nice to add some block analysis stuff in ledger-tool, and then run that command in a bash loop to get some metrics about block "quality":

* CU fullness

* CU max depth

* rewards

* parallelism

yeah, we can do this easily.
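
for illustration, the kind of per-block summary such a ledger-tool command could emit might look like this (hypothetical names, not an existing API):

// Hypothetical per-block summary; a real command would read each simulated
// block back from the blockstore and aggregate its transactions.
struct BlockQuality {
    slot: u64,
    used_cus: u64,
    block_cu_limit: u64,
    total_fees_lamports: u64,
    max_thread_parallelism: usize,
}

impl BlockQuality {
    // CU fullness as a fraction of the block cost limit.
    fn cu_fullness(&self) -> f64 {
        self.used_cus as f64 / self.block_cu_limit as f64
    }
}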

@ryoqun (Member Author) commented Aug 28, 2024:

@apfitzge thanks for all the code-reviewing effort. aside from the general clean-up of banking_simulation.rs, I think i've addressed all comments. I'm planning to do the clean-up tomorrow.

@ryoqun (Member Author) commented Sep 11, 2024:

the fate of a 1.4k-line PR. ;)

I was forced to rebase this pr onto #2172

@steviez left a comment:

LGTM and given #2733 (review), think we can push this!

@ryoqun dismissed apfitzge's stale review on September 11, 2024 12:30 with the reason: "want to merge this"

@ryoqun merged commit 34e9932 into anza-xyz:master on Sep 11, 2024
52 checks passed