Filecoin storage capacity modeling - Miner cron analysis #11034

Closed
5 tasks
ZenGround0 opened this issue Jun 30, 2023 · 6 comments

@ZenGround0
Contributor

ZenGround0 commented Jun 30, 2023

User story

Long-term stewards of the Filecoin network need to understand how chain bandwidth and block validation times degrade at high storage capacity: see background and action item here. This will help the community make informed decisions about

  • when we expect bad network conditions to show up and predict what they will look like
  • how to fix degrading network conditions when they do show up
  • how to design future changes without making things worse

Acceptance criteria

A simple model of how much storage the Filecoin network can handle and how miner cron time and gas degrade at different storage levels, in terms of sectors and partitions.

This model will offer answers to questions like:

  1. How much storage can possibly be onboarded to filecoin?

  2. How much miner cron gas/time do we expect to see if the network grows to 25 EiB of RBP?

    • This will help us further clarify the problem statement and prioritize FIP: Safe Cron, an initiative the Lotus team is interested in

Deliverables

Technical breakdown

  1. Gather information about existing gas, try to do simple regressions against sector size / partition size to match against gas values for PoRep/ConfirmSectorProofsValid, PoSt, HandleProvingDeadline cron. Make the model more complicated if needed (existence of deals, other state inefficiencies that matter)
  2. Apply this information to some basic models of different pieces of filecoin:

Tasks

Leaving out for now

questions to answer

  1. At what network storage capacity is the entire chain bandwidth consumed by SP WindowPoSts?
    • This will be useful for prioritizing FIPs like Gas Lane
  2. How long would the miner cron be if half the network faulted their storage today?
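
As a rough way to frame question 1, the back-of-envelope arithmetic can be sketched in Python. Everything here is an assumption rather than a measured result: the per-partition WindowPoSt gas is a placeholder the caller must supply, the chain parameters (block gas limit, expected blocks per epoch, sectors per partition, epochs per proving period) are defaults that should be checked against current network parameters, and the function name is hypothetical. It also assumes each partition is proven exactly once per proving period and that nothing else competes for gas.

```python
EPOCHS_PER_PROVING_PERIOD = 2880  # assumed: 24 hours of 30-second epochs


def saturation_capacity_eib(
    gas_per_partition_post: float,      # placeholder: must come from measurement
    block_gas_limit: float = 10e9,      # assumed chain parameter
    blocks_per_epoch: float = 5.0,      # assumed expected blocks per epoch
    sectors_per_partition: int = 2349,  # assumed value for 32 GiB sectors
    sector_size_gib: int = 32,
) -> float:
    """Raw capacity (EiB) at which WindowPoSt gas alone fills every block."""
    if gas_per_partition_post <= 0:
        raise ValueError("gas_per_partition_post must be positive")
    # Total gas available on chain during one proving period.
    gas_per_period = block_gas_limit * blocks_per_epoch * EPOCHS_PER_PROVING_PERIOD
    # Number of partitions that can be proven once each with that gas.
    partitions = gas_per_period / gas_per_partition_post
    capacity_gib = partitions * sectors_per_partition * sector_size_gib
    return capacity_gib / 2**30  # GiB -> EiB
```

Halving the per-partition gas doubles the saturation capacity, which is why a measured per-partition figure is the key input to this question.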

work to get there

  • A simple model of PoSt gas: per sector / per partition
  • A simple model composing all of this together in observable
  • A simple model of worst case faulting / terminating impact on cron gas
  • A simple model of recover/dispute PoSt gas: per sector / partition
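
A minimal sketch of what a "simple model" per partition could look like: an ordinary least-squares fit of gas against partition count, written in plain Python. The function name and the sample numbers are hypothetical and purely illustrative; they are not taken from chain data. In this framing the intercept is the fixed cost of a job and the slope is the marginal gas per partition.

```python
from typing import Sequence, Tuple


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic example only: partitions per job vs. gas used (arbitrary units).
partitions = [0, 1, 2, 4]
gas = [10.0, 13.0, 16.0, 22.0]
intercept, slope = fit_linear(partitions, gas)
```

If the residuals of such a fit are large, that is a sign the model needs extra terms (for example faults or terminations) rather than partition count alone.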
@ZenGround0
Contributor Author

ZenGround0 commented Jun 30, 2023

TODO: under each question, give the "so what", i.e. we have clarity on XXX FIP, we know when and how to pull the trigger on certain work
Done

@ZenGround0 ZenGround0 self-assigned this Jul 3, 2023
@jennijuju jennijuju changed the title Miner cron analysis and filecoin capacity modeling Filecoin storage capacity modeling - Miner cron analysis Jul 3, 2023
@jennijuju
Member

jennijuju commented Jul 3, 2023

todo: @ZenGround0 add termination fip discussion link
Done

@ZenGround0
Contributor Author

ZenGround0 commented Jul 13, 2023

Update

As I got pulled into a few other things, this work is only partially complete. However, we've already found some surprises.

Results

Tools

Developed some commands for interpreting gas traces, summarized here
Introduced this shed tool to help: #11078

Analysis

  1. Learned that partition proving is not the dominant cost as originally assumed. Overhead is significant: ~1/2 of the gas / time cost of all miner proving deadline cron comes from miners with 0 active partitions in their deadline. Once the overhead of actors that do have partitions is counted too, overhead is > 1/2 of resource usage.
  2. About 1/2 of cron jobs run against deadlines with no partitions assigned. This indicates that miners use on average 1/2 of their deadlines. This is denser than I had previously thought.
  3. Terminations do have a significant cost (3x ordinary gas in one instance)
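
The two ratios in items 1 and 2 can be computed from per-job records along the following lines. This is a sketch under assumptions: the `CronJob` record and its field names are hypothetical and do not reflect the actual output format of the gas trace tooling, and the sample jobs are made up.

```python
from typing import Iterable, NamedTuple, Tuple


class CronJob(NamedTuple):
    """One proving deadline cron job (hypothetical record layout)."""
    active_partitions: int
    gas_used: int


def zero_partition_shares(jobs: Iterable[CronJob]) -> Tuple[float, float]:
    """Return (share of jobs, share of gas) for jobs with 0 active partitions."""
    jobs = list(jobs)
    total_gas = sum(j.gas_used for j in jobs)
    if not jobs or total_gas == 0:
        raise ValueError("need at least one job with non-zero gas")
    empty = [j for j in jobs if j.active_partitions == 0]
    job_share = len(empty) / len(jobs)
    gas_share = sum(j.gas_used for j in empty) / total_gas
    return job_share, gas_share


# Made-up sample: two empty deadlines, two deadlines with partitions.
sample = [CronJob(0, 100), CronJob(0, 100), CronJob(3, 150), CronJob(1, 50)]
job_share, gas_share = zero_partition_shares(sample)
```

Note that the gas share of empty jobs is only a lower bound on overhead, since jobs with partitions also pay the fixed per-job cost.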

Current Direction

I'm still pursuing modeling of the per-partition cost of miner cron + average PoRep onboarding costs. Two new directions are obvious in light of the latest findings.

  1. The biggest new insight is that overhead is significant. I've also noticed some variance: some 0-partition jobs take 2x as long as others. The next thing I'm going to do is model gas "overhead". The current biggest culprits for gas used are 1) miner vesting, 2) precommit deposit burning, 3) worker key changes. Getting a clear picture of this will help me develop some ideas for cutting down this cost. A new output is a protocol change proposal motivated by this understanding of gas costs.
  2. In order to give feedback on Feature: move partition builtin-actors#1326, we need to understand worst-case costs, and it's clear that this involves terminations and faults. Modeling terminations and faults per sector/partition is required to provide guidance on this protocol change, in addition to PoRep / carrying capacity estimations.

More fine grained technical breakdown

It is now clearer what to do for modeling work:

Miner cron work

  1. Automate gathering of a full proving period worth of data
  2. Investigate code / gas traces and start to find patterns in 0 partition (pure overhead) case
  3. Search out faults and expirations in the data to model worst-case costs

Carrying capacity work

  1. Create similar jq parsing command to measure ProveCommitSector and power cron gas
  2. Take data over a large region of the chain

After finishing modeling I'll move on to communicating (i.e. Observable visualizations, FIP discussions, actors PR review)

@ZenGround0
Contributor Author

ZenGround0 commented Jul 27, 2023

Due to time constraints I ended up stopping at miner cron analysis. I've filed a follow-up issue on prove commit and PoSt for the backlog here: #11105

Results published here: filecoin-project/FIPs#761
lotus-shed PR used in analysis: #11078
Repo containing code to run analysis: https://github.com/FILCAT/gas-model

Contributing back to FIP discussion on moving partitions: filecoin-project/FIPs#735 (comment)
Some cron cost reduction measures: filecoin-project/FIPs#771

@jennijuju
Member

@ZenGround0 could you please also document the tooling you have built for this and how one can run them?

@ZenGround0
Contributor Author

It's in the TODOs above ^^ I've been working on publishing slowly but steadily. I'll be updating the comment as I go. 🤞 it's done by tomorrow.
