This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Light Client: Implement checkpoints in chain_spec to speed up syncing #6804

Open
Stefie opened this issue Aug 3, 2020 · 13 comments
Labels
I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task. J0-enhancement An additional feature request.

Comments

@Stefie
Contributor

Stefie commented Aug 3, 2020

Context:
As currently implemented, the light client needs to sync all blocks from genesis. Depending on the number of blocks and the environment the light client runs in, this can easily take more than 24h for chains with a slightly larger number of blocks.
This makes the light client basically unusable for occasional or one-time users who don't keep a synced copy of their selected chain in IndexedDB.

Goal
Get the initial sync duration of the Light Client down to 60 seconds.

Proposed solution:

  • Provide checkpoint snapshots in the chainspec (like we did on parity-ethereum / open-ethereum, by @andresilva)
@bkchr bkchr added I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task. J0-enhancement An additional feature request. labels Aug 3, 2020
@expenses
Contributor

expenses commented Aug 6, 2020

I'm fairly interested in the snapshotting. Distributing and fetching the snapshots seems like something we could use bitswap (#6795) for.

@expenses
Contributor

@andresilva I could do with some mentoring for the snapshot implementation. Do you think you could sketch out a vague plan?

@andresilva
Contributor

I don't know enough about the light client implementation to be able to mentor you on this. Here's my very high-level view of what needs to be done:

  • The light client keeps track of mappings block number -> block hash in a data structure called the CHT (canonical hash trie);
  • If we store the CHT roots at a given block number in the chainspec then we could start syncing from said block (this is what I meant by snapshotting which is a bit of an overloaded word in this context);
  • The sync module needs to be able to start syncing at a predefined point.
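To make the block number to CHT relationship concrete, here is a minimal sketch of the index arithmetic. It is an illustration only: the `CHT_SIZE` value, the choice to leave block 0 out of every CHT, and the `ChtCheckpoint` type are assumptions made for this example, not the layout used by the actual light client.

```rust
/// Number of blocks covered by one CHT. Assumed value for illustration.
const CHT_SIZE: u64 = 2048;

/// Hypothetical chain-spec entry: the root of one completed CHT.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct ChtCheckpoint {
    cht_index: u64,
    root: [u8; 32],
}

/// Index of the CHT that contains `block`, or `None` for genesis
/// (this sketch assumes block 0 is not part of any CHT).
fn cht_index_of(block: u64) -> Option<u64> {
    if block == 0 {
        None
    } else {
        Some((block - 1) / CHT_SIZE)
    }
}

/// First and last block number covered by CHT number `index`.
fn cht_range(index: u64) -> (u64, u64) {
    (index * CHT_SIZE + 1, (index + 1) * CHT_SIZE)
}

/// First block that still has to be synced once `checkpoints` are trusted.
/// With no checkpoints this is block 1, i.e. a full sync from genesis.
fn first_block_to_sync(checkpoints: &[ChtCheckpoint]) -> u64 {
    checkpoints
        .iter()
        .map(|c| cht_range(c.cht_index).1 + 1)
        .max()
        .unwrap_or(1)
}
```

With a single checkpoint for CHT number 2, `first_block_to_sync` returns the block right after the last block that CHT covers, which is the "predefined point" the sync module would need to start from.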

The light client in substrate was mostly implemented by @svyatonik (@cheme might have some ideas as well as he has an open PR to fix something in the CHT). The light client in parity-ethereum was implemented by @rphmeier iirc.

@cheme
Contributor

cheme commented Aug 11, 2020

From what I've seen, it is mainly the CHT: we need to save the roots stored in the CHT column and all unpruned headers (HEADER column). A mapping for accessing headers is also needed (KEY_LOOKUP column), but it can easily be deduced from the headers, so it only needs to be rebuilt when loading. Checking client/db/src/light.rs gives a good idea.
There are also a few AUX key-value pairs to save in the snapshot, namely the ones used for GRANDPA (LIGHT_AUTHORITY_SET_KEY and LIGHT_CONSENSUS_CHANGES_KEY).
There are probably some for BABE as well (aux defined in client/consensus/babe/src/aux_schema.rs, fwiu).
So loading a snapshot would touch the light client db and also the consensus parts.

Next, as Andre said, there is the question of skipping the sync of the previous blocks. If the whole state is loaded, this should work as long as snapshot loading is done before sync starts (we will also need to save some data from the META column: best_block and last_finalized, as used by client/db/src/light.rs). But here I am not sure what the best way to do things is: building the snapshot should be easier from a light client instance, and for loading I would simply define it in the chain_spec, though there is probably a smarter way to do it. I feel that both saving and loading a snapshot should be done in a cold state (iirc eth had a command to produce it); loading can be done before loading the other substrate components (except those needed to fetch the snaps).
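The pieces listed in this comment could be grouped roughly as follows. This is a sketch under assumptions: `LightSnapshot`, `Header`, `rebuild_key_lookup` and `validate` are hypothetical names invented here, and the header hash is stored as a plain field rather than derived from the encoded header as a real client would do.

```rust
use std::collections::BTreeMap;

type Hash = [u8; 32];

/// Simplified header. A real header's hash is derived from its encoding;
/// here it is stored directly to keep the sketch short.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct Header {
    number: u64,
    hash: Hash,
    parent_hash: Hash,
}

/// Hypothetical snapshot layout mirroring the columns named in the comment.
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct LightSnapshot {
    cht_roots: BTreeMap<u64, Hash>,  // CHT column
    headers: Vec<Header>,            // HEADER column (unpruned headers only)
    aux: BTreeMap<Vec<u8>, Vec<u8>>, // AUX entries (GRANDPA, possibly BABE)
    best_block: Hash,                // META column
    last_finalized: Hash,            // META column
}

/// Rebuild the number -> hash lookup from the headers, so the lookup
/// itself does not have to be shipped inside the snapshot.
fn rebuild_key_lookup(headers: &[Header]) -> BTreeMap<u64, Hash> {
    headers.iter().map(|h| (h.number, h.hash)).collect()
}

/// Check that the metadata points at headers the snapshot actually contains.
fn validate(snapshot: &LightSnapshot) -> Result<(), String> {
    let known = |hash: &Hash| snapshot.headers.iter().any(|h| &h.hash == hash);
    if !known(&snapshot.best_block) {
        return Err("best_block is not among the snapshot headers".into());
    }
    if !known(&snapshot.last_finalized) {
        return Err("last_finalized is not among the snapshot headers".into());
    }
    Ok(())
}
```

Running `validate` in the cold state before anything else starts would catch a snapshot whose metadata refers to a header that was pruned or never included.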

@expenses
Contributor

> The sync module needs to be able to start syncing at a predefined point.

@arkpar What would be your suggestion for how to do this?

@arkpar
Member

arkpar commented Aug 26, 2020

Sync always starts with the current best block. As @cheme mentioned, snapshot restoration should set the metadata to whatever has been restored from the snapshot before starting sync. This includes best_block, last_finalized, etc. After that it should just work.
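As a rough illustration of why restoring the metadata first is enough, here is a toy model. `Meta` and `LightDb` are hypothetical stand-ins written for this example, not the types in client/db; the only point is that a sync routine which reads the best block picks up from wherever the restored metadata says.

```rust
/// Hypothetical stand-in for the data kept in the metadata column.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct Meta {
    best_number: u64,
    best_hash: [u8; 32],
    finalized_number: u64,
    finalized_hash: [u8; 32],
}

/// Minimal in-memory "database" holding only the metadata.
#[derive(Debug, Default)]
struct LightDb {
    meta: Option<Meta>,
}

impl LightDb {
    /// Overwrite the metadata with values restored from a snapshot.
    /// Must run before sync starts, since sync reads the best block once.
    fn restore_meta(&mut self, meta: Meta) -> Result<(), String> {
        if meta.finalized_number > meta.best_number {
            return Err("finalized block cannot be ahead of best block".into());
        }
        self.meta = Some(meta);
        Ok(())
    }

    /// Block number sync would request next: best + 1,
    /// or 1 when the database is empty (full sync from genesis).
    fn next_block_to_request(&self) -> u64 {
        self.meta.as_ref().map_or(1, |m| m.best_number + 1)
    }
}
```

On an empty `LightDb` the next requested block is 1; after `restore_meta` with a best block of 5,000,000 it is 5,000,001, with no change to the sync logic itself.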

@expenses
Contributor

> Sync always starts with the current best block. As @cheme mentioned, snapshot restoration should set the metadata to whatever has been restored from the snapshot before starting sync. This includes best_block, last_finalized, etc. After that it should just work.

What do you mean by 'etc'? Everything in sp_blockchain::Info?

@expenses
Contributor

And where should I plug these values in?

@arkpar
Member

arkpar commented Aug 26, 2020

> > Sync always starts with the current best block. As @cheme mentioned, snapshot restoration should set the metadata to whatever has been restored from the snapshot before starting sync. This includes best_block, last_finalized, etc. After that it should just work.
>
> What do you mean by 'etc'? Everything in sp_blockchain::Info?

Everything that's written into COLUMN_META. I.e. the Meta struct. This function could be used to write it I guess:

fn update_meta(

@tomaka
Contributor

tomaka commented Aug 27, 2020

Unless there is a need for the light client to fetch the storage or get information about old blocks, which I don't think there is, I would suggest giving up on CHTs and MMRs completely.

As the opening post mentions, we only need two things:

  • Putting a hardcoded finalized block in the chain specs, and loading it into the database on startup.
  • Warp-syncing GRANDPA (see GRANDPA: Warp Sync #1208).

There seem to have been a lot of misunderstandings in all these discussions on Riot.
Unless I'm mistaken, neither of these two requires CHTs or MMRs.
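The first of the two items can be sketched as a small startup decision. Everything here is hypothetical (`Checkpoint`, `StartFrom`, `choose_start`), and the rule that a checkpoint newer than the local database wins is a design assumption of this example, not something settled in the discussion.

```rust
/// Hypothetical chain-spec extension: one hardcoded finalized block.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct Checkpoint {
    number: u64,
    hash: [u8; 32],
}

/// Where the client begins syncing from after startup.
#[derive(Debug, Clone, PartialEq)]
enum StartFrom {
    /// Nothing stored locally and no checkpoint in the chain spec.
    Genesis,
    /// The chain-spec checkpoint is loaded into the database first.
    Checkpoint(Checkpoint),
    /// The local database is already at or past the checkpoint.
    Database(u64),
}

/// Decide the starting point from the locally finalized block number (if any)
/// and the checkpoint carried by the chain spec (if any).
fn choose_start(db_finalized: Option<u64>, spec: Option<&Checkpoint>) -> StartFrom {
    match (db_finalized, spec) {
        (Some(n), Some(cp)) if cp.number > n => StartFrom::Checkpoint(cp.clone()),
        (Some(n), _) => StartFrom::Database(n),
        (None, Some(cp)) => StartFrom::Checkpoint(cp.clone()),
        (None, None) => StartFrom::Genesis,
    }
}
```

A first-time user with an empty database and a chain spec carrying a checkpoint would go straight to `StartFrom::Checkpoint`, which is the case the 60-second goal is about.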

@rphmeier
Contributor

Yes, and warp-syncing GRANDPA is only needed as an improvement over the hardcoded sync. It addresses 3 issues of hardcoded sync:

  1. Not needing to update the hardcoded sync point as often
  2. Reducing trust in the developers of the client or the distributor of the chain-spec
  3. Getting to the head of the chain faster
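The idea behind warp-syncing GRANDPA can be sketched as following a chain of authority-set handoffs from a trusted starting set. This is a simplified model, not the protocol from #1208: signatures are not verified at all, authorities are assumed to have equal weight, and `SetChange`, `threshold` and `warp` are names made up for this example.

```rust
use std::collections::BTreeSet;

type AuthorityId = u32;

/// Hypothetical warp-sync fragment: a finalized block at which the authority
/// set changes. Signatures are NOT modelled; `signers` stands in for the
/// authorities whose signatures were checked elsewhere.
#[derive(Debug, Clone)]
struct SetChange {
    block_number: u64,
    signers: BTreeSet<AuthorityId>,
    next_authorities: BTreeSet<AuthorityId>,
}

/// Votes needed out of `n` equally weighted authorities (more than 2/3).
fn threshold(n: usize) -> usize {
    n - n.saturating_sub(1) / 3
}

/// Follow the handoffs in `proof`, starting from a trusted block and set.
/// Returns the last block reached and the authority set valid after it.
fn warp(
    start_block: u64,
    start_set: &BTreeSet<AuthorityId>,
    proof: &[SetChange],
) -> Result<(u64, BTreeSet<AuthorityId>), String> {
    let mut block = start_block;
    let mut set = start_set.clone();
    for change in proof {
        if change.block_number <= block {
            return Err(format!("block {} does not advance the chain", change.block_number));
        }
        if set.is_empty() {
            return Err("empty authority set".into());
        }
        // Only count signers that belong to the currently trusted set.
        let votes = change.signers.intersection(&set).count();
        if votes < threshold(set.len()) {
            return Err(format!(
                "only {} of {} authorities signed block {}",
                votes,
                set.len(),
                change.block_number
            ));
        }
        block = change.block_number;
        set = change.next_authorities.clone();
    }
    Ok((block, set))
}
```

In this model the hardcoded sync point only supplies `start_block` and `start_set`, which is why it would need updating less often: any later handoff can be checked by the client instead of being trusted from the chain spec.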

@Stefie Stefie changed the title Light Client: Implement snapshots in chain_spec and skipping sync to speed up syncing Light Client: Implement snapshots in chain_spec to speed up syncing Aug 28, 2020
@expenses
Contributor

I've gotten the header to load from the chain spec okay, but I'm getting an error from BABE and it's not syncing:
2020-08-31 13:08:54.451 tokio-runtime-worker WARN sc_network::protocol::sync 💔 Verification failed for block 0x9fd0df299969154253afc756320e1ef8bd4ca8a169389b604850024d7ae6ebe2 received from peer: 12D3KooWLK2gMLhWsYJzjW3q35zAs9FDDVqfqVfVuskiGZGRSMvR, "Could not fetch epoch at 0xf818a6113e3b157c34048ad9367f0bd33c42ab5da81eb3ff17b0a28767415571"

I believe this could be to do with the keys that @cheme mentions above: #6804 (comment).

@expenses
Contributor

expenses commented Sep 7, 2020

> I've gotten the header to load from the chain spec okay, but I'm getting an error from BABE and it's not syncing:
> 2020-08-31 13:08:54.451 tokio-runtime-worker WARN sc_network::protocol::sync 💔 Verification failed for block 0x9fd0df299969154253afc756320e1ef8bd4ca8a169389b604850024d7ae6ebe2 received from peer: 12D3KooWLK2gMLhWsYJzjW3q35zAs9FDDVqfqVfVuskiGZGRSMvR, "Could not fetch epoch at 0xf818a6113e3b157c34048ad9367f0bd33c42ab5da81eb3ff17b0a28767415571"
>
> I believe this could be to do with the keys that @cheme mentions above: #6804 (comment).

Ok, so this is failing because epoch_data_for_child_of:

/// Finds the epoch for a child of the given block, assuming the given slot number.
///
/// If the returned epoch is an `UnimportedGenesis` epoch, it should be imported into the
/// tree.
pub fn epoch_descriptor_for_child_of<D: IsDescendentOfBuilder<Hash>>(
    &self,
    descendent_of_builder: D,
    parent_hash: &Hash,
    parent_number: Number,
    slot_number: E::SlotNumber,
) -> Result<Option<ViableEpochDescriptor<Hash, Number, E>>, fork_tree::Error<D::Error>> {
is returning Ok(None). I don't know this code well at all, but I imagine that the solution is to store some sort of epoch information in the sync state. You can check out the branch at ashley-load-sync-state.
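A toy model of the failure may help. It is only an analogy for the behaviour described: the real epoch data lives in a fork-aware tree, while `EpochStore`, `Epoch`, `seed` and `epoch_for_slot` below are hypothetical and use a flat list. The point is that a client which skipped all earlier blocks has never seen an epoch announcement, so every lookup comes back empty until epoch data is loaded along with the checkpoint.

```rust
/// Simplified epoch record; real epochs also carry authorities and randomness.
#[derive(Debug, Clone, PartialEq)]
struct Epoch {
    index: u64,
    start_slot: u64,
    duration: u64,
}

/// Flat stand-in for the epoch-changes tree.
#[derive(Debug, Default)]
struct EpochStore {
    epochs: Vec<Epoch>,
}

impl EpochStore {
    /// Epoch whose slot range `[start_slot, start_slot + duration)` contains
    /// `slot`. `None` plays the role of the `Ok(None)` seen in the issue.
    fn epoch_for_slot(&self, slot: u64) -> Option<&Epoch> {
        self.epochs
            .iter()
            .find(|e| slot >= e.start_slot && slot < e.start_slot + e.duration)
    }

    /// Load an epoch that was shipped alongside the checkpoint.
    fn seed(&mut self, epoch: Epoch) {
        self.epochs.push(epoch);
    }
}
```

Before seeding, `epoch_for_slot` returns `None` for every slot; once the epoch covering the checkpoint's children is seeded, blocks in that slot range can be looked up and verification has something to work with.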

@Stefie Stefie changed the title Light Client: Implement snapshots in chain_spec to speed up syncing Light Client: Implement checkpoints ~~snapshots~~ in chain_spec to speed up syncing Oct 6, 2020
@Stefie Stefie changed the title Light Client: Implement checkpoints ~~snapshots~~ in chain_spec to speed up syncing Light Client: Implement checkpoints in chain_spec to speed up syncing Oct 6, 2020