Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Persistence][Core][Savepoints/Rollbacks] Implement KISS SavePoints - Serialize WorldState from Persistence #327

Open
6 tasks
Tracked by #562
jessicadaugherty opened this issue Oct 31, 2022 · 7 comments
Assignees
Labels
consensus Consensus specific changes core Core infrastructure - protocol related persistence Persistence specific changes testing Defining, adding, automating or modifying tests utility Utility specific changes

Comments

@jessicadaugherty
Copy link
Contributor

jessicadaugherty commented Oct 31, 2022

Objective

Revisit the TODOs used as placeholders throughout the persistence module codebase related to save points.

Make sure that the SavePoint is created when the UtilityContext (soon to be renamed to UtilityUnitOfWork) is instantiated.

NOTE: This ticket is dependent on #563

Origin Document

Goals

Deliverable

  • A PR with implementation tending to the goal above
  • Deterministic unit tests to verify the logic above
  • Fuzzed tests with some chaos to stress test this business logic

Non-goals / Non-deliverables

  • An end-to-end testing/automation framework
  • Implementation of Save Points in other V1 modules
  • Implementation of Roll Backs as part of the Save Points

General issue deliverables

  • Update the appropriate CHANGELOG
  • Update any relevant READMEs (local and/or global)
  • Update any relevant global documentation & references

Testing Methodology

  • All tests: make test_all
  • LocalNet: verify a LocalNet is still functioning correctly by following the instructions at docs/development/README.md

Creator: @jessicadaugherty - rescope: @deblasis
Co-creator: @Olshansk

@Olshansk Olshansk changed the title [Utility] Implement Rollback Logic on Errors [Core] Implement Rollback Logic on Errors Oct 31, 2022
@Olshansk Olshansk changed the title [Core] Implement Rollback Logic on Errors [Core] Implement Save Points & Rollbacks Oct 31, 2022
@Olshansk Olshansk added core Core infrastructure - protocol related consensus Consensus specific changes utility Utility specific changes persistence Persistence specific changes testing Defining, adding, automating or modifying tests labels Oct 31, 2022
@Olshansk
Copy link
Member

@jessicadaugherty Updated the description. Please review

@jessicadaugherty jessicadaugherty changed the title [Core] Implement Save Points & Rollbacks [Persistence][Core] Implement Save Points & Rollbacks Dec 22, 2022
@jessicadaugherty jessicadaugherty changed the title [Persistence][Core] Implement Save Points & Rollbacks [Persistence][Core] Implement Save Points Jan 10, 2023
@h5law
Copy link
Contributor

h5law commented Jan 18, 2023

@jessicadaugherty Is it worth looking into using pgx's builtin subtransaction savepoints and rollbacks? As far as I know, the tx.Begin() command will automatically create a savepoint and then you can call tx.Rollback() at any point before tx.Commit() to revert to the most recent savepoint.

See: pgx rollback implementation
And: pgx Tx docs

This way we could have NewSavepoint(pgx.Tx) (pgx.Tx, error) and use this functionality across the board for subtransaction rollbacks calling RollbackToSavepoint(pgx.Tx) error

I am not super clear on how another form of rollbacks could be implemented as Postgres doesn't support them outside of transactions (I believe). Unless it were to be a DB incremental backup and restore scenario?

@deblasis
Copy link
Contributor

Hi @h5law ! I have received a notification since the issue is assigned to me so I'll try to answer your question on @jessicadaugherty's behalf :)

You might be on the right path on the Postgres side of things but I think it's worth clarifying that savepoints and rollbacks have to also consider blockStore, txIndexer and stateTrees:

type persistenceModule struct {
	bus          modules.Bus
	config       *configs.PersistenceConfig
	genesisState *genesis.GenesisState

	blockStore kvstore.KVStore
	txIndexer  indexer.TxIndexer
	stateTrees *stateTrees

	// TECHDEBT: Need to implement context pooling (for writes), timeouts (for read & writes), etc...
	writeContext *PostgresContext // only one write context is allowed at a time
}

Essentially, we need to implement whatever is necessary so that everything we persist can have Savepoints and Rollback functionality. The whole thing has to work in unison across all the datastores.

Any ideas or pointers from your point of view are more than welcome.

Please let me know if this makes sense, happy to answer any further questions you might have.

@h5law
Copy link
Contributor

h5law commented Jan 18, 2023

@deblasis

Any ideas or pointers from your point of view are more than welcome.

My knowledge of SMTs is very very limited but from a little reading

func (s *SMT) Revert(toOldRoot []byte) error {

"When Revert is called, the trees to rollback (between the current tree and toOldRoot) are deleted from the database. This process can be slow so for a quick rollback, manually set the Root to the state you want to revert to and reload the cache."

Resetting the root seems to be a fast implementation.

BlockStore and txIndexer are both using kvstore.KVStore for their internal db, again I will keep looking into it as I am not super familiar with this yet but it seems the savepoint/rollback functionality extends as far as the KVStore and SparseMerkleTree types as well as PostgreSQL.

If this is correct I'll try to find out more on each of these and see what comes up.

@Olshansk
Copy link
Member

Olshansk commented Jan 25, 2023

To add to what @deblasis said, you're definitely on the write part w.r.t the Postgres piece.

In shared/modules/persistence_module.go, the first few functions of PersistenceWriteContext are:

type PersistenceWriteContext interface {
	// Context Operations
	NewSavePoint([]byte) error
	RollbackToSavePoint([]byte) error
	Release() error

	// Commits the current context (height, hash, transactions, etc...) to disk (i.e. finality).
	Commit(proposerAddr, quorumCert []byte) error
	
	// ...
}

And the last few lines current implementation of Commit are:

func (p PostgresContext) Commit(proposerAddr, quorumCert []byte) error {
    //...

	// Commit the SQL transaction
	ctx := context.TODO()
	if err := p.getTx().Commit(ctx); err != nil {
		return err
	}
	if err := p.conn.Close(ctx); err != nil {
		log.Println("[TODO][ERROR] Implement connection pooling. Error when closing DB connecting...", err)
	}

	return nil
}

There's still some work to make the postgres part work correctly (if you navigate the code you'll understand what I mean), but most of the piece are setup to enable that. It's "easy" because we can leverage rollbacks in a SQL db, which we use for state storage.

For state commitment (i.e. computing the hash), the hard part is that we have to make the change (i.e. update the key-value stores) after validating the transactions, and verify that the root is what we expect it to be.

In essence, here's what we need to do (pseudo-code):

for tx in transactions:
   // Start a postgres transaction
   // Create "ephemeral" version of our key-value store (which are the one's backing our trees)
   if tx is valid:
      pgx.Tx.Apply(stateChange)
      tree.Apply(stateChange) // update key value || delete existing key || insert new key
   else: // exception, error, invalid tx, etc...
      pgx.Tx.rollback()
      tree.Revert() // revert updated keys to old value || re-insert deleted keys || delete new keys

if new_state_hash != proposed_state_hash:
    // same rollback logic as above            

The hardest part of what I described above is // revert updated keys to old value || re-insert deleted keys || delete new keys because it either requires one of the following:

  1. looking for functionality in the KV stores that enables this OR
  2. Maintaining some sort of in-memory cache that allows rolling these OR
  3. Having logic that loops through the transactions in the block being reverted and determines the proper logic to revert

I haven't thought through the details here deeply so the above is just where I would start thinking/looking if I were to implement it myself. There are still some open questions and we'll need to properly design this piece.

@jessicadaugherty
Copy link
Contributor Author

Likely need to split into multiple issues:

  • storage in postgres and badger.db - review separately to avoid huge PR
  • common entrypoint for save points and rollbacks

@jessicadaugherty
Copy link
Contributor Author

@deblasis @Olshansk please rescope this ticket based on outcome of review of #493

@deblasis deblasis changed the title [Persistence][Core] Implement Save Points [Persistence][Core] Implement KISS SavePoints - Serialize WorldState from Persistence Mar 8, 2023
@jessicadaugherty jessicadaugherty changed the title [Persistence][Core] Implement KISS SavePoints - Serialize WorldState from Persistence [Persistence][Core][Savepoints/Rollbacks] Implement KISS SavePoints - Serialize WorldState from Persistence Mar 8, 2023
deblasis added a commit that referenced this issue Apr 6, 2023
…x for params and flags (#654)

## Description

While working on #327 (KV Store PR) I realized that the scope was
growing and it sounded sensible to create a specific PR with these fixes
that I believe are gonna be needed regardless.

## Issue

I didn't bother creating an issue since I guess this constitutes
bug/techdebt even if it's necessary work for #327 if we move forward
with the KV store approach.

## Type of change

Please mark the relevant option(s):

- [ ] New feature, functionality or library
- [x] Bug fix
- [x] Code health or cleanup
- [ ] Major breaking change
- [ ] Documentation
- [ ] Other <!-- add details here if it a different type of change -->

## List of changes

- Fixed flag/params keying. It was using the hash of the value and not
the name, making queries by name impossible

![image](https://user-images.githubusercontent.com/29378614/230366398-6f56cce2-6183-47b0-a259-6c05f17890c7.png)
- Updated pools read/write functions to use a real address, not a string
that is named address and obviously is not hexadecimal etc, etc.
- Added test to describe behaviour and catch future bugs (we already
missed a pool that wasn't added to the FriendlyNames:
`Pools_POOLS_FISHERMAN_STAKE`)
- Updated runtime and tests to reflect the changes

## Testing

- [x] `make develop_test`; if any code changes were made
- [x] [Docker Compose
LocalNet](https://github.com/pokt-network/pocket/blob/main/docs/development/README.md);
if any major functionality was changed or introduced
- [x] [k8s
LocalNet](https://github.com/pokt-network/pocket/blob/main/build/localnet/README.md);
if any infrastructure or configuration changes were made

<!-- REMOVE this comment block after following the instructions
 If you added additional tests or infrastructure, describe it here.
 Bonus points for images and videos or gifs.
-->

## Required Checklist

- [x] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have added, or updated, [`godoc` format
comments](https://go.dev/blog/godoc) on touched members (see:
[tip.golang.org/doc/comment](https://tip.golang.org/doc/comment))
- [x] I have tested my changes using the available tooling
- [x] I have updated the corresponding CHANGELOG

### If Applicable Checklist

- [ ] I have updated the corresponding README(s); local and/or global
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added, or updated,
[mermaid.js](https://mermaid-js.github.io) diagrams in the corresponding
README(s)
- [ ] I have added, or updated, documentation and
[mermaid.js](https://mermaid-js.github.io) diagrams in `shared/docs/*`
if I updated `shared/*`README(s)

---------

Signed-off-by: Alessandro De Blasis <alex@deblasis.net>
Co-authored-by: Daniel Olshansky <olshansky@pokt.network>
@Olshansk Olshansk assigned dylanlott and unassigned deblasis Apr 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
consensus Consensus specific changes core Core infrastructure - protocol related persistence Persistence specific changes testing Defining, adding, automating or modifying tests utility Utility specific changes
Projects
Status: Backlog
5 participants