Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Specification for Transactional Storage as Default in FRAME #10806

Closed
2 of 3 tasks
shawntabrizi opened this issue Feb 7, 2022 · 10 comments · Fixed by #11431
Closed
2 of 3 tasks

Specification for Transactional Storage as Default in FRAME #10806

shawntabrizi opened this issue Feb 7, 2022 · 10 comments · Fixed by #11431
Labels
J1-meta A specific issue for grouping tasks or bugs of a specific category.

Comments

@shawntabrizi
Copy link
Member

shawntabrizi commented Feb 7, 2022

This issue will outline how I believe we should make transactional storage production usable within FRAME, and even a default part of all FRAME extrinsics.

Steps

Background

At a very high level, storage in the runtime has two main abstractions:

  1. An underlying DB, which is normally written to a hard-disk.
  2. An in-memory overlay which is used to keep track of read/written keys until the end of a block where those values are then committed to the DB.

When writing function within the runtime, making modifications to storage will affect the values in the in-memory overlay. These changes are pooled together with any other changes that happened in previous and future transactions during the block building process, and thus if storage is modified during an extrinsic, it is not possible to undo those specific changes.

This has lead to a critical "best practice" when doing runtime development: "verify first, write last", which says that you should never return an Err from an extrinsic after you have modified storage, as those changes will persist, and you may end up with an inconsistent state.

An easy example of this can be seen by executing the following:

#[test]
fn storage_modified_then_error() {
	fn set_value_then_error(v: u32) -> DispatchResult {
		Value::set(v);
		Err("never do this")?;
		Ok(())
	}

	TestExternalities::default().execute_with(|| {
		// `assert_noop` guarantees that the state root is unchanged after some function is called
		// and an error is returned.
		// This assertion will fail, because storage was already modified before the error was
		// returned.
		assert_noop!(set_value_then_error(3), "never do this");
	});
}

What is transactional?

Transactional is a feature that was implemented specifically to address this problem. The original PR can be found here: #6269.

Basically, the low level storage APIs now provide a way to spawn additional in-memory overlays and choose whether you want to commit those changes to the main storage overlay or not. When used properly, this can allow you to isolate changes which came about due to a specific extrinsic, and at any point, choose not to commit them.

#[test]
fn storage_modified_then_error_with_transactional() {
	#[transactional] // <-- This flag fixes our bug
	fn set_value_then_error_transactional(v: u32) -> DispatchResult {
		Value::set(v);
		Err("this is totally okay")?;
		Ok(())
	}

	TestExternalities::default().execute_with(|| {
		// This now succeeds!
		assert_noop!(set_value_then_error_transactional(3), "this is totally okay");
	});
}

Additionally, transactional functions can be nested, such that each time a new transactional layer is created, you can choose whether you want to commit those changes to the transactional layer below you. This means you can have storage modifying functions nested within other storage modifying functions, and have pretty much full control over what you do and do not want to commit to the final database.

Problems with the current system

The current transactional system does not take into account limitations of the runtime in terms of computational or memory usage, thus it is not really "safe by default" to use in the runtime. A user can nest transactional layers as much as they want, and there really is no integration of this functionality specifically for benchmarking worst case scenarios.

Computational Overhead

There is a non-zero cost to resolving a transactional layer into the overlay below it.

We don't really need to copy. All values are stored in heap and we just move pointers.
So the overhead does not depend on the size of the storage items but only on the count of items.

Assuming multiple nested layers, then that layer would need to be copied to the layer below, and so on.

The overhead is very low relative to other kinds of operations within the runtime, but still, at a high enough nesting level, is non-negligible.

Memory Overhead

From @athei, all of the transactional layers use client memory, not Wasm, so there should be no practical resource limitations here.

What we want

The end goal for FRAME is to make it as easy as possible to write Pallets which are correct by default. The chance that users can make a mistake of committing changes to storage, returning an error, and then expecting nothing to change is very high. This is especially true when we note that most Pallet developers are probably use to writing code exactly in this way from Smart Contract development on platforms like Ethereum.

As such, FRAME wants to take advantage of transactional as both usable and potentially even a default part of the Pallet development experience.

To do that, we must address some of the problems with the existing system.

Proposed Solutions

To make this feature production ready, we need to address a couple of different problems.

Hard limit to nesting transactional layers

As mentioned above, there is currently no limits in place for nesting transactional layers, however we know that there is a non-zero resource impact when doing this. Even within software development, there are stack limits which when reached, lead to stack-overflows.

I propose we introduce a conservative hard limit of 10 nested transactional layers as a default in FRAME. Potentially we could allow users to override or bypass this limit through lower level functions, but when simply using the transactional feature, this limit should be enforced.

When the limit is reached, trying to spawn a new transactional layer will return an Err, and can be handled by developer.

Default single transaction layer per extrinsic

The overhead of a single transactional layer should be negligible for nearly all runtime functions, and the benefits in terms of developer experience are huge.

I propose that all extrinsics by default spawn a single transactional storage layer.

If that extrinsic returns Err, we can expect that all modified storage items will be reverted, and the underlying state root will be unaffected. However, if that function returns Ok, we will instead commit any changes which are present in this single transactional level, just like developers would expect from Smart Contract platforms.

By default, dispatching other calls within a call would not lead to generating more transactional layers.

Dispatch with transactional layer

There are cases where users may want to dispatch another call within its own transactional layer, but within the limits defined above.

In that case, the user can call a specific dispatch_with_transactional(call) which will explicitly spawn a new transactional layer and then execute the call, allowing the user to handle the result.

Currently the #[tranasctional] tag is placed above different function definitions, but this does not really make sense if all extrinsics spawn at least one transactional layer by default. Instead, it should be the person writing the dispatch function to determine if a function they are calling should be called within an additional transactional context.

Annotation for safe without storage layer

Now that the default behavior of extrinsics will be to spawn at least one transactional layer, we can introduce an opt-in optimization where a user can state that a function is safe to be executed without its own transactional layer.

For example:

/// This function is safe to execute without an additional transactional storage layer.
#[without_transactional]
fn set_value(x: u32) -> DispatchResult {
    Self::check_value(x)?;
    MyStorage::set(x);
    Ok(())
}

When this function is called directly, a transactional layer should not spawn for it.

If a user called dispatch_with_transactional to this function, a transactional layer also does not need to spawn.

Name Change

As a final part to this specification, I propose we drop the name transactional for something more clear to what is happening here. The term transaction is already confusing within the Substrate ecosystem, especially in context with extrinsics. Also, it is not clear that the behavior here really has anything to do with transactions.

Instead I propose we call these apis storage_layers, and basically replace all uses of transnational with some appropriate use of that term.

For example:

  • dispatch_with_storage_layer
  • #[without_storage_layer]
  • get_storage_layer() -> u8
  • add_storage_layer() -> Result
  • etc...
@shawntabrizi shawntabrizi added the J1-meta A specific issue for grouping tasks or bugs of a specific category. label Feb 7, 2022
@athei
Copy link
Member

athei commented Feb 7, 2022

This all makes sense to me:

  • Name change: Transactional is a good name when you are a database but it is too overloaded in our case. Name change is warranted. I agree.
  • Making transactional opt-out: I like that we revert on Err by default. But is the performance gain really worth the hassle of allowing an opt-out? This will increase review time for code that does this. We need rules when you should opt-out then. I lean towards "not worth". Or we should make clear that this really is the exception.
  • Hard limit: Yeah makes sense. Question is how to implement that. Maybe we can add a new fallible version of start_transaction which returns an Err when the limit is reached? This would require a client update, though.

@shawntabrizi
Copy link
Member Author

Now that transactional layers are tracked, we should make it so that a new transactional layer is not spawned by default, but instead, only spawned if a transactional layer does not exist.

A user can instead explicitly spawn a new nested transactional layer if they want.

Something like this? https://github.com/paritytech/substrate/compare/shawntabrizi-transactional-nesting?expand=1

@athei
Copy link
Member

athei commented Apr 4, 2022

But this will change the behavior. When annotating #[transactional] you would expect that only the changes done in this extrinsic will be rolled back. If you are part of a batch you will also roll back the changes made by the other extrinsics of the batch?

@shawntabrizi
Copy link
Member Author

The main thing we don't want is that when dispatching a call from one extrinsic, we don't go into another layer unless the user wants that.

so call.dispatch(params) doesnt spawn a new layer, but call.dispatch_with_transactional(params) will.

This will mean that all things like batch will be safe from any limits since it will never spawn more than 1 layer.

Also, some functions may be marked as require_transactional, but does not actually want to spawn a new layer. In this case we spawn a new layer if none exist, or we just use the existing layer.

@athei
Copy link
Member

athei commented Apr 5, 2022

Okay I misunderstood your changes. So #[transactional] will always spawn a layer but #[with_transaction] only on demand. I have a hard time imagining some code which wants to use the latter one. I can't find a single user of #[require_transaction] within substrate. Do you have an example?

@xlc
Copy link
Contributor

xlc commented Apr 5, 2022

I proposed & implemented require_transactional initially because we (Acala) need it. You can find some usages at Acala & ORML.

It should be used to annotate all the methods that could fail after storage write.

https://github.com/open-web3-stack/open-runtime-module-library/blob/c439a50e01944aedeef33231e0824a17ed1813bc/xtokens/src/lib.rs#L904

https://github.com/AcalaNetwork/Acala/blob/f4b80d7200c19b78d3777e8a4a87bc6893740d23/modules/nft/src/lib.rs#L387

@bkchr
Copy link
Member

bkchr commented Apr 5, 2022

so call.dispatch(params) doesnt spawn a new layer, but call.dispatch_with_transactional(params) will.

Please don't let us do this. If you want a new layer, just run with with_transaction(|| call.dispatch(params)).

layer but #[with_transaction] only on demand

But then we don't need require_transactional anymore? I mean both are expressing the same, but just a little bit differently?

@athei
Copy link
Member

athei commented Apr 5, 2022

But then we don't need require_transactional anymore? I mean both are expressing the same, but just a little bit differently?

Yes. In Shawn's proposal require_transactional was renamed to with_transaction.

@athei
Copy link
Member

athei commented Apr 5, 2022

Please don't let us do this. If you want a new layer, just run with with_transaction(|| call.dispatch(params)).

Why? It seems like a harmless convenience function to me.

@bkchr
Copy link
Member

bkchr commented Apr 5, 2022

Okay, I just checked the code and dispatch_bypass_filter is already in some kind of extension trait. If we add dispatch_with_transactional into another extension trait, I'm fine with it. I just had the fear that we add this all to the same trait, which was already not true from the beginning.

CertainLach added a commit to UniqueNetwork/unique-chain that referenced this issue Jul 21, 2022
Every extrinsic now runs in transaction implicitly, and
`#[transactional]` on pallet dispatchable is now meaningless

Upstream-Change: paritytech/substrate#10806
Signed-off-by: Yaroslav Bolyukin <iam@lach.pw>
mrshiposha pushed a commit to UniqueNetwork/unique-chain that referenced this issue Aug 15, 2022
Every extrinsic now runs in transaction implicitly, and
`#[transactional]` on pallet dispatchable is now meaningless

Upstream-Change: paritytech/substrate#10806
Signed-off-by: Yaroslav Bolyukin <iam@lach.pw>
mrshiposha pushed a commit to UniqueNetwork/unique-chain that referenced this issue Aug 16, 2022
Every extrinsic now runs in transaction implicitly, and
`#[transactional]` on pallet dispatchable is now meaningless

Upstream-Change: paritytech/substrate#10806
Signed-off-by: Yaroslav Bolyukin <iam@lach.pw>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
J1-meta A specific issue for grouping tasks or bugs of a specific category.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants