Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Commit

Permalink
Executor Environment parameterization (#6161)
Browse files Browse the repository at this point in the history
* Re-apply changes without Diener, rebase to the lastest master

* Cache pruning

* Bit-pack InstantiationStrategy

* Move ExecutorParams version inside the structure itself

* Rework runtime API and executor parameters storage

* Pass executor parameters through backing subsystem

* Update Cargo.lock

* Introduce `ExecutorParams` to approval voting subsys

* Introduce `ExecutorParams` to dispute coordinator

* `cargo fmt`

* Simplify requests from backing subsys

* Fix tests

* Replace manual config cloning with `.clone()`

* Move constants to module

* Parametrize executor performing PVF pre-check

* Fix Malus

* Fix test runtime

* Introduce session executor params as a constant defined by session info
pallet

* Use Parity SCALE codec instead of hand-crafted binary encoding

* Get rid of constants; Add docs

* Get rid of constants

* Minor typo

* Fix Malus after rebase

* `cargo fmt`

* Use transparent SCALE encoding instead of explicit

* Clean up

* Get rid of relay parent to session index mapping

* Join environment type and version in a single enum element

* Use default execution parameters if running an old runtime

* `unwrap()` -> `expect()`

* Correct API version

* Constants are back in town

* Use constants for execution environment types

* Artifact separation, first try

* Get rid of explicit version

* PVF execution queue worker separation

* Worker handshake

* Global renaming

* Minor fixes resolving discussions

* Two-stage requesting of executor params to make use of runtime API cache

* Proper error handling in pvf-checker

* Executor params storage bootstrapping

* Propagate migration to v3 network runtimes

* Fix storage versioning

* Ensure `ExecutorParams` serialization determinism; Add comments

* Rename constants to make things a bit more deterministic
Get rid of stale code

* Tidy up a structure of active PVFs

* Minor formatting

* Fix comment

* Add try-runtime hooks

* Add storage version write on upgrade

Co-authored-by: Andronik <write@reusable.software>

* Add pre- and post-upgrade assertions

* Require to specify environment type; Remove redundant `impl`s

* Add `ExecutorParamHash` creation from `H256`

* Fix candidate validation subsys tests

* Return splittable error from executor params request fn

* Revert "Return splittable error from executor params request fn"

This reverts commit a85038d.

* Decompose approval voting metrics

* Use more relevant errors

* Minor formatting fix

* Assert a valid environment type instead of checking

* Fix `try-runtime` hooks

* After-merge fixes

* Add migration logs

* Remove dead code

* Fix tests

* Fix tests

* Back to the strongly typed implementation

* Promote strong types to executor interface

* Remove stale comment

* Move executor params to `SessionInfo`: primitives and runtime

* Move executor params to `SessionInfo`: node

* Try to bump primitives and API version

* Get rid of `MallocSizeOf`

* Bump target API version to v4

* Make use of session index already in place

* Back to v3

* Fix all the tests

* Add migrations to all the runtimes

* Make use of existing `SessionInfo` in approval voting subsys

* Rename `TARGET` -> `LOG_TARGET`

* Bump all the primitives to v3

* Fix Rococo ParachainHost API version

* Use `RollingSessionWindow` to acquire `ExecutorParams` in disputes

* Fix nits from discussions; add comments

* Re-evaluate queue logic

* Rework job assignment in execution queue

* Add documentation

* Use `RuntimeInfo` to obtain `SessionInfo` (with blackjack and caching)

* Couple `Pvf` with `ExecutorParams` wherever possible

* Put members of `PvfWithExecutorParams` under `Arc` for cheap cloning

* Fix comment

* Fix CI tests

* Fix clippy warnings

* Address nits from discussions

* Add a placeholder for raw data

* Fix non exhaustive match

* Remove redundant reexports and fix imports

* Keep only necessary semantic features, as discussed

* Rework `RuntimeInfo` to support mock implementation for tests

* Remove unneeded bound

* `cargo fmt`

* Revert "Remove unneeded bound"

This reverts commit 3bdfc68.

* Fix PVF host tests

* Fix PVF checker tests

* Fix overseer declarations

* Simplify tests

* `MAX_KEEP_WAITING` timeout based on `BACKGING_EXECUTION_TIMEOUT`

* Add a unit test for varying executor parameters

* Minor fixes from discussions

* Add prechecking max. memory parameter (see paritytech-secops/srlabs_findings#110)

* Fix and improve a test

* Remove `ExecutionEnvironment` and `RawData`

* New primitives versioning in parachain host API

* `disputes()` implementation for Kusama and Polkadot

* Move `ExecutorParams` from `vstaging` to stable primitives

* Move disputes from `vstaging` to stable implementation

* Fix `try-runtime`

* Fixes after merge

* Move `ExecutorParams` to the bottom of `SessionInfo`

* Revert "Move executor params to `SessionInfo`: primitives and runtime"

This reverts commit 1988355.

* Always use fresh activated live hash in pvf precheck
(re-apply 029b82b)

* Fixing tests (broken commit)

* Fix candidate validation tests

* Fix PVF host test

* Minor fixes

* Address discussions

* Restore migration

* Fix `use` to only include what is needed instead of `*`

* Add comment to never touch `DEFAULT_CONFIG`

* Update migration to set default `ExecutorParams` for `dispute_period`
sessions back

* Use `earliest_stored_session` instead of calculations

* Nit

* Add logs

* Treat any runtime error as `NotSupported` again

* Always return default executor params if not available

* Revert "Always return default executor params if not available"

This reverts commit b58ac44.

* Add paritytech/substrate#9997 workaround

* `cargo fmt`

* Remove migration (again!)

* Bump executor params to API v4 (backport from #6698)

---------

Co-authored-by: Andronik <write@reusable.software>
  • Loading branch information
s0me0ne-unkn0wn and ordian authored Feb 15, 2023
1 parent 10daabc commit 9a17c4e
Show file tree
Hide file tree
Showing 40 changed files with 1,243 additions and 330 deletions.
3 changes: 3 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion erasure-coding/benches/scaling_with_validators.rs
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@
// along with Polkadot. If not, see <http://www.gnu.org/licenses/>.

use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};
use polkadot_primitives::v2::Hash;
use polkadot_primitives::Hash;
use std::time::Duration;

fn chunks(n_validators: usize, pov: &Vec<u8>) -> Vec<Vec<u8>> {
Expand Down
2 changes: 1 addition & 1 deletion node/core/candidate-validation/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ polkadot-primitives = { path = "../../../primitives" }
polkadot-parachain = { path = "../../../parachain" }
polkadot-node-primitives = { path = "../../primitives" }
polkadot-node-subsystem = { path = "../../subsystem" }
polkadot-node-subsystem-util = { path = "../../subsystem-util" }
polkadot-node-metrics = { path = "../../metrics" }

[target.'cfg(not(any(target_os = "android", target_os = "unknown")))'.dependencies]
Expand All @@ -27,6 +28,5 @@ sp-keyring = { git = "https://github.com/paritytech/substrate", branch = "master
futures = { version = "0.3.21", features = ["thread-pool"] }
assert_matches = "1.4.0"
polkadot-node-subsystem-test-helpers = { path = "../../subsystem-test-helpers" }
polkadot-node-subsystem-util = { path = "../../subsystem-util" }
sp-core = { git = "https://github.com/paritytech/substrate", branch = "master" }
test-helpers = { package = "polkadot-primitives-test-helpers", path = "../../../primitives/test-helpers" }
105 changes: 85 additions & 20 deletions node/core/candidate-validation/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -24,8 +24,8 @@
#![warn(missing_docs)]

use polkadot_node_core_pvf::{
InvalidCandidate as WasmInvalidCandidate, PrepareError, PrepareStats, Pvf, ValidationError,
ValidationHost,
InvalidCandidate as WasmInvalidCandidate, PrepareError, PrepareStats, Pvf,
PvfWithExecutorParams, ValidationError, ValidationHost,
};
use polkadot_node_primitives::{
BlockData, InvalidCandidate, PoV, ValidationResult, POV_BOMB_LIMIT, VALIDATION_CODE_BOMB_LIMIT,
Expand All @@ -39,10 +39,11 @@ use polkadot_node_subsystem::{
overseer, FromOrchestra, OverseerSignal, SpawnedSubsystem, SubsystemError, SubsystemResult,
SubsystemSender,
};
use polkadot_node_subsystem_util::executor_params_at_relay_parent;
use polkadot_parachain::primitives::{ValidationParams, ValidationResult as WasmValidationResult};
use polkadot_primitives::{
CandidateCommitments, CandidateDescriptor, CandidateReceipt, Hash, OccupiedCoreAssumption,
PersistedValidationData, ValidationCode, ValidationCodeHash,
vstaging::ExecutorParams, CandidateCommitments, CandidateDescriptor, CandidateReceipt, Hash,
OccupiedCoreAssumption, PersistedValidationData, ValidationCode, ValidationCodeHash,
};

use parity_scale_codec::Encode;
Expand Down Expand Up @@ -175,12 +176,14 @@ async fn run<Context>(
response_sender,
) => {
let bg = {
let mut sender = ctx.sender().clone();
let metrics = metrics.clone();
let validation_host = validation_host.clone();

async move {
let _timer = metrics.time_validate_from_exhaustive();
let res = validate_candidate_exhaustive(
&mut sender,
validation_host,
persisted_validation_data,
validation_code,
Expand Down Expand Up @@ -307,18 +310,38 @@ where
},
};

let validation_code = match sp_maybe_compressed_blob::decompress(
let executor_params =
if let Ok(executor_params) = executor_params_at_relay_parent(relay_parent, sender).await {
gum::debug!(
target: LOG_TARGET,
?relay_parent,
?validation_code_hash,
"precheck: acquired executor params for the session: {:?}",
executor_params,
);
executor_params
} else {
gum::warn!(
target: LOG_TARGET,
?relay_parent,
?validation_code_hash,
"precheck: failed to acquire executor params for the session, thus voting against.",
);
return PreCheckOutcome::Invalid
};

let pvf_with_params = match sp_maybe_compressed_blob::decompress(
&validation_code.0,
VALIDATION_CODE_BOMB_LIMIT,
) {
Ok(code) => Pvf::from_code(code.into_owned()),
Ok(code) => PvfWithExecutorParams::new(Pvf::from_code(code.into_owned()), executor_params),
Err(e) => {
gum::debug!(target: LOG_TARGET, err=?e, "precheck: cannot decompress validation code");
return PreCheckOutcome::Invalid
},
};

match validation_backend.precheck_pvf(validation_code).await {
match validation_backend.precheck_pvf(pvf_with_params).await {
Ok(_) => PreCheckOutcome::Valid,
Err(prepare_err) =>
if prepare_err.is_deterministic() {
Expand Down Expand Up @@ -456,6 +479,7 @@ where
};

let validation_result = validate_candidate_exhaustive(
sender,
validation_host,
validation_data,
validation_code,
Expand Down Expand Up @@ -490,15 +514,19 @@ where
validation_result
}

async fn validate_candidate_exhaustive(
async fn validate_candidate_exhaustive<Sender>(
sender: &mut Sender,
mut validation_backend: impl ValidationBackend + Send,
persisted_validation_data: PersistedValidationData,
validation_code: ValidationCode,
candidate_receipt: CandidateReceipt,
pov: Arc<PoV>,
timeout: Duration,
metrics: &Metrics,
) -> Result<ValidationResult, ValidationFailed> {
) -> Result<ValidationResult, ValidationFailed>
where
Sender: SubsystemSender<RuntimeApiMessage>,
{
let _timer = metrics.time_validate_candidate_exhaustive();

let validation_code_hash = validation_code.hash();
Expand Down Expand Up @@ -554,8 +582,34 @@ async fn validate_candidate_exhaustive(
relay_parent_storage_root: persisted_validation_data.relay_parent_storage_root,
};

let executor_params = if let Ok(executor_params) =
executor_params_at_relay_parent(candidate_receipt.descriptor.relay_parent, sender).await
{
gum::debug!(
target: LOG_TARGET,
?validation_code_hash,
?para_id,
"Acquired executor params for the session: {:?}",
executor_params,
);
executor_params
} else {
gum::warn!(
target: LOG_TARGET,
?validation_code_hash,
?para_id,
"Failed to acquire executor params for the session",
);
return Ok(ValidationResult::Invalid(InvalidCandidate::BadParent))
};

let result = validation_backend
.validate_candidate_with_retry(raw_validation_code.to_vec(), timeout, params)
.validate_candidate_with_retry(
raw_validation_code.to_vec(),
timeout,
params,
executor_params,
)
.await;

if let Err(ref error) = result {
Expand Down Expand Up @@ -613,7 +667,7 @@ trait ValidationBackend {
/// Tries executing a PVF a single time (no retries).
async fn validate_candidate(
&mut self,
pvf: Pvf,
pvf_with_params: PvfWithExecutorParams,
timeout: Duration,
encoded_params: Vec<u8>,
) -> Result<WasmValidationResult, ValidationError>;
Expand All @@ -625,12 +679,14 @@ trait ValidationBackend {
raw_validation_code: Vec<u8>,
timeout: Duration,
params: ValidationParams,
executor_params: ExecutorParams,
) -> Result<WasmValidationResult, ValidationError> {
// Construct the PVF a single time, since it is an expensive operation. Cloning it is cheap.
let pvf = Pvf::from_code(raw_validation_code);
let pvf_with_params =
PvfWithExecutorParams::new(Pvf::from_code(raw_validation_code), executor_params);

let mut validation_result =
self.validate_candidate(pvf.clone(), timeout, params.encode()).await;
self.validate_candidate(pvf_with_params.clone(), timeout, params.encode()).await;

// If we get an AmbiguousWorkerDeath error, retry once after a brief delay, on the
// assumption that the conditions that caused this error may have been transient. Note that
Expand All @@ -643,34 +699,40 @@ trait ValidationBackend {

gum::warn!(
target: LOG_TARGET,
?pvf,
?pvf_with_params,
"Re-trying failed candidate validation due to AmbiguousWorkerDeath."
);

// Encode the params again when re-trying. We expect the retry case to be relatively
// rare, and we want to avoid unconditionally cloning data.
validation_result = self.validate_candidate(pvf, timeout, params.encode()).await;
validation_result =
self.validate_candidate(pvf_with_params, timeout, params.encode()).await;
}

validation_result
}

async fn precheck_pvf(&mut self, pvf: Pvf) -> Result<PrepareStats, PrepareError>;
async fn precheck_pvf(
&mut self,
pvf_with_params: PvfWithExecutorParams,
) -> Result<PrepareStats, PrepareError>;
}

#[async_trait]
impl ValidationBackend for ValidationHost {
/// Tries executing a PVF a single time (no retries).
async fn validate_candidate(
&mut self,
pvf: Pvf,
pvf_with_params: PvfWithExecutorParams,
timeout: Duration,
encoded_params: Vec<u8>,
) -> Result<WasmValidationResult, ValidationError> {
let priority = polkadot_node_core_pvf::Priority::Normal;

let (tx, rx) = oneshot::channel();
if let Err(err) = self.execute_pvf(pvf, timeout, encoded_params, priority, tx).await {
if let Err(err) =
self.execute_pvf(pvf_with_params, timeout, encoded_params, priority, tx).await
{
return Err(ValidationError::InternalError(format!(
"cannot send pvf to the validation host: {:?}",
err
Expand All @@ -681,9 +743,12 @@ impl ValidationBackend for ValidationHost {
.map_err(|_| ValidationError::InternalError("validation was cancelled".into()))?
}

async fn precheck_pvf(&mut self, pvf: Pvf) -> Result<PrepareStats, PrepareError> {
async fn precheck_pvf(
&mut self,
pvf_with_params: PvfWithExecutorParams,
) -> Result<PrepareStats, PrepareError> {
let (tx, rx) = oneshot::channel();
if let Err(err) = self.precheck_pvf(pvf, tx).await {
if let Err(err) = self.precheck_pvf(pvf_with_params, tx).await {
// Return an IO error if there was an error communicating with the host.
return Err(PrepareError::IoErr(err))
}
Expand Down
Loading

0 comments on commit 9a17c4e

Please sign in to comment.