Executor Environment parameterization (#6161)

* Re-apply changes without Diener, rebase to the lastest master * Cache pruning * Bit-pack InstantiationStrategy * Move ExecutorParams version inside the structure itself * Rework runtime API and executor parameters storage * Pass executor parameters through backing subsystem * Update Cargo.lock * Introduce `ExecutorParams` to approval voting subsys * Introduce `ExecutorParams` to dispute coordinator * `cargo fmt` * Simplify requests from backing subsys * Fix tests * Replace manual config cloning with `.clone()` * Move constants to module * Parametrize executor performing PVF pre-check * Fix Malus * Fix test runtime * Introduce session executor params as a constant defined by session info pallet * Use Parity SCALE codec instead of hand-crafted binary encoding * Get rid of constants; Add docs * Get rid of constants * Minor typo * Fix Malus after rebase * `cargo fmt` * Use transparent SCALE encoding instead of explicit * Clean up * Get rid of relay parent to session index mapping * Join environment type and version in a single enum element * Use default execution parameters if running an old runtime * `unwrap()` -> `expect()` * Correct API version * Constants are back in town * Use constants for execution environment types * Artifact separation, first try * Get rid of explicit version * PVF execution queue worker separation * Worker handshake * Global renaming * Minor fixes resolving discussions * Two-stage requesting of executor params to make use of runtime API cache * Proper error handling in pvf-checker * Executor params storage bootstrapping * Propagate migration to v3 network runtimes * Fix storage versioning * Ensure `ExecutorParams` serialization determinism; Add comments * Rename constants to make things a bit more deterministic Get rid of stale code * Tidy up a structure of active PVFs * Minor formatting * Fix comment * Add try-runtime hooks * Add storage version write on upgrade Co-authored-by: Andronik <write@reusable.software> * Add pre- and post-upgrade assertions * Require to specify environment type; Remove redundant `impl`s * Add `ExecutorParamHash` creation from `H256` * Fix candidate validation subsys tests * Return splittable error from executor params request fn * Revert "Return splittable error from executor params request fn" This reverts commit a85038d. * Decompose approval voting metrics * Use more relevant errors * Minor formatting fix * Assert a valid environment type instead of checking * Fix `try-runtime` hooks * After-merge fixes * Add migration logs * Remove dead code * Fix tests * Fix tests * Back to the strongly typed implementation * Promote strong types to executor interface * Remove stale comment * Move executor params to `SessionInfo`: primitives and runtime * Move executor params to `SessionInfo`: node * Try to bump primitives and API version * Get rid of `MallocSizeOf` * Bump target API version to v4 * Make use of session index already in place * Back to v3 * Fix all the tests * Add migrations to all the runtimes * Make use of existing `SessionInfo` in approval voting subsys * Rename `TARGET` -> `LOG_TARGET` * Bump all the primitives to v3 * Fix Rococo ParachainHost API version * Use `RollingSessionWindow` to acquire `ExecutorParams` in disputes * Fix nits from discussions; add comments * Re-evaluate queue logic * Rework job assignment in execution queue * Add documentation * Use `RuntimeInfo` to obtain `SessionInfo` (with blackjack and caching) * Couple `Pvf` with `ExecutorParams` wherever possible * Put members of `PvfWithExecutorParams` under `Arc` for cheap cloning * Fix comment * Fix CI tests * Fix clippy warnings * Address nits from discussions * Add a placeholder for raw data * Fix non exhaustive match * Remove redundant reexports and fix imports * Keep only necessary semantic features, as discussed * Rework `RuntimeInfo` to support mock implementation for tests * Remove unneeded bound * `cargo fmt` * Revert "Remove unneeded bound" This reverts commit 3bdfc68. * Fix PVF host tests * Fix PVF checker tests * Fix overseer declarations * Simplify tests * `MAX_KEEP_WAITING` timeout based on `BACKGING_EXECUTION_TIMEOUT` * Add a unit test for varying executor parameters * Minor fixes from discussions * Add prechecking max. memory parameter (see paritytech-secops/srlabs_findings#110) * Fix and improve a test * Remove `ExecutionEnvironment` and `RawData` * New primitives versioning in parachain host API * `disputes()` implementation for Kusama and Polkadot * Move `ExecutorParams` from `vstaging` to stable primitives * Move disputes from `vstaging` to stable implementation * Fix `try-runtime` * Fixes after merge * Move `ExecutorParams` to the bottom of `SessionInfo` * Revert "Move executor params to `SessionInfo`: primitives and runtime" This reverts commit 1988355. * Always use fresh activated live hash in pvf precheck (re-apply 029b82b) * Fixing tests (broken commit) * Fix candidate validation tests * Fix PVF host test * Minor fixes * Address discussions * Restore migration * Fix `use` to only include what is needed instead of `*` * Add comment to never touch `DEFAULT_CONFIG` * Update migration to set default `ExecutorParams` for `dispute_period` sessions back * Use `earliest_stored_session` instead of calculations * Nit * Add logs * Treat any runtime error as `NotSupported` again * Always return default executor params if not available * Revert "Always return default executor params if not available" This reverts commit b58ac44. * Add paritytech/substrate#9997 workaround * `cargo fmt` * Remove migration (again!) * Bump executor params to API v4 (backport from #6698) --------- Co-authored-by: Andronik <write@reusable.software>
paritytech · Feb 15, 2023 · 9a17c4e · 9a17c4e
1 parent 10daabc
commit 9a17c4e
Show file tree

Hide file tree

Showing 40 changed files with 1,243 additions and 330 deletions.
diff --git a/Cargo.lock b/Cargo.lock
diff --git a/erasure-coding/benches/scaling_with_validators.rs b/erasure-coding/benches/scaling_with_validators.rs
@@ -15,7 +15,7 @@
 // along with Polkadot.  If not, see <http://www.gnu.org/licenses/>.
 
 use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};
-use polkadot_primitives::v2::Hash;
+use polkadot_primitives::Hash;
 use std::time::Duration;
 
 fn chunks(n_validators: usize, pov: &Vec<u8>) -> Vec<Vec<u8>> {

diff --git a/node/core/candidate-validation/Cargo.toml b/node/core/candidate-validation/Cargo.toml
@@ -17,6 +17,7 @@ polkadot-primitives = { path = "../../../primitives" }
 polkadot-parachain = { path = "../../../parachain" }
 polkadot-node-primitives = { path = "../../primitives" }
 polkadot-node-subsystem = { path = "../../subsystem" }
+polkadot-node-subsystem-util = { path = "../../subsystem-util" }
 polkadot-node-metrics = { path = "../../metrics" }
 
 [target.'cfg(not(any(target_os = "android", target_os = "unknown")))'.dependencies]
@@ -27,6 +28,5 @@ sp-keyring = { git = "https://github.com/paritytech/substrate", branch = "master
 futures = { version = "0.3.21", features = ["thread-pool"] }
 assert_matches = "1.4.0"
 polkadot-node-subsystem-test-helpers = { path = "../../subsystem-test-helpers" }
-polkadot-node-subsystem-util = { path = "../../subsystem-util" }
 sp-core = { git = "https://github.com/paritytech/substrate", branch = "master" }
 test-helpers = { package = "polkadot-primitives-test-helpers", path = "../../../primitives/test-helpers" }
diff --git a/node/core/candidate-validation/src/lib.rs b/node/core/candidate-validation/src/lib.rs
@@ -24,8 +24,8 @@
 #![warn(missing_docs)]
 
 use polkadot_node_core_pvf::{
-	InvalidCandidate as WasmInvalidCandidate, PrepareError, PrepareStats, Pvf, ValidationError,
-	ValidationHost,
+	InvalidCandidate as WasmInvalidCandidate, PrepareError, PrepareStats, Pvf,
+	PvfWithExecutorParams, ValidationError, ValidationHost,
 };
 use polkadot_node_primitives::{
 	BlockData, InvalidCandidate, PoV, ValidationResult, POV_BOMB_LIMIT, VALIDATION_CODE_BOMB_LIMIT,
@@ -39,10 +39,11 @@ use polkadot_node_subsystem::{
 	overseer, FromOrchestra, OverseerSignal, SpawnedSubsystem, SubsystemError, SubsystemResult,
 	SubsystemSender,
 };
+use polkadot_node_subsystem_util::executor_params_at_relay_parent;
 use polkadot_parachain::primitives::{ValidationParams, ValidationResult as WasmValidationResult};
 use polkadot_primitives::{
-	CandidateCommitments, CandidateDescriptor, CandidateReceipt, Hash, OccupiedCoreAssumption,
-	PersistedValidationData, ValidationCode, ValidationCodeHash,
+	vstaging::ExecutorParams, CandidateCommitments, CandidateDescriptor, CandidateReceipt, Hash,
+	OccupiedCoreAssumption, PersistedValidationData, ValidationCode, ValidationCodeHash,
 };
 
 use parity_scale_codec::Encode;
@@ -175,12 +176,14 @@ async fn run<Context>(
 					response_sender,
 				) => {
 					let bg = {
+						let mut sender = ctx.sender().clone();
 						let metrics = metrics.clone();
 						let validation_host = validation_host.clone();
 
 						async move {
 							let _timer = metrics.time_validate_from_exhaustive();
 							let res = validate_candidate_exhaustive(
+								&mut sender,
 								validation_host,
 								persisted_validation_data,
 								validation_code,
@@ -307,18 +310,38 @@ where
 			},
 		};
 
-	let validation_code = match sp_maybe_compressed_blob::decompress(
+	let executor_params =
+		if let Ok(executor_params) = executor_params_at_relay_parent(relay_parent, sender).await {
+			gum::debug!(
+				target: LOG_TARGET,
+				?relay_parent,
+				?validation_code_hash,
+				"precheck: acquired executor params for the session: {:?}",
+				executor_params,
+			);
+			executor_params
+		} else {
+			gum::warn!(
+				target: LOG_TARGET,
+				?relay_parent,
+				?validation_code_hash,
+				"precheck: failed to acquire executor params for the session, thus voting against.",
+			);
+			return PreCheckOutcome::Invalid
+		};
+
+	let pvf_with_params = match sp_maybe_compressed_blob::decompress(
 		&validation_code.0,
 		VALIDATION_CODE_BOMB_LIMIT,
 	) {
-		Ok(code) => Pvf::from_code(code.into_owned()),
+		Ok(code) => PvfWithExecutorParams::new(Pvf::from_code(code.into_owned()), executor_params),
 		Err(e) => {
 			gum::debug!(target: LOG_TARGET, err=?e, "precheck: cannot decompress validation code");
 			return PreCheckOutcome::Invalid
 		},
 	};
 
-	match validation_backend.precheck_pvf(validation_code).await {
+	match validation_backend.precheck_pvf(pvf_with_params).await {
 		Ok(_) => PreCheckOutcome::Valid,
 		Err(prepare_err) =>
 			if prepare_err.is_deterministic() {
@@ -456,6 +479,7 @@ where
 		};
 
 	let validation_result = validate_candidate_exhaustive(
+		sender,
 		validation_host,
 		validation_data,
 		validation_code,
@@ -490,15 +514,19 @@ where
 	validation_result
 }
 
-async fn validate_candidate_exhaustive(
+async fn validate_candidate_exhaustive<Sender>(
+	sender: &mut Sender,
 	mut validation_backend: impl ValidationBackend + Send,
 	persisted_validation_data: PersistedValidationData,
 	validation_code: ValidationCode,
 	candidate_receipt: CandidateReceipt,
 	pov: Arc<PoV>,
 	timeout: Duration,
 	metrics: &Metrics,
-) -> Result<ValidationResult, ValidationFailed> {
+) -> Result<ValidationResult, ValidationFailed>
+where
+	Sender: SubsystemSender<RuntimeApiMessage>,
+{
 	let _timer = metrics.time_validate_candidate_exhaustive();
 
 	let validation_code_hash = validation_code.hash();
@@ -554,8 +582,34 @@ async fn validate_candidate_exhaustive(
 		relay_parent_storage_root: persisted_validation_data.relay_parent_storage_root,
 	};
 
+	let executor_params = if let Ok(executor_params) =
+		executor_params_at_relay_parent(candidate_receipt.descriptor.relay_parent, sender).await
+	{
+		gum::debug!(
+			target: LOG_TARGET,
+			?validation_code_hash,
+			?para_id,
+			"Acquired executor params for the session: {:?}",
+			executor_params,
+		);
+		executor_params
+	} else {
+		gum::warn!(
+			target: LOG_TARGET,
+			?validation_code_hash,
+			?para_id,
+			"Failed to acquire executor params for the session",
+		);
+		return Ok(ValidationResult::Invalid(InvalidCandidate::BadParent))
+	};
+
 	let result = validation_backend
-		.validate_candidate_with_retry(raw_validation_code.to_vec(), timeout, params)
+		.validate_candidate_with_retry(
+			raw_validation_code.to_vec(),
+			timeout,
+			params,
+			executor_params,
+		)
 		.await;
 
 	if let Err(ref error) = result {
@@ -613,7 +667,7 @@ trait ValidationBackend {
 	/// Tries executing a PVF a single time (no retries).
 	async fn validate_candidate(
 		&mut self,
-		pvf: Pvf,
+		pvf_with_params: PvfWithExecutorParams,
 		timeout: Duration,
 		encoded_params: Vec<u8>,
 	) -> Result<WasmValidationResult, ValidationError>;
@@ -625,12 +679,14 @@ trait ValidationBackend {
 		raw_validation_code: Vec<u8>,
 		timeout: Duration,
 		params: ValidationParams,
+		executor_params: ExecutorParams,
 	) -> Result<WasmValidationResult, ValidationError> {
 		// Construct the PVF a single time, since it is an expensive operation. Cloning it is cheap.
-		let pvf = Pvf::from_code(raw_validation_code);
+		let pvf_with_params =
+			PvfWithExecutorParams::new(Pvf::from_code(raw_validation_code), executor_params);
 
 		let mut validation_result =
-			self.validate_candidate(pvf.clone(), timeout, params.encode()).await;
+			self.validate_candidate(pvf_with_params.clone(), timeout, params.encode()).await;
 
 		// If we get an AmbiguousWorkerDeath error, retry once after a brief delay, on the
 		// assumption that the conditions that caused this error may have been transient. Note that
@@ -643,34 +699,40 @@ trait ValidationBackend {
 
 			gum::warn!(
 				target: LOG_TARGET,
-				?pvf,
+				?pvf_with_params,
 				"Re-trying failed candidate validation due to AmbiguousWorkerDeath."
 			);
 
 			// Encode the params again when re-trying. We expect the retry case to be relatively
 			// rare, and we want to avoid unconditionally cloning data.
-			validation_result = self.validate_candidate(pvf, timeout, params.encode()).await;
+			validation_result =
+				self.validate_candidate(pvf_with_params, timeout, params.encode()).await;
 		}
 
 		validation_result
 	}
 
-	async fn precheck_pvf(&mut self, pvf: Pvf) -> Result<PrepareStats, PrepareError>;
+	async fn precheck_pvf(
+		&mut self,
+		pvf_with_params: PvfWithExecutorParams,
+	) -> Result<PrepareStats, PrepareError>;
 }
 
 #[async_trait]
 impl ValidationBackend for ValidationHost {
 	/// Tries executing a PVF a single time (no retries).
 	async fn validate_candidate(
 		&mut self,
-		pvf: Pvf,
+		pvf_with_params: PvfWithExecutorParams,
 		timeout: Duration,
 		encoded_params: Vec<u8>,
 	) -> Result<WasmValidationResult, ValidationError> {
 		let priority = polkadot_node_core_pvf::Priority::Normal;
 
 		let (tx, rx) = oneshot::channel();
-		if let Err(err) = self.execute_pvf(pvf, timeout, encoded_params, priority, tx).await {
+		if let Err(err) =
+			self.execute_pvf(pvf_with_params, timeout, encoded_params, priority, tx).await
+		{
 			return Err(ValidationError::InternalError(format!(
 				"cannot send pvf to the validation host: {:?}",
 				err
@@ -681,9 +743,12 @@ impl ValidationBackend for ValidationHost {
 			.map_err(|_| ValidationError::InternalError("validation was cancelled".into()))?
 	}
 
-	async fn precheck_pvf(&mut self, pvf: Pvf) -> Result<PrepareStats, PrepareError> {
+	async fn precheck_pvf(
+		&mut self,
+		pvf_with_params: PvfWithExecutorParams,
+	) -> Result<PrepareStats, PrepareError> {
 		let (tx, rx) = oneshot::channel();
-		if let Err(err) = self.precheck_pvf(pvf, tx).await {
+		if let Err(err) = self.precheck_pvf(pvf_with_params, tx).await {
 			// Return an IO error if there was an error communicating with the host.
 			return Err(PrepareError::IoErr(err))
 		}