-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Signature consensus ratio config #1950
Conversation
Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
Codecov Report
@@ Coverage Diff @@
## master #1950 +/- ##
============================================
+ Coverage 87.05% 87.73% +0.68%
- Complexity 1744 1822 +78
============================================
Files 315 319 +4
Lines 7731 8063 +332
Branches 740 814 +74
============================================
+ Hits 6730 7074 +344
+ Misses 772 744 -28
- Partials 229 245 +16
Continue to review full report at Codecov.
|
Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
hedera-mirror-importer/src/main/java/com/hedera/mirror/importer/downloader/Downloader.java
Outdated
Show resolved
Hide resolved
...rror-importer/src/main/java/com/hedera/mirror/importer/downloader/NodeSignatureVerifier.java
Outdated
Show resolved
Hide resolved
...-importer/src/test/java/com/hedera/mirror/importer/downloader/NodeSignatureVerifierTest.java
Outdated
Show resolved
Hide resolved
...rror-importer/src/main/java/com/hedera/mirror/importer/downloader/NodeSignatureVerifier.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, couple small things. You also have field
misspelled in the description.
...importer/src/main/java/com/hedera/mirror/importer/downloader/CommonDownloaderProperties.java
Outdated
Show resolved
Hide resolved
@@ -75,7 +96,7 @@ public void verify(Collection<FileStreamSignature> signatures) throws SignatureV | |||
|
|||
long sigFileCount = signatures.size(); | |||
long nodeCount = nodeAccountIDPubKeyMap.size(); | |||
if (!canReachConsensus(sigFileCount, nodeCount)) { | |||
if (!canReachConsensus(sigFileCount, nodeCount, commonDownloaderProperties.getConsensusRatio())) { | |||
throw new SignatureVerificationException("Require at least 1/3 signature files to reach consensus, got " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to update the exception message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
signatureVerificationMetric.tag("nodeAccount", nodeAccountId.getEntityNum().toString()) | ||
.tag("realm", nodeAccountId.getRealmNum().toString()) | ||
.tag("shard", nodeAccountId.getShardNum().toString()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we just collapse these into a single nodeAccountId
tag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking was that when we had multiple shards or realms we could filter down by those. Perhaps we could just change that back when that becomes a reality though. Though one could argue we should really be using the node ID for these metrics as the node account ID can change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah future support for shard, realm is good.
Though I do agree we should use nodeId instead. Will see if this is an easy fix or if a ticket and a future PR is better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly for the fact that it's a metric you would want the ability to break out charts for separate realms and shards. If we collapse it that makes it more challenging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So turns out we store nodeId as an integer and not an EntityId in the db which supports the conversion.
I'll open a ticket that captures this and asks for using NodeId instead of NodeAccountId as the unique id in the node verification flow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Captured in #1976
Set<String> missingNodes = new TreeSet<>(Sets.difference(nodeAccountIDPubKeyMap.keySet(), seenNodes)); | ||
statusMap.put("MISSING", missingNodes); | ||
statusMap.remove(SignatureStatus.CONSENSUS_REACHED.toString()); | ||
|
||
String streamType = signatures.stream().map(FileStreamSignature::getStreamType).findFirst().toString(); | ||
missingNodes.forEach(nodeAccountId -> missingNodeSignatureFileMetric.tag("nodeAccount", nodeAccountId) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inconsistency of nodeAccount
- here it's
nodeAccountId
- for
signatureVerificationMetric
it's the entity number
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will expand to support shard and realm based on a previous comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conformed to use nodeAccountId full string
Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
statusMap.put("MISSING", missingNodes); | ||
statusMap.remove(SignatureStatus.CONSENSUS_REACHED.toString()); | ||
|
||
String streamType = signatures.stream().map(FileStreamSignature::getStreamType).findFirst().toString(); | ||
missingNodes.forEach(nodeAccountId -> metricMap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two different counters can't be stored in the same map. As it is now, only the first counter is being created and then incremented for both. You will need two separate maps. Or if we combine metrics can ignore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Combined into one counter needing only one map
this.commonDownloaderProperties = commonDownloaderProperties; | ||
this.meterRegistry = meterRegistry; | ||
|
||
missingNodeSignatureFileMetric = Counter.builder("hedera.mirror.download.signature.missing") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might make sense to just have a single metric for both. We can add a SignatureStatus.NOT_FOUND
to capture that unique state. The other reason it might be needed is the alert that we have:
sum(rate(hedera_mirror_download_signature_verification_total{application="hedera-mirror-importer", status="CONSENSUS_REACHED"}[2m])) by (namespace, pod, type)
/ sum(rate(hedera_mirror_download_signature_verification_total{application="hedera-mirror-importer"}[2m])) by (namespace, pod, type) < 0.33
Right now there's a problem that if a node is missing the dividend of that equation is decreasing, skewing the accuracy of that calculation. We need to ensure we always have the total be equal to the number of nodes in the address book.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Combined into one counter
private final Counter.Builder missingNodeSignatureFileMetric; | ||
private final Counter.Builder signatureVerificationMetric; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to store these as fields since their construction happens so rarely and it's better to encapsulate the metric creation in a single method. See PublishMetrics for an example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adopted similar logic
sigFileCount + " out of " + nodeCount + " for file " + filename + ": " + statusMap(signatures, | ||
nodeAccountIDPubKeyMap)); | ||
throw new SignatureVerificationException(String.format( | ||
"Requires at least %s of signature files to reach consensus, got %d out of %d for file %s: %s", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update troubleshooting doc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like that file and the alert based on it has been out of date for ages. It's still looking for logs in the form “File could not be verified by at least 2/3 of nodes”
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gonna modify this to start with "Insufficient downloaded signature file count, requires at least..." to make ti easier to alert on and understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about "Unable to reach consensus"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't see this before my last push.
However, I do want to call out the lack of downloaded files because we do the canReachConsensus
before verification and again after. So it would be nice to distinguish.
sigFileCount + " out of " + nodeCount + " for file " + filename + ": " + statusMap(signatures, | ||
nodeAccountIDPubKeyMap)); | ||
throw new SignatureVerificationException(String.format( | ||
"Requires at least %s of signature files to reach consensus, got %d out of %d for file %s: %s", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you string formatting twice?
"Requires at least %s of signature files to reach consensus, got %d out of %d for file %s: %s", | |
"Requires at least %.03f of signature files to reach consensus, got %d out of %d for file %s: %s", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
} else if (commonDownloaderProperties.getConsensusRatio() == 0) { | ||
log.debug("Signature file {} does not require consensus, skipping verification", filename); | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will pass when no signature file is verified, i.e., signatureHashMap
is empty
the possible effect is an illegitimate stream file can pass verification and get ingested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. You're highlighting the fact that even if consensusRatio is set to 0 we should still only proceed with no error if there are some valid signature stream files.
Makes sense, I'll update it as I don't see a scenario where we'd want to pass with no verified signature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we probably should not log it at all unless it's important.
the updated if block is dead code, since when signatureHashMap.size() > 0
, consensusCount
will be > 0, and the the block above it with condition if (consensusCount > 0)
will run instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I thought this was the scenario you highlighted in your original comment, no.
Initially the plan was to not fail when number of verified signature files was = 0 and commonDownloaderProperties.getConsensusRatio() == 0.
However, per you comment we probably only want to allow this with verified files.
Probably better to move this check up to after verification but before consensus calculation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
CONSENSUS_REACHED // At least 1/3 of all nodes have been verified | ||
DOWNLOADED, // Signature has been downloaded and parsed but not verified | ||
VERIFIED, // Signature has been verified against the node's public key | ||
CONSENSUS_REACHED, // At least 1/3 of all nodes have been verified |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: still makes reference to the 1/3, should probably mention the new config value instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, updated. Let me know if the wording works for you or you have a better suggestion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One small nit, otherwise LGTM, won't hold up approval for it.
Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
6782926
SonarCloud Quality Gate failed. |
OPS and customers would benefit from the ability to customize the consensusRatio used by the importer to validate signature files. - Add a `consensusRatio` field to the `CommonDownloaderProperties` - Move `signatureVerificationMetric` from `Downloader.java` to `NodeSignatureVerifier.java` - Add a `NOT_FOUND` status to the `signatureVerificationMetric ` metric to capture nodes whose signature files were missing from the downloaded list Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com> Signed-off-by: Ian Jungmann <ian.jungmann@hedera.com>
OPS and customers would benefit from the ability to customize the consensusRatio used by the importer to validate signature files. - Add a `consensusRatio` field to the `CommonDownloaderProperties` - Move `signatureVerificationMetric` from `Downloader.java` to `NodeSignatureVerifier.java` - Add a `NOT_FOUND` status to the `signatureVerificationMetric ` metric to capture nodes whose signature files were missing from the downloaded list Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com> Signed-off-by: Ian Jungmann <ian.jungmann@hedera.com>
Detailed description:
OPS and customers would benefit from the ability to customize the consensusRatio used by the importer to validate signature files.
consensusRatio
field to theCommonDownloaderProperties
signatureVerificationMetric
fromDownloader.java
toNodeSignatureVerifier.java
NOT_FOUND
status to thesignatureVerificationMetric
metric to capture nodes whose signature files were missing from the downloaded listWhich issue(s) this PR fixes:
Fixes #1869
Special notes for your reviewer:
Checklist