Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Signature consensus ratio config #1950

Merged
merged 9 commits into from
May 14, 2021
Merged

Conversation

Nana-EC
Copy link
Contributor

@Nana-EC Nana-EC commented May 7, 2021

Detailed description:
OPS and customers would benefit from the ability to customize the consensusRatio used by the importer to validate signature files.

  • Add a consensusRatio field to the CommonDownloaderProperties
  • Move signatureVerificationMetric from Downloader.java to NodeSignatureVerifier.java
  • Add a NOT_FOUND status to the signatureVerificationMetric metric to capture nodes whose signature files were missing from the downloaded list

Which issue(s) this PR fixes:
Fixes #1869

Special notes for your reviewer:

Checklist

  • Documentation added
  • Tests updated

Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
@Nana-EC Nana-EC added enhancement Type: New feature P1 ops Tasks relating to network operations downloader Area: S3 downloader labels May 7, 2021
@Nana-EC Nana-EC added this to the Mirror 0.34.0 milestone May 7, 2021
@Nana-EC Nana-EC requested a review from a team May 7, 2021 18:34
@Nana-EC Nana-EC self-assigned this May 7, 2021
@codecov
Copy link

codecov bot commented May 7, 2021

Codecov Report

Merging #1950 (6782926) into master (126751e) will increase coverage by 0.68%.
The diff coverage is 92.30%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #1950      +/-   ##
============================================
+ Coverage     87.05%   87.73%   +0.68%     
- Complexity     1744     1822      +78     
============================================
  Files           315      319       +4     
  Lines          7731     8063     +332     
  Branches        740      814      +74     
============================================
+ Hits           6730     7074     +344     
+ Misses          772      744      -28     
- Partials        229      245      +16     
Impacted Files Coverage Δ Complexity Δ
...or/importer/domain/AddressBookServiceEndpoint.java 80.00% <80.00%> (ø) 5.00 <5.00> (?)
...r/importer/addressbook/AddressBookServiceImpl.java 89.56% <89.87%> (-1.72%) 41.00 <20.00> (+10.00) ⬇️
...edera/mirror/importer/domain/AddressBookEntry.java 85.18% <92.85%> (+6.92%) 15.00 <7.00> (-3.00) ⬆️
...ra/mirror/importer/domain/FileStreamSignature.java 85.71% <100.00%> (+0.86%) 17.00 <1.00> (+1.00)
...mporter/downloader/CommonDownloaderProperties.java 76.00% <100.00%> (+2.08%) 14.00 <1.00> (+1.00)
.../hedera/mirror/importer/downloader/Downloader.java 85.71% <100.00%> (-0.50%) 45.00 <0.00> (-1.00)
...ror/importer/downloader/NodeSignatureVerifier.java 93.47% <100.00%> (+3.00%) 26.00 <11.00> (+9.00)
...orter/parser/balance/AccountBalanceFileParser.java 71.92% <0.00%> (-26.32%) 4.00% <0.00%> (-4.00%)
...a/mirror/monitor/subscribe/AbstractSubscriber.java 77.14% <0.00%> (-20.00%) 3.00% <0.00%> (-1.00%)
health.js 43.75% <0.00%> (-10.10%) 0.00% <0.00%> (ø%)
... and 59 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ceceda6...6782926. Read the comment docs.

Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
@Nana-EC Nana-EC marked this pull request as ready for review May 8, 2021 00:27
@Nana-EC Nana-EC requested a review from beeradb May 8, 2021 00:42
Copy link
Contributor

@ijungmann ijungmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, couple small things. You also have field misspelled in the description.

@@ -75,7 +96,7 @@ public void verify(Collection<FileStreamSignature> signatures) throws SignatureV

long sigFileCount = signatures.size();
long nodeCount = nodeAccountIDPubKeyMap.size();
if (!canReachConsensus(sigFileCount, nodeCount)) {
if (!canReachConsensus(sigFileCount, nodeCount, commonDownloaderProperties.getConsensusRatio())) {
throw new SignatureVerificationException("Require at least 1/3 signature files to reach consensus, got " +
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to update the exception message

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Comment on lines 189 to 191
signatureVerificationMetric.tag("nodeAccount", nodeAccountId.getEntityNum().toString())
.tag("realm", nodeAccountId.getRealmNum().toString())
.tag("shard", nodeAccountId.getShardNum().toString())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we just collapse these into a single nodeAccountId tag?

Copy link
Member

@steven-sheehy steven-sheehy May 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking was that when we had multiple shards or realms we could filter down by those. Perhaps we could just change that back when that becomes a reality though. Though one could argue we should really be using the node ID for these metrics as the node account ID can change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah future support for shard, realm is good.
Though I do agree we should use nodeId instead. Will see if this is an easy fix or if a ticket and a future PR is better

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly for the fact that it's a metric you would want the ability to break out charts for separate realms and shards. If we collapse it that makes it more challenging

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So turns out we store nodeId as an integer and not an EntityId in the db which supports the conversion.
I'll open a ticket that captures this and asks for using NodeId instead of NodeAccountId as the unique id in the node verification flow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Captured in #1976

Set<String> missingNodes = new TreeSet<>(Sets.difference(nodeAccountIDPubKeyMap.keySet(), seenNodes));
statusMap.put("MISSING", missingNodes);
statusMap.remove(SignatureStatus.CONSENSUS_REACHED.toString());

String streamType = signatures.stream().map(FileStreamSignature::getStreamType).findFirst().toString();
missingNodes.forEach(nodeAccountId -> missingNodeSignatureFileMetric.tag("nodeAccount", nodeAccountId)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inconsistency of nodeAccount

  • here it's nodeAccountId
  • for signatureVerificationMetric it's the entity number

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will expand to support shard and realm based on a previous comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conformed to use nodeAccountId full string

Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
statusMap.put("MISSING", missingNodes);
statusMap.remove(SignatureStatus.CONSENSUS_REACHED.toString());

String streamType = signatures.stream().map(FileStreamSignature::getStreamType).findFirst().toString();
missingNodes.forEach(nodeAccountId -> metricMap
Copy link
Member

@steven-sheehy steven-sheehy May 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two different counters can't be stored in the same map. As it is now, only the first counter is being created and then incremented for both. You will need two separate maps. Or if we combine metrics can ignore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combined into one counter needing only one map

this.commonDownloaderProperties = commonDownloaderProperties;
this.meterRegistry = meterRegistry;

missingNodeSignatureFileMetric = Counter.builder("hedera.mirror.download.signature.missing")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might make sense to just have a single metric for both. We can add a SignatureStatus.NOT_FOUND to capture that unique state. The other reason it might be needed is the alert that we have:

sum(rate(hedera_mirror_download_signature_verification_total{application="hedera-mirror-importer", status="CONSENSUS_REACHED"}[2m])) by (namespace, pod, type)
/ sum(rate(hedera_mirror_download_signature_verification_total{application="hedera-mirror-importer"}[2m])) by (namespace, pod, type) < 0.33

Right now there's a problem that if a node is missing the dividend of that equation is decreasing, skewing the accuracy of that calculation. We need to ensure we always have the total be equal to the number of nodes in the address book.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combined into one counter

Comment on lines 58 to 59
private final Counter.Builder missingNodeSignatureFileMetric;
private final Counter.Builder signatureVerificationMetric;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to store these as fields since their construction happens so rarely and it's better to encapsulate the metric creation in a single method. See PublishMetrics for an example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adopted similar logic

sigFileCount + " out of " + nodeCount + " for file " + filename + ": " + statusMap(signatures,
nodeAccountIDPubKeyMap));
throw new SignatureVerificationException(String.format(
"Requires at least %s of signature files to reach consensus, got %d out of %d for file %s: %s",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update troubleshooting doc

Copy link
Contributor Author

@Nana-EC Nana-EC May 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like that file and the alert based on it has been out of date for ages. It's still looking for logs in the form “File could not be verified by at least 2/3 of nodes”

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gonna modify this to start with "Insufficient downloaded signature file count, requires at least..." to make ti easier to alert on and understand.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about "Unable to reach consensus"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't see this before my last push.
However, I do want to call out the lack of downloaded files because we do the canReachConsensus before verification and again after. So it would be nice to distinguish.

sigFileCount + " out of " + nodeCount + " for file " + filename + ": " + statusMap(signatures,
nodeAccountIDPubKeyMap));
throw new SignatureVerificationException(String.format(
"Requires at least %s of signature files to reach consensus, got %d out of %d for file %s: %s",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you string formatting twice?

Suggested change
"Requires at least %s of signature files to reach consensus, got %d out of %d for file %s: %s",
"Requires at least %.03f of signature files to reach consensus, got %d out of %d for file %s: %s",

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment on lines 135 to 137
} else if (commonDownloaderProperties.getConsensusRatio() == 0) {
log.debug("Signature file {} does not require consensus, skipping verification", filename);
return;
Copy link
Collaborator

@xin-hedera xin-hedera May 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will pass when no signature file is verified, i.e., signatureHashMap is empty

the possible effect is an illegitimate stream file can pass verification and get ingested

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. You're highlighting the fact that even if consensusRatio is set to 0 we should still only proceed with no error if there are some valid signature stream files.
Makes sense, I'll update it as I don't see a scenario where we'd want to pass with no verified signature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we probably should not log it at all unless it's important.

the updated if block is dead code, since when signatureHashMap.size() > 0, consensusCount will be > 0, and the the block above it with condition if (consensusCount > 0) will run instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I thought this was the scenario you highlighted in your original comment, no.
Initially the plan was to not fail when number of verified signature files was = 0 and commonDownloaderProperties.getConsensusRatio() == 0.
However, per you comment we probably only want to allow this with verified files.

Probably better to move this check up to after verification but before consensus calculation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
steven-sheehy
steven-sheehy previously approved these changes May 13, 2021
Copy link
Member

@steven-sheehy steven-sheehy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

xin-hedera
xin-hedera previously approved these changes May 13, 2021
Copy link
Collaborator

@xin-hedera xin-hedera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

CONSENSUS_REACHED // At least 1/3 of all nodes have been verified
DOWNLOADED, // Signature has been downloaded and parsed but not verified
VERIFIED, // Signature has been verified against the node's public key
CONSENSUS_REACHED, // At least 1/3 of all nodes have been verified
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: still makes reference to the 1/3, should probably mention the new config value instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, updated. Let me know if the wording works for you or you have a better suggestion

ijungmann
ijungmann previously approved these changes May 14, 2021
Copy link
Contributor

@ijungmann ijungmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small nit, otherwise LGTM, won't hold up approval for it.

Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
@Nana-EC Nana-EC dismissed stale reviews from ijungmann, xin-hedera, and steven-sheehy via 6782926 May 14, 2021 18:14
@Nana-EC Nana-EC requested a review from ijungmann May 14, 2021 18:14
@sonarcloud
Copy link

sonarcloud bot commented May 14, 2021

@Nana-EC Nana-EC merged commit 80369c2 into master May 14, 2021
@Nana-EC Nana-EC deleted the signature-consensus-ratio-config branch May 14, 2021 19:45
ijungmann pushed a commit that referenced this pull request May 18, 2021
OPS and customers would benefit from the ability to customize the consensusRatio used by the importer to validate signature files.

- Add a `consensusRatio` field to the `CommonDownloaderProperties`
- Move `signatureVerificationMetric` from `Downloader.java` to `NodeSignatureVerifier.java`
- Add a `NOT_FOUND` status to the `signatureVerificationMetric ` metric to capture nodes whose signature files were missing from the downloaded list

Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
Signed-off-by: Ian Jungmann <ian.jungmann@hedera.com>
ijungmann pushed a commit that referenced this pull request May 24, 2021
OPS and customers would benefit from the ability to customize the consensusRatio used by the importer to validate signature files.

- Add a `consensusRatio` field to the `CommonDownloaderProperties`
- Move `signatureVerificationMetric` from `Downloader.java` to `NodeSignatureVerifier.java`
- Add a `NOT_FOUND` status to the `signatureVerificationMetric ` metric to capture nodes whose signature files were missing from the downloaded list

Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
Signed-off-by: Ian Jungmann <ian.jungmann@hedera.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
downloader Area: S3 downloader enhancement Type: New feature ops Tasks relating to network operations P1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Provide configuration option to verify recordstream signatures for all nodes
4 participants