Add a cluster health endpoint to monitor #2353

Merged 6 commits into main on Aug 3, 2021

@steven-sheehy steven-sheehy (Member) commented Aug 2, 2021

Description:
Adds a cluster health endpoint to monitor and fixes various bugs with the monitor encountered during testing.

  • Add an /actuator/health/cluster endpoint that returns unknown when publishing is inactive
  • Add transaction memo with hostname & scenario to track publish source
  • Add new previewnet node 0.0.7
  • Add support for in-process testing of TransactionPublisher
  • Change mainnet network configuration to use mainnet-public
  • Fix MonitorPublishErrors alert not calculating percentage properly and triggering repeatedly
  • Fix node validation succeeding when network is frozen by switching from a query to a transaction
  • Fix not printing status logs or sampling when scenario is idle
  • Fix rate calculation sometimes returning 0.0 at low TPS
  • Fix publish properties failing on unknown properties (this allows camelCase properties to be supplied via environment variables)
  • Fix transaction record lookup doing 3 separate queries
  • Refactor publish flow to store scenario state in new PublishScenario
  • Refactor publish & subscribe flows to use common scenario and scenario properties classes
  • Refactor publish client setup to use Flux
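
To illustrate the first item in the list above, here is a minimal sketch of a health check that reports unknown rather than an error when nothing is being published. It is not the code from this PR: the class `ClusterHealthSketch`, the `Status` enum, the `clusterHealth` method, and the use of publish and subscribe rates as inputs are all assumptions made for illustration.

```java
public class ClusterHealthSketch {
    // Hypothetical status values; the real endpoint's vocabulary may differ.
    enum Status { UP, DOWN, UNKNOWN }

    // Assumption: health is derived from the current publish and subscribe rates.
    static Status clusterHealth(double publishRate, double subscribeRate) {
        // With no publishing activity there is nothing to verify,
        // so report UNKNOWN instead of failing the check.
        if (publishRate <= 0.0) {
            return Status.UNKNOWN;
        }
        // Publishing is active: healthy only if subscribers see traffic too.
        return subscribeRate > 0.0 ? Status.UP : Status.DOWN;
    }

    public static void main(String[] args) {
        System.out.println(clusterHealth(0.0, 0.0));
        System.out.println(clusterHealth(10.0, 9.8));
        System.out.println(clusterHealth(10.0, 0.0));
    }
}
```

The point of the design is that an idle monitor is distinguishable from a failing cluster, which is the behavior the linked issue asks for.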

Related issue(s):
Fixes #2313

Notes for reviewer:
Waiting on access to perfnet to test changes at 10K.
Will keep subscriber REST API for now to ease rollout. May delete it later.

Checklist

  • Documented (Code comments, README, etc.)
  • Tested (unit, integration, etc.)

Signed-off-by: Steven Sheehy <steven.sheehy@hedera.com>
@steven-sheehy steven-sheehy added the bug (Type: Something isn't working), P1, and monitor (Area: Monitoring and dashboard) labels Aug 2, 2021
@steven-sheehy steven-sheehy added this to the Mirror 0.38.1 milestone Aug 2, 2021
@steven-sheehy steven-sheehy requested a review from a team August 2, 2021 17:59
@steven-sheehy steven-sheehy self-assigned this Aug 2, 2021
codecov bot commented Aug 2, 2021

Codecov Report

Merging #2353 (69e22d0) into main (23fb100) will increase coverage by 0.78%.
The diff coverage is 96.58%.

@@             Coverage Diff              @@
##               main    #2353      +/-   ##
============================================
+ Coverage     83.59%   84.38%   +0.78%     
- Complexity     2258     2313      +55     
============================================
  Files           437      440       +3     
  Lines         11961    12002      +41     
  Branches       1018     1020       +2     
============================================
+ Hits           9999    10128     +129     
+ Misses         1647     1556      -91     
- Partials        315      318       +3     
| Impacted Files | Coverage Δ |
| --- | --- |
| ...ra/mirror/monitor/config/MonitorConfiguration.java | 0.00% <ø> (ø) |
| ...onitor/publish/generator/TransactionGenerator.java | 100.00% <ø> (ø) |
| ...onitor/subscribe/AbstractSubscriberProperties.java | 66.66% <ø> (-22.23%) ⬇️ |
| .../mirror/monitor/subscribe/CompositeSubscriber.java | 100.00% <ø> (ø) |
| ...era/mirror/monitor/subscribe/MirrorSubscriber.java | 100.00% <ø> (ø) |
| .../com/hedera/mirror/monitor/subscribe/Scenario.java | 100.00% <ø> (ø) |
| ...tor/subscribe/controller/SubscriberController.java | 100.00% <ø> (ø) |
| ...hedera/mirror/monitor/publish/PublishScenario.java | 83.33% <83.33%> (ø) |
| .../com/hedera/mirror/monitor/ScenarioProperties.java | 87.50% <87.50%> (ø) |
| ...va/com/hedera/mirror/monitor/AbstractScenario.java | 91.89% <90.00%> (ø) |

... and 28 more


xin-hedera previously approved these changes Aug 3, 2021

@xin-hedera xin-hedera (Collaborator) left a comment

LGTM

@Nana-EC Nana-EC (Contributor) left a comment

Looks good, requesting some further comments to capture intents for easier code management

@Validated
public static class RetryProperties {

@Min(0)

Contributor commented:

Minimum should be 1; setting it to 0 means it will never attempt.

Member (Author) commented:

True, but the Flux retry spec treats this value as the number of retry attempts, not total attempts as the SDK does, so the minimum needs to remain 0 here. I can add an assertion in ConfigurableTransactionGenerator that validates it.

Contributor commented:

That's fair.
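
The distinction discussed in this thread can be sketched in plain Java (deliberately not Reactor, so no library API is assumed). Under "retry attempts" semantics, a limit of 0 still performs the initial attempt, so 0 is a valid minimum; under "total attempts" semantics, 0 would mean never attempting. The class `RetryCountSketch` and the helpers `callWithRetries` and `attemptsMade` are hypothetical names introduced only for this illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class RetryCountSketch {
    // Hypothetical helper: runs the action once, then retries up to
    // maxRetries more times if it throws ("retry attempts" semantics).
    static <T> T callWithRetries(Supplier<T> action, int maxRetries) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        RuntimeException last = null;
        // The loop body runs maxRetries + 1 times at most: 1 initial attempt plus retries.
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e;
            }
        }
        throw last;
    }

    // Counts how many times an always-failing action is invoked.
    static int attemptsMade(int maxRetries) {
        AtomicInteger attempts = new AtomicInteger();
        Supplier<String> alwaysFails = () -> {
            attempts.incrementAndGet();
            throw new IllegalStateException("always fails");
        };
        try {
            callWithRetries(alwaysFails, maxRetries);
        } catch (IllegalStateException expected) {
            // Expected once the retries are exhausted.
        }
        return attempts.get();
    }

    public static void main(String[] args) {
        System.out.println(attemptsMade(0)); // 1: the initial attempt only
        System.out.println(attemptsMade(2)); // 3: the initial attempt plus 2 retries
    }
}
```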

sonarcloud bot commented Aug 3, 2021

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 6 (rating A)

Coverage: no coverage information
Duplication: 0.0%

Nana-EC previously approved these changes Aug 3, 2021

@Nana-EC Nana-EC (Contributor) left a comment

LGTM

@xin-hedera xin-hedera (Collaborator) left a comment

SLGTM

@steven-sheehy steven-sheehy merged commit ca7ffa4 into main Aug 3, 2021
@steven-sheehy steven-sheehy deleted the cluster-status branch August 3, 2021 23:14
steven-sheehy added a commit that referenced this pull request Aug 10, 2021
Development

Successfully merging this pull request may close these issues.

Monitor cluster health check returns error when publishing inactive
4 participants