Improve monitor startup performance #2395

Merged
merged 2 commits into from
Aug 11, 2021
steven-sheehy (Member) commented Aug 11, 2021

Description:

  • Improve monitor startup performance by moving publish and subscribe flow initialization off the main thread
  • Increase node validation timeout to 30s
  • Adjust monitor probe delay to reflect startup improvement
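
A minimal sketch of the first two points, using only `java.util.concurrent`. The thread names in the logs below (`single-1`, `parallel-1`) suggest the monitor itself uses Reactor schedulers, so this is not the PR's actual code. The class, method names, and the stand-in check are hypothetical; only the 30s timeout comes from the description above.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Predicate;

// Hypothetical sketch: class, method, and node names are illustrative, not the monitor's code.
final class BackgroundStartupSketch {

    // Checks each node in turn, giving up on a node once the timeout elapses.
    static List<String> validateNodes(List<String> nodes, Predicate<String> check, Duration timeout) {
        List<String> valid = new ArrayList<>();
        for (String node : nodes) {
            CompletableFuture<Boolean> result = CompletableFuture.supplyAsync(() -> check.test(node));
            try {
                if (result.get(timeout.toMillis(), TimeUnit.MILLISECONDS)) {
                    valid.add(node);
                }
            } catch (TimeoutException | ExecutionException e) {
                // Timed out or failed: treat the node as unvalidated and move on.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return valid;
    }

    // Runs validation on the given executor so the caller returns immediately.
    static CompletableFuture<List<String>> startInBackground(
            ExecutorService executor, List<String> nodes, Predicate<String> check, Duration timeout) {
        return CompletableFuture.supplyAsync(() -> validateNodes(nodes, check, timeout), executor);
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        List<String> nodes = List.of("0.0.3", "0.0.4", "0.0.5");
        // Stand-in check that only accepts node 0.0.3; a real check would call the node.
        CompletableFuture<List<String>> pending =
                startInBackground(executor, nodes, "0.0.3"::equals, Duration.ofSeconds(30));
        System.out.println("Main thread is free; the application can report itself started");
        System.out.println("Validated nodes: " + pending.join());
        executor.shutdown();
    }
}
```

The point of the design is that the caller of `startInBackground` does not wait for slow nodes, so a health endpoint can come up while validation is still running.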

Related issue(s):

Notes for reviewer:
The monitor was crashing in the performance environment due to failing liveness probes when heavy load caused main nodes to take longer than 10s to respond:

2021-08-11T11:02:34.195-0600 INFO main c.h.m.m.MonitorApplication The following profiles are active: kubernetes
2021-08-11T11:02:56.008-0600 INFO main c.h.m.m.s.g.GrpcClientSDK Connecting 1 clients to mirror-grpc:5600
2021-08-11T11:03:01.294-0600 INFO main c.h.m.m.s.r.RestSubscriber Connecting to mirror node http://mirror-rest:80/api/v1
2021-08-11T11:03:01.892-0600 INFO main c.h.m.m.p.g.CompositeTransactionGenerator Activated scenario: PublishScenarioProperties(super=ScenarioProperties(duration=PT2562047H47M16.854775807S, enabled=true, limit=9223372036854775807, name=pinger, retry=ScenarioProperties.RetryProperties(maxAttempts=1, maxBackoff=PT8S, minBackoff=PT0.25S)), logResponse=false, properties={topicId=${topic.ping}}, receiptPercent=1.0, recordPercent=0.0, timeout=PT13S, tps=0.1, type=CONSENSUS_SUBMIT_MESSAGE)
2021-08-11T11:03:03.400-0600 INFO main c.h.m.m.c.MonitorConfiguration Starting publisher flow
2021-08-11T11:03:03.993-0600 INFO main c.h.m.m.c.MonitorConfiguration Starting subscribe flow
2021-08-11T11:03:18.797-0600 INFO single-1 c.h.m.m.p.TransactionPublisher Validated node: NodeProperties(accountId=0.0.3, host=146.148.65.62, port=50211)
2021-08-11T11:03:29.194-0600 WARN single-1 c.h.m.m.p.TransactionPublisher Unable to validate node NodeProperties(accountId=0.0.4, host=34.74.82.254, port=50211): Timed out
2021-08-11T11:03:39.394-0600 WARN single-1 c.h.m.m.p.TransactionPublisher Unable to validate node NodeProperties(accountId=0.0.5, host=34.86.17.182, port=50211): Timed out
2021-08-11T11:03:49.073-0600 INFO single-1 c.h.m.m.p.TransactionPublisher Validated node: NodeProperties(accountId=0.0.6, host=34.101.171.0, port=50211)
2021-08-11T11:03:58.018-0600 INFO single-1 c.h.m.m.p.TransactionPublisher Validated node: NodeProperties(accountId=0.0.7, host=35.194.125.44, port=50211)
2021-08-11T11:04:06.936-0600 INFO single-1 c.h.m.m.p.TransactionPublisher Validated node: NodeProperties(accountId=0.0.8, host=35.228.241.206, port=50211)
2021-08-11T11:04:17.306-0600 WARN single-1 c.h.m.m.p.TransactionPublisher Unable to validate node NodeProperties(accountId=0.0.9, host=34.118.118.66, port=50211): Timed out
2021-08-11T11:04:26.883-0600 INFO single-1 c.h.m.m.p.TransactionPublisher Validated node: NodeProperties(accountId=0.0.10, host=34.87.30.76, port=50211)
2021-08-11T11:04:36.981-0600 WARN single-1 c.h.m.m.p.TransactionPublisher Unable to validate node NodeProperties(accountId=0.0.11, host=35.203.2.132, port=50211): Timed out
2021-08-11T11:04:47.278-0600 WARN single-1 c.h.m.m.p.TransactionPublisher Unable to validate node NodeProperties(accountId=0.0.12, host=35.246.68.126, port=50211): Timed out
2021-08-11T11:04:57.433-0600 WARN single-1 c.h.m.m.p.TransactionPublisher Unable to validate node NodeProperties(accountId=0.0.13, host=34.94.104.178, port=50211): Timed out
2021-08-11T11:05:05.558-0600 INFO single-1 c.h.m.m.p.TransactionPublisher Validated node: NodeProperties(accountId=0.0.14, host=34.83.205.137, port=50211)
2021-08-11T11:05:14.291-0600 INFO single-1 c.h.m.m.p.TransactionPublisher Validated node: NodeProperties(accountId=0.0.15, host=35.234.74.241, port=50211)
2021-08-11T11:05:24.609-0600 WARN single-1 c.h.m.m.p.TransactionPublisher Unable to validate node NodeProperties(accountId=0.0.16, host=35.204.14.177, port=50211): Timed out
2021-08-11T11:05:34.694-0600 WARN single-1 c.h.m.m.p.TransactionPublisher Unable to validate node NodeProperties(accountId=0.0.17, host=35.245.216.12, port=50211): Timed out
2021-08-11T11:05:44.781-0600 WARN single-1 c.h.m.m.p.TransactionPublisher Unable to validate node NodeProperties(accountId=0.0.18, host=35.194.80.173, port=50211): Timed out
2021-08-11T11:05:53.815-0600 INFO single-1 c.h.m.m.p.TransactionPublisher Validated node: NodeProperties(accountId=0.0.19, host=35.199.70.51, port=50211)
<crash>
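
As a rough check on the log above, the sketch below computes the time between the "Starting publisher flow" line and the last validation line, then divides by the 17 node results logged (0.0.3 through 0.0.19). Using those two lines as the start and end points is an assumption made for illustration, and the class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for reading the log timestamps; not part of the monitor.
final class StartupTimingSketch {

    // Matches timestamps such as 2021-08-11T11:03:03.400-0600.
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ");

    // Milliseconds elapsed between two log timestamps.
    static long millisBetween(String start, String end) {
        OffsetDateTime from = OffsetDateTime.parse(start, LOG_TIME);
        OffsetDateTime to = OffsetDateTime.parse(end, LOG_TIME);
        return Duration.between(from, to).toMillis();
    }

    public static void main(String[] args) {
        long elapsed = millisBetween("2021-08-11T11:03:03.400-0600", "2021-08-11T11:05:53.815-0600");
        int nodeResults = 17; // validation lines for accounts 0.0.3 through 0.0.19
        System.out.println("Elapsed ms: " + elapsed);                       // 170415
        System.out.println("Average ms per node: " + elapsed / nodeResults); // 10024
    }
}
```

About 170 seconds for 17 nodes works out to roughly 10 seconds per node, which is consistent with sequential validation against a timeout of around 10s.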

After the changes (note that the MonitorApplication startup log now appears before node validation):

2021-08-11T11:14:00.694-0600 INFO main c.h.m.m.MonitorApplication No active profile set, falling back to default profiles: default 
2021-08-11T11:14:01.898-0600 INFO main c.h.m.m.s.g.GrpcClientSDK Connecting 1 clients to hcs.previewnet.mirrornode.hedera.com:5600 
2021-08-11T11:14:02.211-0600 INFO main c.h.m.m.s.r.RestSubscriber Connecting to mirror node https://previewnet.mirrornode.hedera.com:443/api/v1 
2021-08-11T11:14:02.249-0600 INFO main c.h.m.m.p.g.CompositeTransactionGenerator Activated scenario: PublishScenarioProperties(super=ScenarioProperties(duration=PT2562047H47M16.854775807S, enabled=true, limit=9223372036854775807, name=pinger, retry=ScenarioProperties.RetryProperties(maxAttempts=1, maxBackoff=PT8S, minBackoff=PT0.25S)), logResponse=false, properties={topicId=${topic.ping}}, receiptPercent=1.0, recordPercent=0.0, timeout=PT13S, tps=0.1, type=CONSENSUS_SUBMIT_MESSAGE) 
2021-08-11T11:14:02.341-0600 INFO single-1 c.h.m.m.c.MonitorConfiguration Starting publisher flow 
2021-08-11T11:14:02.348-0600 INFO parallel-1 c.h.m.m.c.MonitorConfiguration Starting subscribe flow 
2021-08-11T11:14:03.298-0600 INFO main c.h.m.m.MonitorApplication Started MonitorApplication in 3.707 seconds (JVM running for 4.815) 
2021-08-11T11:14:04.896-0600 INFO parallel-1 c.h.m.m.p.TransactionPublisher Validated node: NodeProperties(accountId=0.0.7, host=4.previewnet.hedera.com, port=50211) 
2021-08-11T11:14:06.915-0600 INFO parallel-1 c.h.m.m.p.TransactionPublisher Validated node: NodeProperties(accountId=0.0.3, host=0.previewnet.hedera.com, port=50211) 
2021-08-11T11:14:07.961-0600 INFO parallel-1 c.h.m.m.p.TransactionPublisher Validated node: NodeProperties(accountId=0.0.4, host=1.previewnet.hedera.com, port=50211) 
2021-08-11T11:14:08.954-0600 INFO parallel-1 c.h.m.m.p.TransactionPublisher Validated node: NodeProperties(accountId=0.0.5, host=2.previewnet.hedera.com, port=50211) 

Checklist

  • Documented (Code comments, README, etc.)
  • Tested (unit, integration, etc.)

Signed-off-by: Steven Sheehy <steven.sheehy@hedera.com>
@steven-sheehy steven-sheehy added the bug (Type: Something isn't working), P1, performance, and monitor (Area: Monitoring and dashboard) labels Aug 11, 2021
@steven-sheehy steven-sheehy added this to the Mirror 0.39.0 milestone Aug 11, 2021
@steven-sheehy steven-sheehy requested a review from a team August 11, 2021 17:20
@steven-sheehy steven-sheehy self-assigned this Aug 11, 2021
codecov bot commented Aug 11, 2021

Codecov Report

Merging #2395 (5463dd2) into main (3781839) will decrease coverage by 27.22%.
The diff coverage is 60.00%.

❗ Current head 5463dd2 differs from pull request most recent head 59625af. Consider uploading reports for the commit 59625af to get more accurate results

@@              Coverage Diff              @@
##               main    #2395       +/-   ##
=============================================
- Coverage     84.43%   57.20%   -27.23%     
+ Complexity     2332      615     -1717     
=============================================
  Files           440      127      -313     
  Lines         12004     2657     -9347     
  Branches       1021      174      -847     
=============================================
- Hits          10135     1520     -8615     
+ Misses         1552     1070      -482     
+ Partials        317       67      -250     
Impacted Files Coverage Δ
...ra/mirror/monitor/config/MonitorConfiguration.java 0.00% <0.00%> (ø)
...a/mirror/monitor/publish/TransactionPublisher.java 92.78% <100.00%> (ø)
...or/importer/domain/AddressBookServiceEndpoint.java
...nstruction/token_create_transaction_constructor.go
stream/signatureObject.js
...ter/repository/upsert/NftUpsertQueryGenerator.java
...eader/balance/line/AccountBalanceLineParserV1.java
hedera-mirror-rosetta/test/domain/domain.go
.../hedera/mirror/importer/domain/TokenAccountId.java
...mporter/parser/record/pubsub/PubSubProperties.java
... and 305 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3781839...59625af.

Signed-off-by: Steven Sheehy <steven.sheehy@hedera.com>
sonarcloud bot commented Aug 11, 2021

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 0 (rating A)

Coverage: no coverage information
Duplication: 0.0%

@steven-sheehy steven-sheehy merged commit 44a3c9a into main Aug 11, 2021
@steven-sheehy steven-sheehy deleted the monitor-startup branch August 11, 2021 18:35
steven-sheehy added a commit that referenced this pull request Aug 11, 2021
* Improve monitor startup performance by moving publish and subscribe flow initialization off main thread
* Increase node validation timeout to 30s
* Adjust monitor probe delay to reflect startup improvement

Signed-off-by: Steven Sheehy <steven.sheehy@hedera.com>