Description
The mirror node importer in the k8s performance environment ran into multiple issues, as described in #3102. The first event in the chain is frequent leader flipping among the importer pods.
The logs show that the flipping is caused by an incorrect health status reported by the StreamFileHealthIndicator: a false negative, where it reports DOWN when it should report UP or UNKNOWN. Below is a comprehensive list of the issues found so far. Most of them can cause false negatives, while some can cause false positives:
The close latency of the first stream file is calculated from the made-up "start after" timestamp and the timestamp of the first downloaded stream file. For account balance files, when the importer is started with an empty db and no startDate, the first account balance file close latency most likely ends up being less than 15 minutes. The StreamFileHealthIndicator then reports DOWN before the second account balance file is downloaded, and the DOWN status keeps appearing in between account balance files until the mean close latency plus the 10s processingTimeout exceeds 15 minutes.
When a leader pod becomes a follower for some reason and later transitions back to leader, the cached lastHealthStatus has a lastCheck timestamp from quite some time ago. In getResolvedHealthWhenNoStreamFilesParsed, the current time is then almost certainly after the allowed window (lastCheck + mean stream close latency + processingTimeout), so the pod is immediately marked as unhealthy.
When the system time is after the configured endDate, StreamFileHealthIndicator reports UP with reason "EndDate has passed, stream files are no longer expected". This causes a false positive; the window is small only for an up-to-date importer, due to delays in the pipeline.
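The window arithmetic behind the first two issues can be sketched as follows. This is a minimal model, not the actual StreamFileHealthIndicator code: the class name HealthWindowSketch and the method isDown are hypothetical, and the rule it encodes (DOWN once the current time is past lastCheck + mean close latency + processingTimeout) is taken from the description above. The 10s processingTimeout comes from the issue text.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical model of the allowed window described in this issue.
// It is NOT the real StreamFileHealthIndicator implementation.
final class HealthWindowSketch {

    // processingTimeout of 10s, as quoted in the issue text.
    static final Duration PROCESSING_TIMEOUT = Duration.ofSeconds(10);

    private HealthWindowSketch() {
    }

    // Returns true when no stream file has been parsed and "now" is already
    // past lastCheck + meanCloseLatency + processingTimeout.
    static boolean isDown(Instant lastCheck, Duration meanCloseLatency, Instant now) {
        Instant deadline = lastCheck.plus(meanCloseLatency).plus(PROCESSING_TIMEOUT);
        return now.isAfter(deadline);
    }
}
```

Under this model, if the first account balance file yields a close latency of, say, 5 minutes (an illustrative value), a check 6 minutes after lastCheck reports DOWN even though the next balance file is not due for another 9 minutes. With a mean of 15 minutes, a check exactly 15 minutes after lastCheck is still inside the window. The second issue is the same calculation with a stale lastCheck: a mean of 2 seconds and a lastCheck from an hour ago puts the current time far past the deadline, so the pod is marked unhealthy as soon as it becomes leader again.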
Steps to reproduce
Install the mirror node using the helm charts, with an empty db and no startDate set
Check the importer logs for leader flipping
Additional context
No response
Hedera network
other
Version
v0.47.0
Operating system
No response