Several issues in StreamFileHealthIndicator #3131

xin-hedera · 2022-01-11T16:13:53Z

Description

The mirror node importer in the k8s performance environment ran into multiple issues, as described in #3102. The first link in the chain of events is frequent leader flipping between importer pods.

The logs show that the flipping is caused by an incorrect health status reported by StreamFileHealthIndicator. This is a false negative: it reports DOWN when it should report UP or UNKNOWN. Below is a comprehensive list of the issues found so far. Most of them can cause false negatives, while some can cause false positives:

1. The close latency of the first stream file is calculated from the made-up start-after timestamp and the timestamp of the first downloaded stream file. For account balance files, when the importer starts with an empty database and no startDate, the first account balance file's close latency will most likely be less than 15 minutes. StreamFileHealthIndicator then reports DOWN before the second account balance file is downloaded. The DOWN status keeps appearing between account balance files until the mean latency plus the 10s processingTimeout exceeds 15 minutes.
2. When a leader pod becomes a follower for some reason and later transitions back to leader, the cached lastHealthStatus still carries a lastCheck timestamp from long ago. In getResolvedHealthWhenNoStreamFilesParsed, the current time is therefore certain to be after the allowed window (lastCheck + mean stream close latency + processingTimeout), and the pod is immediately marked as unhealthy.
3. When the system time is after the configured endDate, StreamFileHealthIndicator reports UP with the reason "EndDate has passed, stream files are no longer expected". This causes a false positive, although for an up-to-date importer it only lasts for a small window caused by delays in the pipeline.
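The arithmetic behind the first issue can be sketched in Python. This is a hypothetical model, not the mirror node code. It assumes the following:

- The health window is the running mean of all observed close latencies plus the 10s processingTimeout.
- Account balance files close every 15 minutes.
- The first latency, derived from the made-up start-after timestamp, is 2 minutes. This is an illustrative value only.

The helper `files_until_healthy` is invented for illustration. It counts how many account balance files must be downloaded before the window finally exceeds the 15-minute gap between files.

```python
from datetime import timedelta


def files_until_healthy(first_latency: timedelta,
                        interval: timedelta,
                        processing_timeout: timedelta,
                        max_files: int = 10_000) -> int:
    """Hypothetical model: count files needed until mean latency + timeout > interval.

    The first latency comes from the made-up start-after timestamp; every later
    file is assumed to close exactly `interval` after the previous one.
    """
    latencies = [first_latency]
    while len(latencies) < max_files:
        mean = sum(latencies, timedelta()) / len(latencies)
        if mean + processing_timeout > interval:
            # The allowed window now covers the gap until the next file.
            return len(latencies)
        # Otherwise the window expires before the next file arrives (DOWN).
        latencies.append(interval)
    raise RuntimeError("window never exceeded the interval within max_files")


if __name__ == "__main__":
    n = files_until_healthy(timedelta(minutes=2), timedelta(minutes=15), timedelta(seconds=10))
    print(n)  # 79
```

Under these assumed numbers, the mean only creeps toward 15 minutes from below. DOWN therefore keeps reappearing between files until the 79th account balance file, roughly 19.5 hours after the first one.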
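The second issue can be illustrated with a similarly hedged Python model of the window check in getResolvedHealthWhenNoStreamFilesParsed. `CachedHealth` is a hypothetical stand-in for the cached lastHealthStatus. The strict `now > deadline` comparison, the 2s mean latency, and the timestamps are all assumptions chosen for illustration. The sketch shows why a pod that regains leadership is marked DOWN on its first check: the cached lastCheck is an hour old.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class CachedHealth:
    """Hypothetical stand-in for the cached lastHealthStatus."""
    status: str
    last_check: datetime


def resolve_when_no_files_parsed(cached: CachedHealth, now: datetime,
                                 mean_latency: timedelta,
                                 processing_timeout: timedelta) -> str:
    """Hypothetical model of the window check described in the issue."""
    deadline = cached.last_check + mean_latency + processing_timeout
    return "DOWN" if now > deadline else cached.status


if __name__ == "__main__":
    # The pod was leader at 10:00, became a follower, and is leader again at 11:00.
    stale = CachedHealth("UP", datetime(2022, 1, 11, 10, 0, 0))
    now = datetime(2022, 1, 11, 11, 0, 0)
    print(resolve_when_no_files_parsed(stale, now, timedelta(seconds=2), timedelta(seconds=10)))  # DOWN
```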
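The third issue comes down to comparing the system clock, rather than the importer's progress, with endDate. The following Python sketch shows the resulting false positive under stated assumptions. Both functions are hypothetical. The `last_parsed_end` input, meaning the end timestamp of the last parsed stream file, is an illustrative name and not a field taken from the mirror node.

```python
from datetime import datetime, timedelta


def end_date_passed(now: datetime, end_date: datetime) -> bool:
    """Wall-clock check as described in the issue: compares system time with endDate."""
    return now > end_date


def is_false_positive(now: datetime, end_date: datetime,
                      last_parsed_end: datetime) -> bool:
    """True when UP would be reported although files up to endDate are still pending.

    last_parsed_end is a hypothetical input: the end timestamp of the last
    parsed stream file.
    """
    return end_date_passed(now, end_date) and last_parsed_end < end_date


if __name__ == "__main__":
    end_date = datetime(2022, 1, 11, 12, 0, 0)
    now = end_date + timedelta(seconds=5)      # system clock has just passed endDate
    lagging = end_date - timedelta(seconds=3)  # pipeline is still a few seconds behind
    print(is_false_positive(now, end_date, lagging))  # True
```

For an up-to-date importer, `last_parsed_end` catches up with endDate within the pipeline delay. This matches the small window described above.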

Steps to reproduce

1. Install the mirror node using the Helm charts, with an empty database and no startDate set
2. Check the importer logs for leader flipping

Additional context

No response

Hedera network

other

Version

v0.47.0

Operating system

No response


xin-hedera added the bug (Type: Something isn't working) and parser (Area: File parsing) labels Jan 11, 2022

This was referenced Jan 21, 2022

Set importer replicas to 1 to workaround leader election bugs #3190

Merged

Remove streamFileActivity from importer liveness probe #3216

Merged

steven-sheehy mentioned this issue Feb 18, 2022

Add a migration readiness probe #3306

Merged
