Several issues in StreamFileHealthIndicator #3131

Open
xin-hedera opened this issue Jan 11, 2022 · 0 comments
Labels
bug Type: Something isn't working parser Area: File parsing

Description

The mirror node importer in the k8s performance environment ran into multiple issues, as described in #3102. The first event in the chain is frequent leader flipping among the importer pods.

The logs show the flipping is caused by an incorrect health status reported by the StreamFileHealthIndicator: a false negative, where it reports DOWN when it should report UP or UNKNOWN. Below is a list of the issues found so far. Most of them can cause false negatives, while some can cause false positives:

  1. The close latency of the first stream file is calculated from the made-up "start after" timestamp and the timestamp of the first downloaded stream file. For account balance files, when the importer is started with an empty db and no startDate, the first account balance file's close latency will most likely be less than 15 minutes. The StreamFileHealthIndicator then reports DOWN before the second account balance file is downloaded, and the DOWN status keeps appearing between account balance files until the mean close latency plus the 10s processingTimeout exceeds 15 minutes.
  2. When a leader pod for some reason becomes a follower and later transitions back to leader, the cached lastHealthStatus has a lastCheck timestamp from quite some time ago. In getResolvedHealthWhenNoStreamFilesParsed, the current time will therefore almost certainly be after the allowed window (lastCheck + mean stream close latency + processingTimeout), and the pod is immediately marked as unhealthy.
  3. When the system time is after the configured endDate, StreamFileHealthIndicator reports UP with the reason "EndDate has passed, stream files are no longer expected". This can cause a false positive, although for an up-to-date importer the affected window is small and comes only from delays in the pipeline.
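
To make the window arithmetic in items 1 and 2 concrete, here is a minimal Java sketch. It is not the importer's actual code: the class `HealthWindowSketch`, its method names, and all timestamps and latency values are hypothetical and chosen only for illustration. The only things taken from the description above are the formula `lastCheck + mean stream close latency + processingTimeout`, the 10s processingTimeout, and the 15 minute interval mentioned for account balance files.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of the allowed-window check described in this issue.
// It is NOT the StreamFileHealthIndicator implementation.
final class HealthWindowSketch {

    // End of the allowed window: lastCheck + mean stream close latency + processingTimeout.
    static Instant windowEnd(Instant lastCheck, Duration meanCloseLatency, Duration processingTimeout) {
        return lastCheck.plus(meanCloseLatency).plus(processingTimeout);
    }

    // True when "now" is after the allowed window, i.e. the check would resolve to DOWN.
    static boolean isPastWindow(Instant now, Instant lastCheck,
                                Duration meanCloseLatency, Duration processingTimeout) {
        return now.isAfter(windowEnd(lastCheck, meanCloseLatency, processingTimeout));
    }

    public static void main(String[] args) {
        Duration processingTimeout = Duration.ofSeconds(10);
        Instant lastCheck = Instant.parse("2022-01-11T00:00:00Z"); // assumed example time

        // Item 1: the first measured latency is underestimated (assumed 5 minutes here),
        // so a check 6 minutes after the first file is already outside the window,
        // even though the next balance file is not due until 15 minutes have passed.
        Duration underestimatedMean = Duration.ofMinutes(5);
        Instant sixMinutesLater = lastCheck.plus(Duration.ofMinutes(6));
        System.out.println(isPastWindow(sixMinutesLater, lastCheck, underestimatedMean, processingTimeout)); // true

        // With a mean of 15 minutes the same check stays inside the window.
        Duration realisticMean = Duration.ofMinutes(15);
        System.out.println(isPastWindow(sixMinutesLater, lastCheck, realisticMean, processingTimeout)); // false

        // Item 2: the pod regains leadership an hour after the cached lastCheck.
        // With a short mean latency (assumed 2 seconds) the window expired long ago.
        Instant regainedLeadership = lastCheck.plus(Duration.ofHours(1));
        System.out.println(isPastWindow(regainedLeadership, lastCheck, Duration.ofSeconds(2), processingTimeout)); // true
    }
}
```

In both failing cases the indicator has not actually observed a stalled stream. The window is simply anchored to a value (an underestimated mean, or a stale lastCheck) that no longer reflects reality.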

Steps to reproduce

  1. Install mirrornode using helm charts, with an empty db, no startDate set
  2. Check the importer logs for leader flipping

Additional context

No response

Hedera network

other

Version

v0.47.0

Operating system

No response
