Segments becoming frequently unavailable when replica = 1 for large datasource #14548

Closed
uditsharma opened this issue Jul 7, 2023 · 2 comments


Affected Version

26.0.0

Description

We have noticed that one of our datasources, which holds 3 TB of data across roughly 30K segments, frequently has unavailable segments. From our findings it looks like a coordinator balancing issue: the coordinator loads a segment onto a new historical, and after the load completes on the new one the segment ends up being dropped from both places.
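For anyone wanting to confirm the symptom, the coordinator exposes a load-status endpoint that reports per-datasource segment availability. A minimal sketch, assuming the coordinator is reachable at localhost:8081 (adjust for your cluster) and the Python requests library is installed:

    # Minimal availability check via the coordinator API.
    # Assumption: coordinator at localhost:8081; adjust for your cluster.
    import requests

    COORDINATOR = "http://localhost:8081"

    # GET /druid/coordinator/v1/loadstatus returns, per datasource, the
    # percentage of its segments that are fully available on historicals.
    resp = requests.get(f"{COORDINATOR}/druid/coordinator/v1/loadstatus", timeout=10)
    resp.raise_for_status()

    for datasource, pct_available in resp.json().items():
        if pct_available < 100.0:
            print(f"{datasource}: only {pct_available:.2f}% of segments available")

Anything below 100% for a datasource matches the "unavailable segments" symptom described above.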

  • Cluster size:
    • 12 historical
    • 3 broker
    • 6 MM
  • Configurations in use

coordinator config

    druid.service=druid/coordinator
    druid.plaintextPort=8081
    druid.indexer.logs.kill.enabled=true
    druid.indexer.logs.kill.durationToRetain=259200000
    druid.indexer.logs.kill.delay=21600000
    
    druid.extensions.loadList=["druid-google-extensions", "postgresql-metadata-storage", "druid-kafka-indexing-service", "druid-datasketches", "kafka-emitter","druid-multi-stage-query"]
    druid.coordinator.loadqueuepeon.type=curator
    druid.serverview.type=batch

    druid.coordinator.startDelay=PT10S
    druid.coordinator.period=PT200S
    druid.coordinator.period.indexingPeriod=PT180S
  • Steps to reproduce the problem

Not sure if I have any steps to reproduce it, as this happens when the coordinator does its re-balancing.

  • Finding
    This is what we have observed in the logs for a specific segment (let me know if the complete logs are needed; I will try to get them). A sketch for tracing this from the coordinator's load queues follows below.

    1. The coordinator asks a new historical to load the segment.
    2. Next, it asks that same new historical to drop the segment it just loaded, because it now sees replica count = 2.
    3. Next, it asks the older historical to drop its copy as well; I am assuming a callback came in saying the new node had loaded the segment, so the old copy should be dropped.
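A hedged sketch of how one might trace this churn from the outside, by polling the coordinator's per-server load/drop queues. The coordinator address and segment id below are placeholders, and the response field names follow the documented GET /druid/coordinator/v1/loadqueue endpoint (which lists the segment ids queued to load and drop for each historical); verify the shape against your Druid version:

    # Poll the coordinator's load/drop queues to watch a specific segment get
    # loaded on one historical and then queued for drop again.
    import time

    import requests

    COORDINATOR = "http://localhost:8081"     # assumption: adjust as needed
    SEGMENT_ID = "<segment id being traced>"  # placeholder

    while True:
        # The response maps each server to the ids of segments it has been
        # asked to load and to drop.
        queues = requests.get(
            f"{COORDINATOR}/druid/coordinator/v1/loadqueue", timeout=10
        ).json()
        for server, queue in queues.items():
            if SEGMENT_ID in queue.get("segmentsToLoad", []):
                print(f"{server}: LOAD queued for {SEGMENT_ID}")
            if SEGMENT_ID in queue.get("segmentsToDrop", []):
                print(f"{server}: DROP queued for {SEGMENT_ID}")
        time.sleep(30)

Seeing a LOAD on the new historical followed shortly by DROPs on both servers would match the sequence described above.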


This issue has been marked as stale due to 280 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If this issue is still
relevant, please simply write any comment. Even if closed, you can still revive the
issue at any time or discuss it on the dev@druid.apache.org list.
Thank you for your contributions.

@github-actions github-actions bot added the stale label Apr 13, 2024

This issue has been closed due to lack of activity. If you think that
is incorrect, or the issue requires additional review, you can revive the issue at
any time.

@github-actions github-actions bot closed this as not planned May 11, 2024