Consider max lag for kinesis while autoscaling #16284

Merged

adithyachakilam
Contributor

Description

In Kinesis, lag is measured as time (minutes) rather than as a count of records, as it is in Kafka. If each of 10 shards had a lag of 1 minute, we were computing 10 minutes of lag for autoscaling decisions, even though no shard was more than 1 minute behind. For Kinesis, autoscaling decisions should be based only on the maximum lag across shards. For Kafka, the lag considered for autoscaling remains the total lag, as before.
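The difference between the two aggregations can be illustrated with a minimal, self-contained sketch. The class and method names here (`ShardLagExample`, `totalLag`, `maxLag`) are hypothetical and are not taken from this PR; they only reproduce the 10-shard arithmetic described above.

```java
import java.util.Arrays;

public class ShardLagExample {
    // Sum of per-shard lags: reasonable when lag is a record count (Kafka-style).
    public static long totalLag(long[] perShardLag) {
        return Arrays.stream(perShardLag).sum();
    }

    // Largest per-shard lag: reasonable when lag is time-based (Kinesis-style).
    // Returns 0 when there are no shards.
    public static long maxLag(long[] perShardLag) {
        return Arrays.stream(perShardLag).max().orElse(0L);
    }

    public static void main(String[] args) {
        // 10 shards, each 1 minute behind.
        long[] lagMinutes = new long[10];
        Arrays.fill(lagMinutes, 1L);

        System.out.println("total = " + totalLag(lagMinutes)); // total = 10
        System.out.println("max = " + maxLag(lagMinutes));     // max = 1
    }
}
```

Summing time-based lag makes the result grow with the shard count even when no shard falls further behind, which is why the maximum is the more meaningful input for scaling decisions here.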

Release note

For Kinesis streams, autoscaling is now based on the maximum lag across shards rather than the total lag of all shards.


Key changed/added classes in this PR
  • LagStats.java
  • LagMetric.java
  • LagBasedAutoScaler.java
  • Supervisor.java

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

- long totalLags = lagStats.getTotalLag();
- lagMetricsQueue.offer(totalLags > 0 ? totalLags : 0L);
+ long lag = lagStats.get(supervisor.getLagMetricForAutoScaler());
+ lagMetricsQueue.offer(lag > 0 ? lag : 0L);

Check notice: Code scanning / CodeQL

Ignored error status of call (Note): Method run ignores exceptional return value of CircularFifoQueue.offer.
Contributor

@AmatyaAvadhanula AmatyaAvadhanula left a comment


Changes LGTM. Thank you @adithyachakilam

@AmatyaAvadhanula AmatyaAvadhanula merged commit 34237bc into apache:master Apr 17, 2024
85 checks passed
@adithyachakilam adithyachakilam deleted the consider-max-lag-for-kinesis branch April 18, 2024 15:47
kfaraz pushed a commit that referenced this pull request Apr 20, 2024
Tries to address the comments made on #16284 after it was merged.

Changes:
- Remove method `Supervisor.getLagMetric()`
- Add method `Supervisor.computeLagForAutoScaler()`
- Remove classes `LagMetric` and `LagMetricTest`
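
One way to read the follow-up change is that the supervisor itself now decides which lag value the autoscaler sees, instead of exposing a metric selector. The sketch below is an assumption about that shape, not the actual Druid code: `AutoScalerLagSketch`, `LagSummary`, `StreamSupervisor`, `computeLagStats`, `kafkaLike` and `kinesisLike` are hypothetical stand-ins, and only the method name `computeLagForAutoScaler` comes from the commit message.

```java
public class AutoScalerLagSketch {
    // Hypothetical stand-in for LagStats, holding only the two values used here.
    public static final class LagSummary {
        public final long totalLag;
        public final long maxLag;

        public LagSummary(long totalLag, long maxLag) {
            this.totalLag = totalLag;
            this.maxLag = maxLag;
        }
    }

    // Hypothetical stand-in for Supervisor.
    public interface StreamSupervisor {
        LagSummary computeLagStats();

        // Assumed default: total lag, clamped at zero, as described for Kafka.
        default long computeLagForAutoScaler() {
            LagSummary stats = computeLagStats();
            return stats == null ? 0L : Math.max(stats.totalLag, 0L);
        }
    }

    // Record-count lag: keeps the default (total) behaviour.
    public static StreamSupervisor kafkaLike(LagSummary stats) {
        return () -> stats;
    }

    // Time-based lag: overrides the default to report the maximum instead.
    public static StreamSupervisor kinesisLike(LagSummary stats) {
        return new StreamSupervisor() {
            @Override
            public LagSummary computeLagStats() {
                return stats;
            }

            @Override
            public long computeLagForAutoScaler() {
                return Math.max(stats.maxLag, 0L);
            }
        };
    }

    public static void main(String[] args) {
        LagSummary stats = new LagSummary(10L, 1L);
        System.out.println(kafkaLike(stats).computeLagForAutoScaler());   // 10
        System.out.println(kinesisLike(stats).computeLagForAutoScaler()); // 1
    }
}
```

With this shape the autoscaler calls a single method and needs no stream-specific knowledge of which statistic applies.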
adithyachakilam added a commit to adithyachakilam/druid that referenced this pull request Apr 25, 2024
@adarshsanjeev adarshsanjeev added this to the 30.0.0 milestone May 6, 2024