Add HorizontalPodAutoscaler for fluentd #339
Conversation
@@ -0,0 +1,17 @@
{{- if and .Values.sumologic.fluentd.autoscaling.enabled}}
What's the reason for `and` here? We're not checking two conditions.
Good catch! Will fix it.
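For reference, a minimal sketch of the corrected condition (the body of the template is elided, since the full HPA spec isn't shown in this excerpt):

```yaml
# Sketch only: the stray `and` is redundant when there is a single condition.
{{- if .Values.sumologic.fluentd.autoscaling.enabled }}
# ... HorizontalPodAutoscaler manifest ...
{{- end }}
```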
deploy/helm/sumologic/values.yaml
## Option to turn autoscaling on for fluentd and specify metrics for HPA.
autoscaling:
  enabled: true
  minReplicas: 1
Do we want to set `minReplicas` to 3 to match our current 3 replicas?
I was wondering if that's necessary. I was thinking we keep the min at 1 and let the autoscaler do its job.
Yeah, it's worth a discussion. We should get @frankreno's opinion as well. Would 1 replica no longer be considered high availability (HA), since there could be data loss until the next replica is brought up?
I would suggest we have a min of at least 2 or 3 to prevent any loss of data while a new replica is spinning up, and so that if it did scale down to 1 and the node died, there would be a fallback. This can be our default, and we just expose it in the values.yaml so customers can change it if needed.
ok will change it to 3 then.
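A sketch of what the agreed default block could look like; the `maxReplicas` and CPU target values below are illustrative assumptions, not values taken from this PR:

```yaml
## Option to turn autoscaling on for fluentd and specify metrics for HPA.
autoscaling:
  enabled: true                        # later flipped to false by default, per the discussion below
  minReplicas: 3                       # at least 3 so a scale-down or node failure still leaves a fallback
  maxReplicas: 10                      # illustrative ceiling (assumption)
  targetCPUUtilizationPercentage: 50   # illustrative CPU target (assumption)
```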
## Configure metrics-server
## ref: https://github.com/helm/charts/blob/master/stable/metrics-server/values.yaml
metrics-server:
Do we need to teach our CI to generate `metrics-server-overrides.yaml` for the non-Helm installation?
I guess it will be required if we want to add the arguments.
metrics-server:
  enabled: true
  args:
    - --kubelet-insecure-tls
I could see customers having an issue with using this in an insecure way. Is there any other way to run this?
Without using that, the metrics server is unable to get metrics. Will look into whether we can avoid this.
kubernetes-sigs/metrics-server#131
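One commonly suggested way to avoid the insecure flag (an assumption here, not something verified in this PR) is to point metrics-server at a node address type that the kubelet's serving certificate actually covers, and only fall back to skipping TLS verification when the cluster's kubelet certs can't be fixed:

```yaml
metrics-server:
  enabled: true
  args:
    # Assumption: prefer address types covered by the kubelet's serving certificate.
    - --kubelet-preferred-address-types=InternalIP,Hostname,ExternalIP
    # Fallback only for clusters with self-signed kubelet certs:
    # - --kubelet-insecure-tls
```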
LGTM, but this morning Gourav mentioned that in his experience Fluentd could sometimes drop records when it scaled down. Did you notice anything like that during your testing of the autoscaler?
@@ -164,6 +164,12 @@ sumologic:
  fluentd:
    ## Option to specify the Fluentd buffer as file/memory.
    buffer: "memory"
    ## Option to turn autoscaling on for fluentd and specify metrics for HPA.
    autoscaling:
      enabled: false
@vsinghal13 will autoscaling be OFF by default?
Yes, by default it will be off for now.
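Assuming the value path shown in the diff above, a user would turn it on with an override file (hypothetical file name) passed via `helm upgrade --install -f my-values.yaml`:

```yaml
# my-values.yaml (hypothetical): enable the fluentd HPA
sumologic:
  fluentd:
    autoscaling:
      enabled: true
```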
Description
This PR adds the capability to autoscale fluentd based on CPU usage. Metrics Server is used to expose the metrics API. The HPA uses that API to read the metrics and decide when to scale.
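As a rough sketch of the kind of object this produces (the API version, object names, and thresholds below are illustrative assumptions rather than the exact template added in this PR):

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: fluentd                        # assumed name; the chart derives it from the release
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment                   # assumed workload kind for fluentd here
    name: fluentd
  minReplicas: 3
  maxReplicas: 10                      # illustrative
  targetCPUUtilizationPercentage: 50   # illustrative; needs metrics-server for CPU metrics
```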
Below is a graph showing the autoscaler behavior.
Two test scenarios:
In the first scenario, the autoscaler doubles the number of replicas, i.e. 1, 2, 4, 8. When the load is dropped to 0, the autoscaler gradually scales down with a much smaller step size and stabilizes at 3.
In the second scenario, the autoscaler again scales from 3 to 6 and then to 10.
Default value set to be "true".
Testing performed