Add HorizontalPodAutoscaler for fluentd #339
Conversation
@@ -0,0 +1,17 @@
{{- if and .Values.sumologic.fluentd.autoscaling.enabled}}
What's the reason for `and` here? We're not checking two conditions.
Good catch! Will fix it.
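For reference, a minimal sketch of the corrected condition (the body of the template is elided, since the full HPA spec isn't shown in this excerpt):

```yaml
# Sketch only: the stray `and` is redundant when there is a single condition.
{{- if .Values.sumologic.fluentd.autoscaling.enabled }}
# ... HorizontalPodAutoscaler manifest ...
{{- end }}
```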
deploy/helm/sumologic/values.yaml
## Option to turn autoscaling on for fluentd and specify metrics for HPA.
autoscaling:
  enabled: true
  minReplicas: 1
Do we want to set `minReplicas` to 3 to match our current 3 replicas?
I was wondering if that's necessary. I was thinking we keep the min at 1 and let the autoscaler do its job.
Yeah, it's worth a discussion. We should get @frankreno's opinion as well. Would 1 replica no longer be considered high availability (HA), since there could be data loss until the next replica is brought up?
I would suggest we have a min of at least 2 or 3 to prevent any loss of data while a new replica is spinning up, and so that if it did scale down to 1 and the node died, there would be a fallback. This can be our default, and we just expose it in the values.yaml so customers can change it if needed.
ok will change it to 3 then.
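A sketch of what the agreed default block could look like; the `maxReplicas` and CPU target values below are illustrative assumptions, not values taken from this PR:

```yaml
## Option to turn autoscaling on for fluentd and specify metrics for HPA.
autoscaling:
  enabled: true                        # later flipped to false by default, per the discussion below
  minReplicas: 3                       # at least 3 so a scale-down or node failure still leaves a fallback
  maxReplicas: 10                      # illustrative ceiling (assumption)
  targetCPUUtilizationPercentage: 50   # illustrative CPU target (assumption)
```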
## Configure metrics-server
## ref: https://github.com/helm/charts/blob/master/stable/metrics-server/values.yaml
metrics-server:
Do we need to teach our CI to generate `metrics-server-overrides.yaml` for the non-Helm installation?
I guess it will be required if we want to add the arguments.
metrics-server:
  enabled: true
  args:
    - --kubelet-insecure-tls
I could see customers having an issue with using this in an insecure way. Is there any other way to run this?
Without using that, the metrics server is unable to get metrics. Will look into whether we can avoid this.
kubernetes-sigs/metrics-server#131
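One commonly suggested way to avoid the insecure flag (an assumption here, not something verified in this PR) is to point metrics-server at a node address type that the kubelet's serving certificate actually covers, and only fall back to skipping TLS verification when the cluster's kubelet certs can't be fixed:

```yaml
metrics-server:
  enabled: true
  args:
    # Assumption: prefer address types covered by the kubelet's serving certificate.
    - --kubelet-preferred-address-types=InternalIP,Hostname,ExternalIP
    # Fallback only for clusters with self-signed kubelet certs:
    # - --kubelet-insecure-tls
```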
LGTM, but this morning Gourav mentioned that in his experience Fluentd could sometimes drop records when it scaled down. Did you notice anything like that during your testing of the autoscaler?
@@ -164,6 +164,12 @@ sumologic:
  fluentd:
    ## Option to specify the Fluentd buffer as file/memory.
    buffer: "memory"
    ## Option to turn autoscaling on for fluentd and specify metrics for HPA.
    autoscaling:
      enabled: false
@vsinghal13 will autoscaling be OFF by default?
Yes, by default it will be off for now.
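Assuming the value path shown in the diff above, a user would turn it on with an override file (hypothetical file name) passed via `helm upgrade --install -f my-values.yaml`:

```yaml
# my-values.yaml (hypothetical): enable the fluentd HPA
sumologic:
  fluentd:
    autoscaling:
      enabled: true
```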
Description
This PR adds the capability to autoscale fluentd based on CPU usage. Metrics Server is used to expose the metrics API. The HPA uses that API to read the metrics and decide when to scale.
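As a rough sketch of the kind of object this produces (the API version, object names, and thresholds below are illustrative assumptions rather than the exact template added in this PR):

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: fluentd                        # assumed name; the chart derives it from the release
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment                   # assumed workload kind for fluentd here
    name: fluentd
  minReplicas: 3
  maxReplicas: 10                      # illustrative
  targetCPUUtilizationPercentage: 50   # illustrative; needs metrics-server for CPU metrics
```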
Below is a graph showing the autoscaler behavior.
Two test scenarios:
In the first scenario, the autoscaler doubles the number of replicas, i.e. 1, 2, 4, 8. When the load is dropped to 0, the autoscaler gradually scales down with a much smaller step size and stabilizes at 3.
In the second scenario, the autoscaler again scales from 3 to 6 and then to 10.
Default value set to be "true".
Testing performed