metrics: add more metrics for golang gc #32094

Closed
wants to merge 7 commits into from

Conversation

hawkingrei
Member

@hawkingrei hawkingrei commented Feb 2, 2022

Signed-off-by: Weizhen Wang wangweizhen@pingcap.com

What problem does this PR solve?

Issue Number: ref #32090

Problem Summary:

What is changed and how it works?

Intervals between garbage collections

This metric is useful for knowing whether GOGC tuning still has headroom. For instance, Go forces a garbage collection at least every 2 minutes. If your service still shows high GC impact but this graph already reads 120s, you can no longer tune with GOGC. In that case you need to optimize your allocations instead.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
> $ curl http://127.0.0.1:10080/metrics |grep golang_gc

# HELP tidb_system_golang_gc_intervals_ns intervals between GC
# TYPE tidb_system_golang_gc_intervals_ns gauge
tidb_system_golang_gc_intervals_ns 1.32817e+08
# HELP tidb_system_golang_gc_pause_ns Current GC pause percentage
# TYPE tidb_system_golang_gc_pause_ns gauge
tidb_system_golang_gc_pause_ns 0.0005715342772578194

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
@ti-chi-bot
Member

ti-chi-bot commented Feb 2, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • jackysp

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 2, 2022
Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
@hawkingrei hawkingrei closed this Feb 2, 2022
@hawkingrei hawkingrei reopened this Feb 2, 2022
Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
@hawkingrei hawkingrei changed the title metrics: add metrics for golang gc metrics: add more metrics for golang gc Feb 2, 2022
Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
@sre-bot
Contributor

sre-bot commented Feb 2, 2022

util/gogc.go (outdated)
util/gogc.go (outdated)
Member

@jackysp jackysp left a comment

REST LGTM, but where are the unit test cases?

Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
@hawkingrei
Member Author

REST LGTM, but where are the unit test cases?

It is difficult to add unit tests for this feature, so I added a manual test to the PR description.

Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Feb 7, 2022
@hawkingrei
Member Author

/run-unit-test

Contributor

@tiancaiamao tiancaiamao left a comment

The Prometheus client already registers the Go GC metrics,
and (when displayed in Grafana) they are much more sensible than gauge values like GCPausePercent and GCIntervals.
https://povilasv.me/prometheus-go-metrics/

debug.ReadGCStats(&gc)
now := time.Now().UnixNano()
dur := float64(now - gss.last.now)
gcPauseRatio := float64(uint64(gc.PauseTotal)-gss.last.gcPauseTotal) / dur
Contributor

If you want to know GCPausePercent as a gauge metric, you don't need to record last.gcPauseTotal.
Just use: gc.PauseTotal / time.Since(program start)

Member Author

it is copied from CRDB https://github.com/cockroachdb/cockroach/blob/e5eda9f716a6e1a30f17adaf96de953187030ff8/pkg/server/status/runtime.go#L523. I think this is to show the rate of change of GC duration.

if gss.last.lastGC.Before(gc.LastGC) {
	gcIntervals := gc.LastGC.Sub(gss.last.lastGC)
	gss.last.lastGC = gc.LastGC
	metrics.GCIntervals.Set(float64(gcIntervals.Nanoseconds()))
Contributor

If you want to know when GC happens, that is already visible in Grafana...
And from the image, the GC intervals are obvious.

Member Author

You are right. I will remove it.

Member

GC durations? I think those are different from GC intervals.


// Start is used to start the sampler.
func (gss *GOGCStatSampler) Start() {
	ticker := time.NewTicker(100 * time.Millisecond)
Contributor

debug.ReadGCStats needs to hold the heap lock, so the less frequently it is called, the better.

@hawkingrei hawkingrei added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 8, 2022
@ti-chi-bot
Member

@hawkingrei: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ti-chi-bot ti-chi-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 12, 2022
@hawkingrei hawkingrei closed this Dec 23, 2022
@hawkingrei hawkingrei deleted the add_gogc_metrics branch December 23, 2022 05:28
Labels
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. status/LGT1 Indicates that a PR has LGTM 1.