OTLP: Add metrics to track otlp request samples per batch #8265
pkg/distributor/distributor.go
@@ -281,6 +282,11 @@ func newPushMetrics(reg prometheus.Registerer) *PushMetrics {
NativeHistogramMinResetDuration: 1 * time.Hour,
NativeHistogramMaxBucketNumber: 100,
}, []string{"user"}),
otlpIncomingSamplesPerBatch: promauto.With(reg).NewHistogramVec(prometheus.HistogramOpts{
Unfortunately, this will create a lot of series: number of tenants using OTLP * number of distributors * (number of buckets + 2). That is around 700k series altogether in our production environments. If we used native histograms only, it would be 1/10th of that.
If we go with native histograms, I'd recommend using factor=2, 4 or 16 (the bucket boundaries would then be powers of the factor).
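The series arithmetic in this comment can be sketched in a few lines of Go. The tenant, distributor and bucket counts below are hypothetical values chosen only to keep the numbers round; they are not the production figures mentioned above. The sketch also assumes that a native-histogram-only metric exposes a single series per tenant per distributor.

```go
package main

import "fmt"

// classicHistogramSeries applies the formula from the review comment:
// tenants * distributors * (buckets + 2).
func classicHistogramSeries(tenants, distributors, buckets int) int {
	return tenants * distributors * (buckets + 2)
}

// nativeHistogramSeries assumes one series per tenant per distributor,
// because a native histogram keeps all of its buckets inside a single series.
func nativeHistogramSeries(tenants, distributors int) int {
	return tenants * distributors
}

func main() {
	// Hypothetical inputs, not taken from the PR.
	tenants, distributors, buckets := 200, 30, 8

	classic := classicHistogramSeries(tenants, distributors, buckets)
	native := nativeHistogramSeries(tenants, distributors)

	fmt.Printf("classic: %d series\n", classic)
	fmt.Printf("native:  %d series\n", native)
	fmt.Printf("ratio:   %dx\n", classic/native)
}
```

With 8 classic buckets the ratio is 10x, which is the order of magnitude the comment refers to.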
After discussing this offline with Peter, I am going to change the factor to 2 and the max bucket number to 100.
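To get a feel for what the choice of factor means for batch sizes, here is a minimal sketch that uses a simplified model in which bucket upper bounds are exactly powers of the factor, as described in the review comment. The helper names `upperBound` and `bucketIndex` are hypothetical and are not part of the Prometheus client; the value 8192 is the recommended batch size mentioned in the PR description.

```go
package main

import "fmt"

// upperBound returns factor^n, the upper boundary of bucket n in the
// simplified model (hypothetical helper, not a client_golang function).
func upperBound(factor, n int) int {
	bound := 1
	for i := 0; i < n; i++ {
		bound *= factor
	}
	return bound
}

// bucketIndex returns the smallest n such that value <= factor^n,
// i.e. the bucket a positive observation would fall into.
func bucketIndex(factor, value int) int {
	n, bound := 0, 1
	for bound < value {
		bound *= factor
		n++
	}
	return n
}

func main() {
	const batchSize = 8192 // recommended value from the PR description

	for _, factor := range []int{2, 4, 16} {
		n := bucketIndex(factor, batchSize)
		fmt.Printf("factor=%d: %d falls in bucket %d (upper bound %d)\n",
			factor, batchSize, n, upperBound(factor, n))
	}
}
```

Under this model, factor 2 places 8192 exactly on a bucket boundary (2^13), while factors 4 and 16 put it in much wider buckets with upper bounds 16384 and 65536.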
I'd like to suggest we track these numbers for all sources together (remote-write and OTLP), since they are equally interesting for remote-write, and the common case is that each customer picks just one method. It would also be good to track the number of exemplars in the same way.
Co-authored-by: Peter Štibraný <pstibrany@gmail.com>
Signed-off-by: Ying WANG <ying.wang@grafana.com>
What this PR does
This PR intends to address #6935 (comment): we need to use the same limit unit for the OTel Collector and Mimir. It should be the number of samples per batch, aligned with send_batch_max_size. Before doing so, we need a better understanding of the batch sizes users are sending today.
The added bucket is based on our recommendation of 8192.
Note: there is currently no convenient way to test exported native histogram metrics.
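As an illustration of the unit discussed above, the following sketch counts the samples carried by one batch and compares the total with the recommended size of 8192. The `series` type and the `samplesInBatch` helper are hypothetical stand-ins and do not correspond to Mimir's request types or to the code in this PR.

```go
package main

import "fmt"

// series is a hypothetical stand-in for one time series in a push request.
type series struct {
	name    string
	samples []float64
}

// recommendedMaxBatch is the 8192 value mentioned in the PR description.
const recommendedMaxBatch = 8192

// samplesInBatch returns the total number of samples across all series
// in a single request, which is the quantity the new metric would observe.
func samplesInBatch(batch []series) int {
	total := 0
	for _, s := range batch {
		total += len(s.samples)
	}
	return total
}

func main() {
	batch := []series{
		{name: "http_requests_total", samples: []float64{1, 2, 3}},
		{name: "up", samples: []float64{1}},
	}

	n := samplesInBatch(batch)
	fmt.Printf("samples in batch: %d, over recommended size: %v\n",
		n, n > recommendedMaxBatch)
}
```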
Which issue(s) this PR fixes or relates to
Fixes: #8269
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
about-versioning.md updated with experimental features.