Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

experimental/stats: Add metrics registry #7349

Merged
merged 13 commits into from
Jul 12, 2024
Merged

Conversation

zasweq
Copy link
Contributor

@zasweq zasweq commented Jun 25, 2024

This PR adds the instrument registry for Non Per Call Metrics. It also moves Metrics symbol to external, and defined a MetricRecorder type for non per call metrics.

The next PR will be the OpenTelemetry component usage of this global registry and implementation of MetricRecorder, including the merging of default metrics by shifting default metrics to a function as per Yashes suggestion.

RELEASE NOTES: N/A

@zasweq zasweq requested a review from dfawley June 25, 2024 00:46
@zasweq zasweq added the Type: Feature New features or improvements in behavior label Jun 25, 2024
@zasweq zasweq added this to the 1.66 Release milestone Jun 25, 2024
Copy link

codecov bot commented Jun 25, 2024

Codecov Report

Attention: Patch coverage is 67.03297% with 30 lines in your changes missing coverage. Please review.

Project coverage is 81.54%. Comparing base (4dd7f55) to head (1ec4e54).
Report is 36 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7349      +/-   ##
==========================================
+ Coverage   80.58%   81.54%   +0.95%     
==========================================
  Files         349      350       +1     
  Lines       34056    26843    -7213     
==========================================
- Hits        27445    21889    -5556     
+ Misses       5431     3770    -1661     
- Partials     1180     1184       +4     
Files Coverage Δ
experimental/stats/metricregistry.go 93.44% <93.44%> (ø)
experimental/stats/metrics.go 13.33% <13.33%> (ø)

... and 302 files with indirect coverage changes

@dfawley dfawley assigned zasweq and unassigned dfawley Jun 25, 2024
@zasweq zasweq assigned dfawley and unassigned zasweq Jun 26, 2024
@zasweq
Copy link
Contributor Author

zasweq commented Jun 26, 2024

Got to offline discussion items, and moved symbols around to where we discussed.

stats/stats.go Outdated
Comment on lines 352 to 353
// Metric is an identifier for a metric.
type Metric string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we create a metrics.go for these new things?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point. Done.

stats/stats.go Outdated
Comment on lines 361 to 362
// Metrics are the set of Metrics to initialize.
Metrics map[Metric]bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be private? Maybe a ToSlice method instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I export this to read this field in OTel construction (determines whether a metric is created or not)...do you want me to expose a getter for a map and copy? That would prevent the user from doing operations on this?


import "google.golang.org/grpc/stats"

// Int64CountHandle is a typed handle for a int count instrument. This handle is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*an int - throughout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// Int64CountHandle is a typed handle for a int count instrument. This handle is
// passed at the recording point in order to know which instrument to record on.
type Int64CountHandle struct {
Index int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be exported?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, both the OpenTelemetry component and the MetricsRecorderList (the next two PRs) read this index out. (In MetricsRecorderList, it uses the index to verify the labels/optionalLabels (which will log at error and eat call, not panic like Java as discussed offline). OpenTelemetry uses this index as the invariant of instrument registry that it creates metrics from and needs the Index to know which created metric to record. (In the code I have, I create noops for unregistered metrics as well, and the index will match to that and be eaten).

// DefaultMetrics are the default metrics registered through global instruments
// registry. This is written to at initialization time only, and is read
// only after initialization.
var DefaultMetrics = make(map[stats.Metric]bool)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want users to have direct access to this map? I think they just need a Register type function exposed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this offline; I thought one of the requirements you gave me was to explicitly expose this, so users that write their own stats plugins/non per call interface can have access to it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They need to read the default metrics but not directly access a set of them.

Also: why not use the Metrics set type for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ok good point. I did add Merge() so yeah this should be a MetricsSet, is having the exported type be a MetricsSet good enough for your point of reading but not accessing it: "They need to read the default metrics but not directly access a set of them."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This merge will be called at runtime in OpenTelemetry for the default metrics as discussed offline.

*
*/

// Package instrumentregistry is a instrument registry that gRPC components can
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*an

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

"google.golang.org/grpc/stats"
)

// InstrumentDescriptor is a data of a registered instrument (metric).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"the data for"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched.

Name string
// Description is the description of this metric.
Description string
// Unit is the unit of this metric.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we delete these comments? Or make them less tautological?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah these are just for vet, is there a way to ignore vet for exported fields like this that are self explanatory?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can always improve them to make them more useful. Like give examples for units, explain what labels are used for, etc.

)

// InstrumentDescriptor is a data of a registered instrument (metric).
type InstrumentDescriptor struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't these things that third-party plugins will eventually need to work with? (I.e. should be in experimental?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh hmmmm you're right. So the only things kept internal are what a gRPC component calls into, and this will need to be read by external users in their stats plugins so they can create instruments based off it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked about this in the chat; thinking about it do I need to move the whole instrument registry into experimental actually? A user can also write their own balancer/resolver, which should be able to register instruments to eventually record on so I think that all the symbols/logic need to be external (in experimental). What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed in chat with Yash; decided to move this whole instrument registry to experimental and make externally visible.

// NOTE: this function must only be called during initialization time (i.e. in
// an init() function), and is not thread-safe. If multiple instruments are
// registered with the same name, this function will panic.
func RegisterInt64Count(name string, desc string, unit string, labels []string, optionalLabels []string, def bool) instrumentregistry.Int64CountHandle {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about having the user pass the InstrumentDescriptor directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm ok. Thinking about it that would allow scaling up fields if needed in a backward compatible way if we decide we want to persist another piece of information here. Underlying though I'll persist around a copy of the struct not the same pointer as a user.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can pass by value instead of pointer to avoid the need to manually copy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about it too now that I use a metrics set for defaults and there's a metrics type I might just have the inst desc take a Metric, and have the user pass in a Metric for explicit types and no need to typecast in this registry.

@dfawley dfawley assigned zasweq and unassigned dfawley Jul 1, 2024
@zasweq zasweq assigned dfawley and unassigned zasweq Jul 9, 2024
Copy link
Member

@dfawley dfawley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we check this in without any code using it, then we should probably be prepared to make a lot of changes here.

package stats

import (
"log"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use "log"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah whoops I had this for panic formatting. Switched to logger.Fatalf().

Name stats.Metric
// Description is the description of this metric.
Description string
// Unit is the unit of this metric (e.g. entries, milliseconds).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't OTel opposed to using "milliseconds" and instead prefer to use "seconds" for most/all things time related?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah I thought the A66 metrics were /milliseconds, but they are defined as /s (https://github.com/grpc/proposal/blob/master/A66-otel-stats.md#client-per-attempt-instruments) which is why we had those bucket counts.


// MetricDescriptor is the data for a registered metric.
type MetricDescriptor struct {
// Name is the name of this metric. This name must be unique across whole
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstrings on fields don't follow this convention like types/functions. You can just say "The name of the metric". Throughout. Maybe that's why it seems so redundant to me "Name is the name". Just say "The name".

https://google.github.io/styleguide/go/decisions#doc-comments

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'll just delete these comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually switched comments for field to // The Field...

// is passed at the recording point in order to know which metric to record
// on.
type Int64CountHandle struct {
MetricDescriptor *MetricDescriptor
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unexport this field throughout? Or maybe type Int64CountHandle MetricDescriptor? Or type Int64CountHandle *MetricDescriptor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm ok so we could do an alias too? I needed this exported for OpenTelemetry to read the pointer to look into it's data structures it builds out for metrics. (I guess the metrics recorder list could live here for the assertion on labels/optional labels length and access unexported fields).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I make it a pointer alias; I can't define functions on it. I tried making it a alias to metricDescriptor but the problem is if I typecast the descriptor into a xxxHandle it changes address. How do we keep it pointers so it can link alongside calling methods on handle returned?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can point to a playground link illustrating the things you tried? The code as-is is still a struct so I can't see what you did.

}

// Record records the int64 count value on the metrics recorder provided.
func (h Int64CountHandle) Record(recorder MetricsRecorder, incr int64, labels ...string) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these just be passed around by pointer everywhere? Why not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure we could. This is a copy of a that x bit pointer but pointer works too.

// Package stats contains experimental metrics/stats API's.
package stats

// MetricsRecorder records on metrics derived from instrument registry.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't seem to use "instrument" anymore right?

Also I think this should be in "metrics.go" or "metricregistry.go"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah whoops missed this file (I went on a crusade to switch all instances in other files). Switched.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's leave this here for now until we wrap up discussion on whether metrics.go should be experimental or not.

type MetricsRecorder interface {
// RecordIntCount records the measurement alongside labels on the int count
// associated with the provided handle.
RecordIntCount(Int64CountHandle, int64, ...string)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's nice to name your parameters as a documentation aide.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah I was wondering this too. Named the parameters handle, incr, and labels.

stats/metrics.go Outdated
import "maps"

// Metric is an identifier for a metric.
type Metric string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want all these to be in experimental for now, too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stats package is entirely experimental as per top level package comment. I think it's fine to leave this here, since configures unstable OTel which lives in this package too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're going with that philosophy then why have experimental/stats for the other stuff? Let's keep it together.

since configures unstable OTel which lives in this package too

OTel should not have anything in the stats package, right?

Copy link
Contributor Author

@zasweq zasweq Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

K will move. I guess now I can move the MetricsRecorder interface to this experimental/stats/metrics.go file as well.

stats/metrics.go Outdated
// Do not construct directly; use NewMetrics instead.
type Metrics struct {
// Metrics are the set of Metrics to initialize.
Metrics map[Metric]bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should hide fields to prevent direct accesses, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field is read at OTel creation time. I could add a to slice function which would solve the function body comment below, but then it wouldn't be constant accesses at otel creation time when determining whether to build a metric or not. Let me see what looks best.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a getter for this an unexported so we can use that map to check for presence in constant time.

stats/metrics.go Outdated

// Join joins the metrics passed in with the metrics set, and returns a new copy
// with the merged metrics.
func (m *Metrics) Join(metrics *Metrics) *Metrics { // Use this API...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this function body be simplified to m.Add(m.Values()/*or whatever*/)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left it as is since I added a getter for map only not to convert to slice.

@dfawley dfawley assigned zasweq and unassigned dfawley Jul 10, 2024
@dfawley dfawley changed the title internal/instrumentregistry: Add instrument registry stats: Add metrics registry Jul 11, 2024

// MetricDescriptor is the data for a registered metric.
type MetricDescriptor struct {
// The name. This name must be unique across whole binary (including any per
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"The name...of this metric"?

And "across the whole binary" reads a bit better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done for both.

// DefaultMetrics are the default metrics registered through global metrics
// registry. This is written to at initialization time only, and is read only
// after initialization.
var DefaultMetrics = stats.NewMetrics() // loop through this set,, export metrisdescriptor passed upward for pointer and labels/optional lables...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete comment?

Copy link
Contributor Author

@zasweq zasweq Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry missed this and a few others.

// is passed at the recording point in order to know which metric to record
// on.
type Int64CountHandle struct {
MetricDescriptor *MetricDescriptor
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can point to a playground link illustrating the things you tried? The code as-is is still a struct so I can't see what you did.

// GetMetric returns the MetricDescriptor from the global registry.
//
// Returns nil if MetricDescriptor not present.
func GetMetric(metric stats.Metric) *MetricDescriptor {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetMetricDescriptor maybe?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or DescriptorForMetric?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chose DescriptorForMetric.

@zasweq
Copy link
Contributor Author

zasweq commented Jul 11, 2024

Thanks for your help; got to all comments and fixed tests.

@zasweq zasweq assigned dfawley and unassigned zasweq Jul 11, 2024
Comment on lines +52 to +53
// Whether this metric is on by default.
Default bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something to think about for later when you're implementing the things that actually use all of this: should we remove this field from the descriptor since it (I think?) only affects the initial registration and whether to put it into the DefaultMetrics set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I've thought about this persistence before, I wanted to delete this since this information is encoded in DefaultSet, but this is how the user registers through API now (since we persist a copy of what user sets). Do you see a way to delete the field but keep the API?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two obvious other options:

  1. RegisterFooMetric() and RegisterDefaultFooMetric()
  2. RegisterFooMetric(md MetricDescriptor, default bool)

I think this is okay, too, but we should consider the above since this data is effectively discarded immediately after registration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather do 2. Let's keep this in mind for the future since I'm sure this metrics registry will iterate organically as we merge more layers.

Comment on lines 62 to 66
type Int64CountHandle MetricDescriptor

// Record records the int64 count value on the metrics recorder provided.
func (h *Int64CountHandle) Record(recorder MetricsRecorder, incr int64, labels ...string) {
recorder.RecordIntCount(h, incr, labels...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does Int64CountHandle have a size on it but the RecordIntCount method does not? It seems like either both should say 64, or neither should.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah good point. I'd prefer leaving the 64 bit identifier as OpenTelemetry and Java/C specify 64. This is because OpenTelemetry has both int/float(32||64) instruments so if we scale this registry up in the future it more aligns with OpenTelemetry.

// NOTE: this function must only be called during initialization time (i.e. in
// an init() function), and is not thread-safe. If multiple metrics are
// registered with the same name, this function will panic.
func RegisterInt64Count(descriptor MetricDescriptor) *Int64CountHandle {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make sure the "numbers vs. no numbers" matches here too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above comment; chose numbers route.

Comment on lines 201 to 210
// MetricType is the type of metric.
type MetricType int

const (
MetricTypeIntCount MetricType = iota
MetricTypeFloatCount
MetricTypeIntHisto
MetricTypeFloatHisto
MetricTypeIntGauge
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this should maybe move to immediately before or after the MetricDescriptor type definition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Chose after since Type MetricType field is at the last field.

}
RegisterInt64Count(desc)
RegisterInt64Gauge(desc)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test doesn't fail if a panic doesn't occur.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recover() returns nil if the test is not panicing. Then I typecast to string, and that won't contain the want right? Will my typecast of a nil to a string error?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, duh, the defer will still run, sorry!

}
}

// Metrics returns the metrics set.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

State that the returned map should be read-only and must not be modified, please.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. It's weird since it says metrics set and returned map but I think those are aliases of each other, and functionally this does return a map, which is treated as a set.

@dfawley dfawley assigned zasweq and unassigned dfawley Jul 12, 2024
@dfawley
Copy link
Member

dfawley commented Jul 12, 2024

This should be good to go modulo the small nits/test issue. Let's be ready to iterate on the registry stuff if you find it isn't working well while implementing the things that use it.

@zasweq zasweq changed the title stats: Add metrics registry experimental/stats: Add metrics registry Jul 12, 2024
@zasweq zasweq merged commit ecbb837 into grpc:master Jul 12, 2024
13 checks passed
printchard pushed a commit to printchard/grpc-go that referenced this pull request Jul 30, 2024
printchard pushed a commit to printchard/grpc-go that referenced this pull request Jul 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Feature New features or improvements in behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants