Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: move ErrorWithStatus to globalerror #7790

Merged
merged 6 commits into from
Apr 26, 2024

Conversation

dimitarvdimitrov
Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov commented Apr 3, 2024

What this PR does

This PR moves ErrorWithStatus from the ingester package to a shared package. I need this change so that in #7777 the store-gateway and querier can start using the same error pattern that the distributor<>ingester requests use.

Context

ErrorWithStatus is internally used to wrap errors in the ingesters before returning them to distributors. It helps with 1) retaining the original error (in contrast with gogoproto and grpc packages which retain the error string) and with 2) attaching mimirpb error causes to responses returned over gRPC.

Necessary changes

  • Move WrapGrpcContextError from pkg/util/grpc_error.go to pkg/util/globalerror/grpc.go. This was necessary because ErrorWithStatus uses pkg/mimirpb and keeping it in pkg/util creates an import cycle
  • Rename pkg/util/globalerror/errors.go → pkg/util/globalerror/user.go. The package is named error anyways. The content of that file is user-visible errors and error codes (exceeding limits, validation, etc.). The new file contains grpc error handling.

Note to reviewers

  • the only new code is TestErrorWithStatus in pkg/util/globalerror/grpc_test.go
  • the changes in pkg/distributor/errors.go aren't strictly necessary. I took the chance to refactor that as well to reduce "similar but different" code. It changes the semantics, but shouldn't change the user-visible behaviour. The returned error still implements GRPCStatus() and error. This means the ruler should be able to use the error in the same way (code) and the grpc library should as well (code)
    • if this is a contentious point I am happy to take it out
    • removed in 7713def

Checklist

  • Tests updated.
  • [n/a] Documentation added.
  • [n/a] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • [n/a] about-versioning.md updated with experimental features.

Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job! I agree with this preparatory refactoring. Thanks for the detailed PR description. I left a comment about distributor error (which I'm not sure it's the same thing you mentioned in the PR description) that I would like you to double check.

Also, can you rebase, please? ShouldLog() interface and semantic has changed.

}

func (e ErrorWithStatus) Error() string {
return e.Status.Message()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look at the code changes in distributor. toGRPCError(). I've the feeling the Error() we'll return from this function will be different. Previously it was gRPC statusError.Error() but now it's the status message (statusError.Error() decorates the status message). I'm not sure whether this is a problem or not, but if my analysis is correct I think we're masquerading a change in a refactoring PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've the feeling the Error() we'll return from this function will be different

you're right. We're now returning globalerror.ErrorWithStatus instead of status.*statusError.

But also the actual error strings will be something like connection refused - because globalerror.ErrorWithStatus returns purely the message

func (e ErrorWithStatus) Error() string {
return e.Status.Message()
}
. Whereas previously they were rpc error: code = Unavailable desc = connection refused because that's what status.*statusError constructs

i still think it's only a cosmetic change, but I can't be certain, so I'll revert that change and leave a comment if anyone feels adventurous in the future

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

restored the previous code and added a comment in 7713def

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i still think it's only a cosmetic change, but I can't be certain, so I'll revert that change and leave a comment if anyone feels adventurous in the future

Hi @dimitarvdimitrov,
Since I was not involved in reviewing PR, I will reply you now. It is not a cosmetic change.

TL;DR

Errors of type globalerror.ErrorWithStatus are gRPC errors that could be recognized by both grpc and gogo libraries, that preserve the semantics of their underlying error. Regular gRPC errors created by both grpc and gogo libraries do not have that features. Errors produced by ingesters should preserve the underlying semantics, while errors produced by distributors don't have to preserve it (at least not in this moment).

Details

Taken from the current distributor/errors.go:

// errorWithStatus is slightly different from globalerror.NewErrorWithGRPCStatus. It returns status.Err() instead of an error with the status message.
// The actual difference is between "rpc error: code = XYZ desc = message" and just "message".
// At the time of writing this should be purely a cosmetic difference, but it's hard to verify.
// Because of this difference, the function is used instead of globalerror.NewErrorWithGRPCStatus.

This is the change you did in distributor.toGRPCError(). You basically cut one piece of the code and created a function out of it. This function simply creates a gRPC error with details by using gogo's status package: error message corresponds to the original errors's Error(), and the detail is the error cause of type mimirpb.ErrorCause. This way no information about original error's semantics is preserved. For example, if a grpc logging middleware receives this error, it cannot understand whether it was a non-loggable error, or an error that should be sampled etc. Just its error message and its details (used for a communication between different Mimir components) would be sent.

I recommend that we either revert this change or that we remove the comment which introduces confusion (it looks like we are not so sure why we do it this way).

On the other hand, we have the current globalerror.ErrorWithStatus:

// ErrorWithStatus is used for wrapping errors returned by ingester.
// Errors returned by ingester should be gRPC errors, but the errors
// produced by both gogo/status and grpc/status packages do not keep
// the semantics of the underlying error, which is sometimes needed.
// For example, the logging middleware needs to know whether an error
// should be logged, sampled or ignored. Errors of type ErrorWithStatus
// are valid gRPC errors that could be parsed by both gogo/status
// and grpc/status packages, but which preserve the original error
// semantics.

I put this comment when I implemented ErrorWithStatus within the ingester package in order to explain its main purpose, and why it was different from regular gRPC errors created by the two libraries. I think that the explanation (already present in the code) is clear. But if it is not we can rewrite it better.

Comment on lines +85 to +86
UnderlyingErr error
Status *status.Status
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do they need to be exposed?

Copy link
Contributor Author

@dimitarvdimitrov dimitarvdimitrov Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is an ingester function which uses them. The idea was to have this mostly as a data transfer object which doesn't need a constructor. The existing NewErrorWithGRPCStatus is more of a convenience function to reduce duplication

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
@dimitarvdimitrov dimitarvdimitrov force-pushed the dimitar/error-handling-move-to-global-package branch from 0e92b38 to 8cf56e7 Compare April 26, 2024 10:43
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
@dimitarvdimitrov dimitarvdimitrov marked this pull request as ready for review April 26, 2024 10:54
@dimitarvdimitrov dimitarvdimitrov requested a review from a team as a code owner April 26, 2024 10:54
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
@dimitarvdimitrov dimitarvdimitrov enabled auto-merge (squash) April 26, 2024 11:01
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
@dimitarvdimitrov
Copy link
Contributor Author

there was a faillint rule which prevents globalerror from importing any other package. That changed in this PR because we now import pkg/mimirpb. However, failling isn't flexible enough to add an exception. I also couldn't get golangci-lint with depguard to work as it just seemed to arbitrarily choose which files from the repo to scan.

so I removed the linter rule in 60ee703

@dimitarvdimitrov dimitarvdimitrov merged commit 2869d0a into main Apr 26, 2024
29 checks passed
@dimitarvdimitrov dimitarvdimitrov deleted the dimitar/error-handling-move-to-global-package branch April 26, 2024 12:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants