Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ssa: Improve wait error reporting #722

Merged
merged 1 commit into from
Jan 23, 2024
Merged

Conversation

stefanprodan
Copy link
Member

@stefanprodan stefanprodan commented Jan 19, 2024

If the context deadline has not been reached, include only the failed resources in the returned error. This makes the health check error message concise, instead of listing potentially hundreds of resources in progress, we only list the ones that actually failed.

Part of: fluxcd/flux2#4556
Fixes: #715
Supersedes: #717

@stefanprodan stefanprodan added the area/server-side-apply SSA related issues and pull requests label Jan 19, 2024
If fail fast is enable, include only the failed resources
in the returned error.

Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
if rs == nil {
errors = append(errors, fmt.Sprintf("can't determine status for %s", utils.FmtObjMetadata(id)))
continue
var errs []string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wdyt think about

 var errs []error
	for id, rs := range statusCollector.ResourceStatuses {
		switch {
		case rs == nil || lastStatus[id] == nil:
			errs = append(errs, fmt.Errorf("resource %s not found", utils.FmtObjMetadata(id)))
		case lastStatus[id].Status == status.FailedStatus:
			var builder strings.Builder
			builder.WriteString(fmt.Sprintf("%s status: '%s'",
				utils.FmtObjMetadata(rs.Identifier), lastStatus[id].Status))
			if rs.Error != nil {
				builder.WriteString(fmt.Sprintf(": %s", rs.Error))
			}
			errs = append(errs, errors.New(builder.String()))
...
	if len(errs) > 0 {
		msg := "failed early due to stalled resources"
		if errors.Is(ctx.Err(), context.DeadlineExceeded) {
			msg = "timeout waiting for"
		}
		return fmt.Errorf("%s: %w", msg, errors.Join(errs...))
	}

Wrapping the errors can make it easier to inspect for calling funtions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more about humans than code, keep in mind that the resulting error ends up in Slack, PagerDuty, etc. Is error wrapping with 100+ errors easier to read and parse than an array?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.Error() will return a string still. The difference is that for Join. It says The error formats as the concatenation of the strings obtained by calling the Error method of each element of errs, with a newline between each string.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok but this will change the format and if people rely on the current array, their parser will fail. I would consider change it in Flux minor release instead of a patch like now.

Copy link
Contributor

@darkowlzz darkowlzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
Left a suggestion about simplification of duplicate code in the switch cases. Can be ignored or accepted.

Comment on lines +129 to +142
case lastStatus[id].Status == status.FailedStatus:
var builder strings.Builder
builder.WriteString(fmt.Sprintf("%s status: '%s'",
utils.FmtObjMetadata(rs.Identifier), lastStatus[id].Status))
if rs.Error != nil {
builder.WriteString(fmt.Sprintf(": %s", rs.Error))
}
if lastStatus[id] == nil {
// this is only nil in the rare case where no status can be determined for the resource at all
errors = append(errors, fmt.Sprintf("%s (unknown status)", utils.FmtObjMetadata(rs.Identifier)))
} else if lastStatus[id].Status != status.CurrentStatus {
var builder strings.Builder
builder.WriteString(fmt.Sprintf("%s status: '%s'",
utils.FmtObjMetadata(rs.Identifier), lastStatus[id].Status))
if rs.Error != nil {
builder.WriteString(fmt.Sprintf(": %s", rs.Error))
}
errors = append(errors, builder.String())
errs = append(errs, builder.String())
case errors.Is(ctx.Err(), context.DeadlineExceeded) && lastStatus[id].Status != status.CurrentStatus:
var builder strings.Builder
builder.WriteString(fmt.Sprintf("%s status: '%s'",
utils.FmtObjMetadata(rs.Identifier), lastStatus[id].Status))
if rs.Error != nil {
builder.WriteString(fmt.Sprintf(": %s", rs.Error))
Copy link
Contributor

@darkowlzz darkowlzz Jan 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to ignore it but the two cases above seem identical.
Maybe there are two cases to keep the conditionals simple but maybe it'd be simpler to maintain in the long run if the duplicate code can be put together in one case and the case expression can be made a little complex like:

case (lastStatus[id].Status == status.FailedStatus) || (errors.Is(ctx.Err(), context.DeadlineExceeded) && lastStatus[id].Status != status.CurrentStatus):
        ...

I may be missing something if they aren't identical. Please tell me if they aren't.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did like this so it's easier to read, failure vs timeout.

@stefanprodan stefanprodan merged commit 16f5d11 into main Jan 23, 2024
13 checks passed
@stefanprodan stefanprodan deleted the ssa-filter-wait-errors branch January 23, 2024 15:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/server-side-apply SSA related issues and pull requests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve kstatus output for failed early due to stalled resources
3 participants