-
Notifications
You must be signed in to change notification settings - Fork 173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[full ci] Improve exec error message for a CVM shutdown mid operation #6593
Conversation
TIL that the docker client calls multiple personality endpoints. So there will still be a concurrent modification error when we report the shutdown on the race condition edge. so now the error message looks like this
|
Makefile
Outdated
@@ -235,13 +235,11 @@ misspell: $(MISSPELL) | |||
|
|||
govet: | |||
@echo checking go vet... | |||
@$(GO) tool vet -all -lostcancel -tests $$(find . -mindepth 1 -maxdepth 1 -type d -not -name vendor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we are removing vet and gas checks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are not :( I had that removed since mine misbehaves and it snuck in... this happens to me occasionally as I get add
happy with git.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, then keep it in your local directory. Why did you do a git add and pushed it as a part of this PR?
@@ -257,7 +258,8 @@ func (c *Container) TaskWaitToStart(cid, cname, eid string) error { | |||
|
|||
// ContainerExecCreate sets up an exec in a running container. | |||
func (c *Container) ContainerExecCreate(name string, config *types.ExecConfig) (string, error) { | |||
defer trace.End(trace.Begin(name)) | |||
op := trace.NewOperation(context.TODO(), "") | |||
defer trace.End(trace.Begin(fmt.Sprintf("opid=(%s) name=(%s)", op.ID(), name))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a Stringer to the op so %s ends up with ID()? (if it is missing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
like a op.Sprintf
? :) yes I would love to do that! I think I had a PR that never got merged awhile back that had one.
@@ -271,15 +273,19 @@ func (c *Container) ContainerExecCreate(name string, config *types.ExecConfig) ( | |||
if err != nil { | |||
return "", InternalServerError(err.Error()) | |||
} | |||
|
|||
// This does not appear to be working... we may need to do this check further down the stack. it is going to be very race filled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe a fixme here. Also not sure what this comment is trying to tell :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah that was just an investigative comment :) I thought it was removed sorry :P I was thinking that it was an early check on state that was far from the source of truth. But I figured it is more of a soft check anyways in case the user tried to exec an already off container.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am going to remove this completely. It is a good check to have and we can always handle things down stream as well.
handle, err := c.Handle(id, name) | ||
if err != nil { | ||
log.Error(err.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
op.Errorf?
lib/portlayer/exec/commit.go
Outdated
} | ||
|
||
err := retry.Do(retryFunc, isIntermittentFailure) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra space
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
op.Sprintf -> fmt.Sprintf one is the blocker as I don't think it compiles with it
@@ -222,6 +222,7 @@ func (c *Container) TaskInspect(cid, cname, eid string) (*models.TaskInspectResp | |||
if err != nil { | |||
return nil, err | |||
} | |||
|
|||
return resp.Payload, nil | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[minor] extra line :P
@@ -257,7 +258,8 @@ func (c *Container) TaskWaitToStart(cid, cname, eid string) error { | |||
|
|||
// ContainerExecCreate sets up an exec in a running container. | |||
func (c *Container) ContainerExecCreate(name string, config *types.ExecConfig) (string, error) { | |||
defer trace.End(trace.Begin(name)) | |||
op := trace.NewOperation(context.TODO(), "") | |||
defer trace.End(trace.Begin(op.Sprintf("name=(%s)", name))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
op.Sprintf -> fmt.Sprintf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the op.Sprintf since i thought that was the stringer
that you wanted :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah I just saw the below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm 👍 I can make that change, I see what you meant by stringer now, I still like the Sprintf though unless there is an egregious reason not to have it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I don't think having a Sprintf on op makes sense. fmt already provides that and that's the standart way of doing that.
Others can chime in if they think this way or that way :)
@@ -381,7 +407,8 @@ func (c *Container) ContainerExecResize(eid string, height, width int) error { | |||
// ContainerExecStart starts a previously set up exec instance. The | |||
// std streams are set up. | |||
func (c *Container) ContainerExecStart(ctx context.Context, eid string, stdin io.ReadCloser, stdout io.Writer, stderr io.Writer) error { | |||
defer trace.End(trace.Begin(eid)) | |||
op := trace.NewOperation(ctx, "") | |||
defer trace.End(trace.Begin(fmt.Sprintf("opID=(%s) eid=(%s)", op.ID(), eid))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can add a Stringer to op and use op here instead of op.ID()
func (o *Operation) String() string {
return o.id
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see. So that it has the same behavior as error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am also changing this to using the op.Sprintf since it will tack the header on by default. I wanted to promote the use of op.Sprintf for providing strings to the Trace.Begin
and Trace.End
calls since I still love those and it offers a much cleaner way of adding the Operation header to that logging.
@@ -257,7 +258,8 @@ func (c *Container) TaskWaitToStart(cid, cname, eid string) error { | |||
|
|||
// ContainerExecCreate sets up an exec in a running container. | |||
func (c *Container) ContainerExecCreate(name string, config *types.ExecConfig) (string, error) { | |||
defer trace.End(trace.Begin(name)) | |||
op := trace.NewOperation(context.TODO(), "") | |||
defer trace.End(trace.Begin(op.Sprintf("name=(%s)", name))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I don't think having a Sprintf on op makes sense. fmt already provides that and that's the standart way of doing that.
Others can chime in if they think this way or that way :)
From my perspective I wouldn't add |
You got it @cgtexmex |
You have added the creation of an Operation at the beginning of most (if not all) the exec funcs -- please use the |
21f9437
to
eaba45f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, just a few minor suggestions.
@@ -16,7 +16,6 @@ | |||
Documentation Test 1-38 - Docker Exec | |||
Resource ../../resources/Util.robot | |||
Suite Setup Install VIC Appliance To Test Server | |||
Suite Teardown Cleanup VIC Appliance On Test Server |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put that back in
@@ -58,7 +57,7 @@ Exec Echo -t | |||
${rc} ${output}= Run And Return Rc And Output docker %{VCH-PARAMS} pull ${busybox} | |||
Should Be Equal As Integers ${rc} 0 | |||
Should Not Contain ${output} Error | |||
${rc} ${id}= Run And Return Rc And Output docker %{VCH-PARAMS} run -d ${busybox} /bin/top -d 600 | |||
${rc} ${id}= Run And Return Rc And Output docker %{VCH-PARAMS} run -td ${busybox} /bin/top -d 600 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't add -t as this isn't run with a TTY
${rc} ${output}= Run And Return Rc And Output docker %{VCH-PARAMS} pull ${busybox} | ||
Should Be Equal As Integers ${rc} 0 | ||
Should Not Contain ${output} Error | ||
${rc} ${id}= Run And Return Rc And Output docker %{VCH-PARAMS} run -d ${busybox} /bin/top -d 600 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't think you need -d 600 here.. no reason to create a race here, just let it run
9b0c8a6
to
653f412
Compare
This is specifically adding a retry from our retry package which will retry the config reload when we a concurrent access fault occurs. We also refresh the container config during the retry in the event that we need the new ChangeVersion. This is just the initial test, at first glance it does not look like it is enough to resolve the issue. I will also be adding a test which should reliably reproduce the error scenario.
653f412
to
5413c06
Compare
This will add a retry for intermittent failures for a guest config reload. As well as propagate a sane and friendly error back to the user if their exec operation was interrupted by the container shutting down. Some more work will need to be done for fixing the concurrent modification error that is returned via stdout when we attach the streams.
This adds some error types and propagates them back from the portlayer to the personality. TaskInspect now returns a
ConflictError
if the supplied ID does not exist and we see that the state of the cvm is powered off. This error case may be too broad, but in TaskInspect we are not sure if the ID supplied was meant to be an exec ID or a session task ID. This means the solution is best guess, if the id is not found and we are off it is assumed that the operation has been interupted. I thought about just doing the state check again on the personality side of things, however the unknown task ID issue is already triggered after making it past the original state check in CreateExecTask, so I deemed that the task inspect check might need to be closer to the SoT(feel free to correct me on this as well).Additionally, the guest reload config function has been wrapped in a retry that retries on transient errors. We may want to narrow down the potential retry case as I used the waiter.go intermittent error decider function in first pass since it was designed to retry tasks against vsphere that are supposed to be inherently transient. Based on discussion in this PR we may change that.
I have also added some logging, with some
trace.Operation
this is pending potential removal per @cgtexmex 's review. I added it originally since it made it easier to track what all was getting called when I was learning the Exec path of operations. It was also needed since I needed to track concurrent calls against the same container.note: please keep in mind this was my first exposure to the Exec portion of the code base, if I have done something silly please let me know. I spent a decent while poking around and trying understand end to end exactly what was going on.
Fixes #6370