Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix errcheck in service/history/shard #3755

Merged
merged 1 commit into from
Dec 28, 2022
Merged

Conversation

MichaelSnowden
Copy link
Contributor

What changed?

Why?

How did you test it?

Potential risks

Is hotfix candidate?

@MichaelSnowden MichaelSnowden requested a review from a team as a code owner December 22, 2022 01:29
@MichaelSnowden MichaelSnowden merged commit c912454 into master Dec 28, 2022
@MichaelSnowden MichaelSnowden deleted the service/history/shard branch December 28, 2022 17:33
@@ -1505,18 +1503,24 @@ func (s *ContextImpl) createEngine() Engine {

// start should only be called by the controller.
func (s *ContextImpl) start() {
s.transition(contextRequestAcquire{})
if err := s.transition(contextRequestAcquire{}); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s.transition() already emits logs if the transition is invalid I believe.

@@ -1436,8 +1437,7 @@ func (s *ContextImpl) handleReadError(err error) error {
case *persistence.ShardOwnershipLostError:
// Shard is stolen, trigger shutdown of history engine.
// Handling of max read level doesn't matter here.
s.transition(contextRequestStop{})
return err
return multierr.Combine(err, s.transition(contextRequestStop{}))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can/should combine the errors here (and two more places below).

  • Many places in our code path is still checking error type directly not via errors.As, so those places might break.
  • To me, the error from s.transition is internal to the shard context impl and upper layer should not know.

cc @dnr Would you mind also take a look?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this PR doesn't make much sense. I think you should revert these changes.

The error returned from transition is really only meaningful for contextRequestAcquired, the others will always return nil and the result doesn't have to be checked. I know errcheck isn't smart enough to figure that out but we can just manually ignore them.

As Yichao said, transition already logs so callers should not.

And I agree the multierr stuff is not appropriate here.

@@ -1436,8 +1437,7 @@ func (s *ContextImpl) handleReadError(err error) error {
case *persistence.ShardOwnershipLostError:
// Shard is stolen, trigger shutdown of history engine.
// Handling of max read level doesn't matter here.
s.transition(contextRequestStop{})
return err
return multierr.Combine(err, s.transition(contextRequestStop{}))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this PR doesn't make much sense. I think you should revert these changes.

The error returned from transition is really only meaningful for contextRequestAcquired, the others will always return nil and the result doesn't have to be checked. I know errcheck isn't smart enough to figure that out but we can just manually ignore them.

As Yichao said, transition already logs so callers should not.

And I agree the multierr stuff is not appropriate here.

@@ -790,7 +790,9 @@ func (s *controllerSuite) TestShardControllerFuzz() {
shardID := int32(rand.Intn(int(s.config.NumberOfShards))) + 1
switch rand.Intn(5) {
case 0:
s.shardController.GetShardByID(shardID)
if _, err := s.shardController.GetShardByID(shardID); err != nil {
return err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

returning defeats the purpose of this code, which is to generate load on the shard controller. this shouldn't return an error but if it does the worker should continue to run.

MichaelSnowden added a commit that referenced this pull request Jan 25, 2023
@MichaelSnowden
Copy link
Contributor Author

Thanks for spotting these issues. I have a PR to revert these changes here: #3842

MichaelSnowden added a commit that referenced this pull request Jan 26, 2023
MichaelSnowden added a commit that referenced this pull request Jan 31, 2023
MichaelSnowden added a commit that referenced this pull request Feb 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants