Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: remove re-evaluations of requests after a snapshot is applied #33381

Merged

Commits on Jan 13, 2019

  1. storage: remove re-evaluations of requests after a snapshot is applied

    This patch affects the handling of the following scenario:
    - A lease-holder had proposed a command and then received a snapshot
    with a LeaseAppliedIndex above the command's index.
    - What can the leaseholder do? It doesn't know whether the command
    applied or not - the snapshot is too raw for such a determination. So,
    the command's status is ambiguous. Note that the command can't be
    reproposed - the proposal would simply be rejected by the lease index
    check.
    - Before this patch, in a desperate attempt to avoid returning an
    ambiguous error to the client, the leaseholder was re-evaluating the
    request and declared success if the re-evaluation succeeded. It was
    remembering to declare AmbiguousResultError if the re-evaluation did not
    succeed for any reason.
    
    This patch removes the re-evaluation. We'll now return an
    AmbiguousResultError directly. This is motivated by the fact that
    testing the cases proved quite difficult, and the code involved is
    conceptually complicated (remembering the ambiguity and overwriting
    errors) and the benefit of the mechanism doesn't seem worth it:
    snapshots can only be applied when the leaseholder and the Raft leader
    are divorced - which should be rare.
    Perhaps when we get more idempotent requests we can re-introduce the
    re-evaluation in a simpler way.
    
    Fixes cockroachdb#28985
    
    Release note: None
    andreimatei committed Jan 13, 2019
    Configuration menu
    Copy the full SHA
    2e7dac6 View commit details
    Browse the repository at this point in the history
  2. storage: refactor ambiguous errors

    This patch refactors the way ambiguous errors are returned from the
    evaluation of requests. Before this patch, the situation was confusing
    and buggy: the responsibility for generating these errors was split
    between Replica.executeWriteBatch() and the lower lever
    Replica.tryExecuteWriteBatch(). The latter was returning both errors
    and so called "retry reasons", except that some retry reasons were not
    actually resulting in retries. The reasons in question were returned
    together with an error, which was in blatant violation of proposalResult
    comment that said that exactly one of these guys can be set.
    I believe the fact that both an error and a retry reason was set was
    also causing the timestamp cache updating code to be buggy, I think:
    https://github.com/cockroachdb/cockroach/blob/a5eddb23cbb7db5cbb9655b8ed7a1875779b3c62/pkg/storage/replica.go#L2214
    The code says that we update the ts cache if the request is not to be
    retried, but it was missing cases.
    
    The actual ambiguous errors were sometimes generated in
    tryExecuteWriteBatch (for reason proposalRangeNoLongerExists) and
    sometimes in executeWriteBatch (for reason proposalErrorReproposing),
    which was confusing.
    
    This patch does away with the troublesome retry reasons and removes any
    responsibility around creating ambiguous errors from executeWriteBatch.
    The lower layers know whether the conditions they're reporting should
    return in ambiguous errors or not, and so they create ambiguous errors
    directly.
    
    This patch changes the errors returned for some non-commit requests when
    the reason is proposalErrorReproposing: it used to be a non-ambiguous
    error and now it will be an ambiguous one. executeWriteBatch used to
    discriminate between commit and non-commit batches, but that was
    inconsistent (it didn't take effect for proposalRangeNoLongerExists)
    and, more importantly, it was the wrong layer to take commits into
    consideration (in my opinion).
    
    Release note: None
    andreimatei committed Jan 13, 2019
    Configuration menu
    Copy the full SHA
    80eea9e View commit details
    Browse the repository at this point in the history