Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stability: keep a cache of pushed transactions #627

Closed
petermattis opened this issue Apr 10, 2015 · 6 comments
Closed

stability: keep a cache of pushed transactions #627

petermattis opened this issue Apr 10, 2015 · 6 comments
Labels
S-1-stability Severe stability issues that can be fixed by upgrading, but usually don’t resolve by restarting
Milestone

Comments

@petermattis
Copy link
Collaborator

Keep a cache of pushed transactions on a store to avoid repushing further intents after a txn has already been aborted or its timestamp moved forward.

@yananzhi
Copy link
Contributor

@petermattis Could you assign this to me? I Would like to do some contribute.

@petermattis
Copy link
Collaborator Author

@yananzhi Github is not allowing me to assign an issue to someone who is not in the cockroachdb org. Regardless of assignment, feel free to work on this. No one else is and your comment on this issue will be enough to indicate that you are doing so.

@yananzhi
Copy link
Contributor

@petermattis thanks

@tamird tamird added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jul 22, 2015
@tbg
Copy link
Member

tbg commented Oct 23, 2015

closing this for now. We have a related idea in #2632 and all of this is post-1.0 anyway.

@tbg tbg closed this as completed Oct 23, 2015
@tbg tbg reopened this Jun 28, 2016
@tbg
Copy link
Member

tbg commented Jun 28, 2016

Now that we have SQL and dropping tables issues DeleteRange on large swaths of keyspace, this is becoming relevant again. In the above scenario, the following could happen:

  • Transactional DeleteRange hits a whole range and creates (say) 10k intents
  • Later, it has to abort - the intents remain
  • The new incarnation of the transaction tries to perform the same DeleteRange again.
  • On the first key, it sees the intent, pushes the transaction (trivially successful since that txn is aborted), resolves the intent, performs the deletion (again).
  • On the second key, <everything from above>
  • ...

That turns a single DeleteRange MVCC operation into 10k RPCs, which clearly has no chance of going well.

It's time to augment the intentResolver so that after a successful push, it remembers the pushee (or rather, its TxnID and Epoch are mapped to the resulting TxnMeta). A subsequent push can then synthesize a corresponding []roachpb.Intent.
[nb: The geek in me wants to use a Cuckoo Filter for this task, but probably LRU is the way to go.]

This itself isn't enough: it saves the 10k pushes, but we will still have to resolve 10k intents and have the DeleteRange bounce back and forth 10k times, which is guaranteed to be a major drag on Raft.

An elegant solution comes up with leader-proposed Raft: since we see the intents before proposing to Raft, we can have more complex logic which performs a single push for the first intent, and then "directly" overwrites the remaining "identical" intents it encounters for the remainder of the DeleteRange. If we took that to the extreme, it could mean supplying more logic down to the lower MVCC levels (about which intents cause write intent errors), but I think we should hold off on that unless wide deletions are a regular thing that needs to be very fast.

Even before leader-proposed Raft, we could in principle do the same thing: once a DeleteRange runs into an intent, we execute it before Raft (without committing the result, just to discover more intents) and supply a batch-resolution to Raft. But that's fairly complex when you take into account that it needs to go away anyway.

My hunch is that this #7499 should be addressed first since there is a fairly clear path there to avoid these deletions in the first place (though users may still perform table deletions which have much of the same problems).

@petermattis petermattis changed the title Keep a cache of pushed transactions storage: keep a cache of pushed transactions Jun 29, 2016
@tbg tbg changed the title storage: keep a cache of pushed transactions stability: keep a cache of pushed transactions Sep 26, 2016
@tbg tbg added S-1-stability Severe stability issues that can be fixed by upgrading, but usually don’t resolve by restarting and removed C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) labels Sep 26, 2016
@petermattis petermattis added this to the Later milestone Feb 22, 2017
@spencerkimball
Copy link
Member

Closing as this is a solution hatched before we've observed the problem. I for one am hoping that instead of range-wide DeleteRange calls, we smartly move dropped tables into a trash system database and then when dropping from the system database, we use ClearRange. The larger point is I want to close these conceptual solution issues that aren't anchored to a pressing concern based on real-world customer needs / issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-1-stability Severe stability issues that can be fixed by upgrading, but usually don’t resolve by restarting
Projects
None yet
Development

No branches or pull requests

5 participants