stability: keep a cache of pushed transactions #627

petermattis · 2015-04-10T00:22:18Z

Keep a cache of pushed transactions on a store to avoid repushing further intents after a txn has already been aborted or its timestamp moved forward.

yananzhi · 2015-04-29T11:47:24Z

@petermattis Could you assign this to me? I would like to contribute.

petermattis · 2015-04-29T13:08:25Z

@yananzhi GitHub is not allowing me to assign an issue to someone who is not in the cockroachdb org. Regardless of assignment, feel free to work on this. No one else is, and your comment on this issue will be enough to indicate that you are doing so.

yananzhi · 2015-04-29T14:03:48Z

@petermattis thanks

tbg · 2015-10-23T04:25:43Z

Closing this for now. We have a related idea in #2632 and all of this is post-1.0 anyway.

tbg · 2016-06-28T15:10:01Z

Now that we have SQL, and dropping a table issues DeleteRange over large swaths of keyspace, this is becoming relevant again. In that scenario, the following could happen:

1. A transactional DeleteRange hits a whole range and creates (say) 10k intents.
2. Later, it has to abort; the intents remain.
3. The new incarnation of the transaction tries to perform the same DeleteRange again.
4. On the first key, it sees the intent, pushes the transaction (trivially successful since that txn is aborted), resolves the intent, and performs the deletion (again).
5. On the second key, it repeats everything from the previous step.
6. ...

That turns a single DeleteRange MVCC operation into 10k RPCs, which clearly has no chance of going well.

It's time to augment the intentResolver so that after a successful push, it remembers the pushee (or rather, its TxnID and Epoch are mapped to the resulting TxnMeta). A subsequent push can then synthesize a corresponding []roachpb.Intent.
[nb: The geek in me wants to use a Cuckoo Filter for this task, but probably LRU is the way to go.]
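
A minimal sketch of what such a cache could look like, assuming a plain LRU built on `container/list`. The names `txnKey`, `pushResult`, and `pushedTxnCache` are hypothetical stand-ins (the real TxnID, Epoch, and TxnMeta types are simplified here), and the sketch is not safe for concurrent use:

```go
package main

import "container/list"

// txnKey identifies one incarnation of a transaction (hypothetical type).
type txnKey struct {
	ID    string // stand-in for the transaction's ID
	Epoch uint32
}

// pushResult is a simplified stand-in for the TxnMeta left by a successful push.
type pushResult struct {
	Aborted   bool
	Timestamp int64 // pushed timestamp, reduced to one integer for illustration
}

// entry is what each list element stores, so eviction can find the map key.
type entry struct {
	key txnKey
	res pushResult
}

// pushedTxnCache is a fixed-capacity LRU of push outcomes.
type pushedTxnCache struct {
	capacity int
	order    *list.List // front = most recently used
	items    map[txnKey]*list.Element
}

func newPushedTxnCache(capacity int) *pushedTxnCache {
	return &pushedTxnCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[txnKey]*list.Element),
	}
}

// Add records the outcome of a push, evicting the least recently used entry
// when the cache grows past its capacity.
func (c *pushedTxnCache) Add(k txnKey, r pushResult) {
	if el, ok := c.items[k]; ok {
		el.Value.(*entry).res = r
		c.order.MoveToFront(el)
		return
	}
	c.items[k] = c.order.PushFront(&entry{key: k, res: r})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

// Get returns the remembered push outcome, if any, and marks it recently used.
func (c *pushedTxnCache) Get(k txnKey) (pushResult, bool) {
	el, ok := c.items[k]
	if !ok {
		return pushResult{}, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).res, true
}
```

A caller would consult `Get` before issuing a push and only go over the network on a miss, calling `Add` once the push succeeds.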

This by itself isn't enough: it saves the 10k pushes, but we will still have to resolve 10k intents and have the DeleteRange bounce back and forth 10k times, which is guaranteed to be a major drag on Raft.

An elegant solution comes up with leader-proposed Raft: since we see the intents before proposing to Raft, we can have more complex logic which performs a single push for the first intent, and then "directly" overwrites the remaining "identical" intents it encounters for the remainder of the DeleteRange. If we took that to the extreme, it could mean supplying more logic down to the lower MVCC levels (about which intents cause write intent errors), but I think we should hold off on that unless wide deletions are a regular thing that needs to be very fast.
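
To make the "single push, then handle the identical intents together" idea concrete, here is a rough sketch. It is not the storage code: `intent`, `pushFunc`, and `resolveScan` are hypothetical names, and the push is modeled as a callback that reports whether a transaction's intents may be removed.

```go
package main

// intent is a simplified stand-in for a write intent: a key plus the ID of
// the transaction that wrote it.
type intent struct {
	Key   string
	TxnID string
}

// pushFunc is a hypothetical callback that pushes a transaction and reports
// whether its intents may be removed (for example because it is aborted).
type pushFunc func(txnID string) (resolvable bool)

// resolveScan walks the intents in scan order and pushes each distinct
// transaction only once. It returns the keys whose intents can be resolved
// in a single batch, along with the number of pushes that were issued.
func resolveScan(intents []intent, push pushFunc) (resolved []string, pushes int) {
	outcome := make(map[string]bool) // txn ID -> result of its one push
	for _, in := range intents {
		ok, seen := outcome[in.TxnID]
		if !seen {
			ok = push(in.TxnID)
			outcome[in.TxnID] = ok
			pushes++
		}
		if ok {
			resolved = append(resolved, in.Key)
		}
	}
	return resolved, pushes
}
```

With 10k intents left behind by one aborted transaction, this issues one push and yields one batch of 10k keys, rather than 10k push-and-resolve round trips.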

Even before leader-proposed Raft, we could in principle do the same thing: once a DeleteRange runs into an intent, we execute it before Raft (without committing the result, just to discover more intents) and supply a batch-resolution to Raft. But that's fairly complex when you take into account that it needs to go away anyway.

My hunch is that #7499 should be addressed first since there is a fairly clear path there to avoid these deletions in the first place (though users may still perform table deletions which have many of the same problems).

spencerkimball · 2017-04-03T23:29:25Z

Closing as this is a solution hatched before we've observed the problem. I for one am hoping that instead of range-wide DeleteRange calls, we smartly move dropped tables into a trash system database and then when dropping from the system database, we use ClearRange. The larger point is I want to close these conceptual solution issues that aren't anchored to a pressing concern based on real-world customer needs / issues.

bdarnell mentioned this issue May 18, 2015

Keep a cache of pushed transactions on the store #1041

Closed

tamird assigned yananzhi Jul 22, 2015

tamird added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jul 22, 2015

tbg mentioned this issue Sep 23, 2015

storage: judge transactions interact before multiraft replication #2632

Closed

tbg closed this as completed Oct 23, 2015

tbg reopened this Jun 28, 2016

tbg mentioned this issue Jun 28, 2016

storage: measure/optimize intent resolution #7503

Closed

petermattis changed the title ~~Keep a cache of pushed transactions~~ storage: keep a cache of pushed transactions Jun 29, 2016

andreimatei mentioned this issue Jun 29, 2016

sql: drop table and recreate fail #7348

Closed

tamird unassigned yananzhi Jul 1, 2016

tbg mentioned this issue Aug 22, 2016

stability: Coalesce duplicate (sync) write intent errors #8692

Closed

tbg changed the title ~~storage: keep a cache of pushed transactions~~ stability: keep a cache of pushed transactions Sep 26, 2016

tbg added S-1-stability Severe stability issues that can be fixed by upgrading, but usually don’t resolve by restarting and removed C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) labels Sep 26, 2016

tbg mentioned this issue Sep 26, 2016

stability: run insert&delete-heavy workload with intent/GC queue pressure #9540

Closed

tbg mentioned this issue Oct 6, 2016

running drop table command ... system hung ... don't know what happened. #9776

Closed

petermattis added this to the Later milestone Feb 22, 2017

spencerkimball closed this as completed Apr 3, 2017
