
sql: rollback to savepoint after schema change can lead to deadlock #24885

Closed
andreimatei opened this issue Apr 17, 2018 · 30 comments · Fixed by #46170
Assignees
Labels
A-schema-changes C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-2 Medium-high impact: many users impacted, risks of availability and difficult-to-fix data errors

Comments

@andreimatei
Contributor

The following deadlocks:

begin; savepoint cockroach_restart; create table t(x int); rollback to savepoint cockroach_restart;
select * from t;

The reason it deadlocks is that the create left an intent on something (probably a table name record); the intent will never commit, because the txn's epoch is incremented by the rollback to savepoint, but it still exists. The select then tries to resolve that name in order to get a lease on the table, and that name resolution process uses a different txn, which gets blocked on the intent.
Funnily enough, if you do another create after the rollback, that works - probably because all reads/writes are done using the main txn in that case.
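A toy Python model (not CockroachDB code; all names invented) of the mechanism described above: the rollback bumps the epoch but the intent survives, so a reader in a *different* txn (the lease/name-resolution txn) blocks on it, while reads by the session's own txn would not.

```python
# Toy model of the deadlock: rollback-to-savepoint bumps the writer's epoch
# but does not remove intents written at the old epoch. A separate txn doing
# name resolution must wait for the intent's writer, which never finishes
# because it is the same session that is now waiting on the resolver.

class Txn:
    def __init__(self, txn_id):
        self.txn_id = txn_id
        self.epoch = 0
        self.status = "PENDING"

    def rollback_to_savepoint(self):
        # The epoch bump logically invalidates the old writes...
        self.epoch += 1
        # ...but the intents themselves stay until the txn finalizes.

class Store:
    def __init__(self):
        self.intents = {}  # key -> (txn_id, epoch)

    def write(self, key, txn):
        self.intents[key] = (txn.txn_id, txn.epoch)

    def read_blocks_on_intent(self, key, reader_txn_id, writer):
        """A reader from a different txn must wait while the writer is pending."""
        if key not in self.intents:
            return False
        owner_id, _ = self.intents[key]
        return owner_id != reader_txn_id and writer.status == "PENDING"

store = Store()
schema_txn = Txn("sql-session-txn")
store.write("/namespace/t", schema_txn)   # create table t
schema_txn.rollback_to_savepoint()        # epoch bump; intent survives

# Lease acquisition uses its own txn, so it blocks on the stale intent:
lease_txn_blocks = store.read_blocks_on_intent("/namespace/t", "lease-txn", schema_txn)
# A read by the session's own txn would not block:
own_read_blocks = store.read_blocks_on_intent("/namespace/t", "sql-session-txn", schema_txn)
print(lease_txn_blocks, own_read_blocks)  # True False
```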

Found while investigating #24785

One way to fix this is for transactions that perform schema changes to employ a policy that says that they don't get to keep intents from old epochs - or rather, whenever we would bump the epoch, we'd create a new txn instead. This would mean that they don't get to benefit from the "reservation" effect of leaving intents around, but who cares. The moral justification is that these transactions touched "special" keys.
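The first option could be sketched like this (a hypothetical model with invented names, not the actual connExecutor code): a session that performed DDL swaps in a brand-new txn instead of bumping the epoch, so no intents from the old epoch survive the restart.

```python
# Sketch of the "fresh txn instead of epoch bump" policy for sessions that
# performed schema changes. All names are invented for illustration.
import itertools

_ids = itertools.count(1)

class KVTxn:
    def __init__(self):
        self.txn_id = next(_ids)
        self.epoch = 0

class Session:
    def __init__(self):
        self.txn = KVTxn()
        self.performed_ddl = False
        self.intents = set()

    def ddl(self, key):
        self.performed_ddl = True
        self.intents.add(key)

    def restart(self):
        if self.performed_ddl:
            # Abort the old txn: its intents get cleaned up eagerly, so
            # nothing is left behind to block other name resolutions.
            self.intents.clear()
            self.txn = KVTxn()
            self.performed_ddl = False
        else:
            # Normal path: keep the txn (and the "reservation" effect of
            # its intents) and just bump the epoch.
            self.txn.epoch += 1

s = Session()
s.ddl("/namespace/t")
old_id = s.txn.txn_id
s.restart()
print(s.txn.txn_id != old_id, len(s.intents))  # True 0
```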
Another way is to do all lease acquisitions and name resolution and such by using the transaction running the statement that caused the lease. But I don't like that at all; it goes against divorcing the table descriptor caching layer from SQL transactions which I think is generally a great thing.

Opinions?

cc @vivekmenezes @bdarnell

@andreimatei andreimatei added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Apr 17, 2018
@andreimatei andreimatei added this to the 2.1 milestone Apr 17, 2018
@andreimatei andreimatei self-assigned this Apr 17, 2018
andreimatei added a commit to andreimatei/cockroach that referenced this issue Apr 17, 2018
…epoint

Prior to this patch, the following sequence would write a row that's
then inaccessible:
begin; savepoint cockroach_restart; create table t(x int primary key); rollback to savepoint cockroach_restart;
insert into t(x) values(1); release savepoint cockroach_restart; commit;

The insert manages to resolve the table because we weren't clearing the
"uncommitted descriptors" set that the session maintains. Since the
table descriptor doesn't end up being committed, the row will be
essentially unreachable.

This patch is original work on the 1.1 branch. This bug is not present
in the 2.0 release, where we're better about resetting state on
transaction retries.
I believe on 1.1 there's still a similar issue present when doing
automatic retries (as opposed to user-directed ones through rollback to
savepoint). I think fixing those would be more involved because it's
less clear where to stick the cleanup done in this patch; I'd rather not
do anything at this point.

Also note that, with the fix, the sequence above doesn't actually work.
Instead, it deadlocks, just like it does on 2.0, because of cockroachdb#24885.
However, the following works:
begin; savepoint cockroach_restart; create table t(x int primary key); rollback to savepoint cockroach_restart;
create table t(x int primary key); insert into t(x) values(1); release savepoint cockroach_restart; commit;

Fixes cockroachdb#24785
@vivekmenezes
Contributor

A similar deadlock happens when you create a table in an open transaction and run some commands that require that schema in other transactions.

@andreimatei
Contributor Author

Note to self: #24888 added a test that (when ported to 2.0) can be simplified if we fix this issue.

@andreimatei
Contributor Author

@vivekmenezes that's not a deadlock, is it? That's normal reader-blocks-on-writer behavior.

@vivekmenezes
Contributor

@andreimatei yeah you're right that's just requests blocking on an uncommitted intent. Please add the test in #24888 to 2.0 so we don't regress on it. Thanks!

craig bot pushed a commit that referenced this issue Apr 17, 2018
24888: sql: Fix "dangling rows" left over by schema change + rollback to savepoint r=andreimatei a=andreimatei

Prior to this patch, the following sequence would write a row that's
then inaccessible:
begin; savepoint cockroach_restart; create table t(x int primary key); rollback to savepoint cockroach_restart;
insert into t(x) values(1); release savepoint cockroach_restart; commit;

The insert manages to resolve the table because we weren't clearing the
"uncommitted descriptors" set that the session maintains. Since the
table descriptor doesn't end up being committed, the row will be
essentially unreachable.

This patch is original work on the 1.1 branch. This bug is not present
in the 2.0 release, where we're better about resetting state on
transaction retries.
I believe on 1.1 there's still a similar issue present when doing
automatic retries (as opposed to user-directed ones through rollback to
savepoint). I think fixing those would be more involved because it's
less clear where to stick the cleanup done in this patch; I'd rather not
do anything at this point.

Also note that, with the fix, the sequence above doesn't actually work.
Instead, it deadlocks, just like it does on 2.0, because of #24885.
However, the following works:
begin; savepoint cockroach_restart; create table t(x int primary key); rollback to savepoint cockroach_restart;
create table t(x int primary key); insert into t(x) values(1); release savepoint cockroach_restart; commit;

Fixes #24785

cc @cockroachdb/release  @gpaul 

Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
@jordanlewis
Member

@andreimatei this isn't 2.1, is it?

@andreimatei
Contributor Author

It's not the worst thing in the world. In any case I'll pass it along to @vivekmenezes , thanks for the ping.
Rereading this, I now don't understand why

begin; savepoint cockroach_restart; create table t(x int); rollback to savepoint cockroach_restart;
select * from t;

deadlocks, but

begin; savepoint cockroach_restart; create table t(x int); rollback to savepoint cockroach_restart;
create;

doesn't.

@vivekmenezes
Contributor

@andreimatei I've not studied this issue. It is my intention to move all schema change writes to commit time, which will also fix #20526

I've been holding off on that because of the worry about leases that keep renewing themselves pushing the schema change writer. But I do plan on doing it as soon as epoch-based table leases are merged, because epoch-based table leases are going to considerably reduce the read demands on the table descriptors

@nstewart nstewart added the S-2-temp-unavailability Temp crashes or other availability problems. Can be worked around or resolved by restarting. label Sep 18, 2018
@vivekmenezes vivekmenezes removed this from the 2.1 milestone Feb 11, 2019
@vivekmenezes
Contributor

vivekmenezes commented Feb 11, 2019

moving this to the backlog. This is still an extant issue

@vivekmenezes vivekmenezes removed the S-2-temp-unavailability Temp crashes or other availability problems. Can be worked around or resolved by restarting. label Feb 11, 2019
@jordanlewis
Member

Hey @andreimatei, would it be possible to make this happen as part of your SAVEPOINT work this release? I talked with @ajwerner, who suggests that the right way to move forward is to have DDL-containing transactions that get their epoch bumped use a new transaction (which functionally drops all of their outstanding intents) in the conn executor state machine.

@andreimatei
Contributor Author

I talked with @ajwerner, who suggests that the right way to move forward is to have DDL-containing transactions that get their epoch bumped use a new transaction (which functionally drops all of their outstanding intents) in the conn executor state machine.

That's what the original message in this thread suggests too. But how is it related to savepoints? Nice try, but I'd rather let someone else do it. I can review the PR, though.

@jordanlewis
Member

I thought it was related enough because you are already working heavily in the conn executor state machine - I figured it wouldn't be too much extra trouble to add this too. But maybe I figured wrong.

@andreimatei
Contributor Author

I played around a bit. There's a couple of weird things going on.
The first, which I find most troubling, is that we somehow seem to be leaving some lock behind even after we close the connection whose transaction I believe held that lock.
Repro:
client 1:

begin; savepoint a; create table t(); rollback to a;
select * from t; -> blocks

client 2:
select * from t; -> blocks

close client 1. Client 2 is still blocked! Any new client will also block. Only restarting the server seems to fix it. Closing the connection is supposed to roll back the respective txn, and it generally does.
@nvanbenschoten , this couldn't be that lock leak bug you found last night, could it?

The 2nd interesting thing is that depending on exactly how I issue the operations (or their timing?), I don't always get the deadlock.
For example, begin; savepoint a; create table t(); rollback to a; select * from t; deadlocks, but issuing every statement separately doesn't - the final select says "relation not found".

When the thing deadlocks, there's a TableCollection.getTableVersion() which is deadlocked. The stack looks like this

When the final select doesn't deadlock, its trace looks like this:

sql/resolver.go:300  [n1,client=127.0.0.1:47540,user=root] !!! planner.LookupObject: defaultdb.public.t
sql/logical_schema_accessors.go:82  [n1,client=127.0.0.1:47540,user=root] !!! LogicalSchemaAccessor: defaultdb.public.t
sql/physical_schema_accessors.go:310  [n1,client=127.0.0.1:47540,user=root] !!! CachedPhysicalAccessor: defaultdb.public.t
sql/table.go:285  [n1,client=127.0.0.1:47540,user=root] !!! TableCollection.getTableVersion: defaultdb.public.t
sql/physical_schema_accessors.go:176  [n1,client=127.0.0.1:47540,user=root] !!! UncachedPhysicalAccessor: defaultdb.public.t
sql/physical_schema_accessors.go:208  [n1,client=127.0.0.1:47540,user=root] !!! UncachedPhysical couldn't find name: defaultdb.public.t
sql/table.go:287  [n1,client=127.0.0.1:47540,user=root] !!! TableCollection.getTableVersion: defaultdb.public.t - done
sql/resolver.go:143  [n1,client=127.0.0.1:47540,user=root] !!! resolveExistingObjectImpl couldn't find table

I'll try to understand what's going on.

@nvanbenschoten
Member

@nvanbenschoten , this couldn't be that lock leak bug you found last night, could it?

I don't see any evidence that points to that. The lock issue was related to lease transfers, which don't seem to be at play here. Also, I tested this with the fix and could still reproduce. Doesn't it seem more likely that whatever is deadlocking the single transaction on itself is also blocking the other one?

@andreimatei
Contributor Author

Doesn't it seem more likely that whatever is deadlocking the single transaction on itself is also blocking the other one?

Yes, but my concern is with the rollback that's supposed to happen on connection close. The canceling of the connection is supposed to get out of the deadlock by canceling various contexts, and then everything is supposed to be rolled back - and so other clients are supposed to be unblocked.
I think I've figured this part out, though. I think the crux of the matter is that, if the connection closing happens when the SQL txn is in the Aborted state (in this particular case, the canceling of some context causes a lease acquisition to fail, which moves the txn into the Aborted state), then we no longer properly roll back everything. We're supposed to be doing that by sending a nonRetriableError{IsCommit:True} event, which was supposed to cause a rollback from any state (by virtue of the IsCommit attribute), but we're not doing the right thing in the Aborted state - we're not handling the IsCommit part correctly.

@andreimatei
Contributor Author

Fixing the connection close bug in #45835

@andreimatei
Contributor Author

Back to why we sometimes deadlock and sometimes don't, the difference seems to come down to LeaseManager.resolveName() blocking or not blocking on an intent (presumably on system.namespace). I don't understand why it doesn't always block yet.

@andreimatei
Contributor Author

When we don't deadlock because reading system.namespace doesn't block, the final select tries to resolve the table name twice (using LookupObjectID). The first time it does it through LeaseManager.resolveName(), which uses its own txn. I'm not sure yet why that doesn't block.
The 2nd time it attempts to resolve the name through UncachedPhysicalAccessor. That call is using the select's txn, and I wonder if that's all right. The call is here:

return readTableFromStore()

@otan , could you please tell me what the deal is around here? I thought that the way things work is that once a txn has created some tables, those tables are resolved locally by the tableCollection without needing to go to the store. Is that not the case?

@otan
Contributor

otan commented Mar 8, 2020

Not quite sure, I haven't done too much with table collections. I can try to uncover it tomorrow.

@otan
Contributor

otan commented Mar 9, 2020

I'm trying to understand

cockroach/pkg/sql/lease.go

Lines 1476 to 1485 in f5fe582

// Known limitation: AcquireByName() calls Acquire() and therefore suffers
// from the same limitation as Acquire (See Acquire). AcquireByName() is
// unable to function correctly on a timestamp less than the timestamp
// of a transaction with a DROP/TRUNCATE on a table. The limitation in
// the face of a DROP follows directly from the limitation on Acquire().
// A TRUNCATE is implemented by changing the name -> id mapping for a table
// and by dropping the descriptor with the old id. While AcquireByName
// can use the timestamp and get the correct name->id mapping at a
// timestamp, it uses Acquire() to get a descriptor with the corresponding
// id and fails because the id has been dropped by the TRUNCATE.

which seems to be the reason we fall back to

cockroach/pkg/sql/table.go

Lines 376 to 381 in 1007293

// Read the descriptor from the store in the face of some specific errors
// because of a known limitation of AcquireByName. See the known
// limitations of AcquireByName for details.
if _, ok := err.(inactiveTableError); ok || err == sqlbase.ErrDescriptorNotFound {
return readTableFromStore()
}

it seems we have to fall back to the uncached accessor (with no txn context) as p.Tables() makes it look like it's disappeared, but we want to see if it's actually gone. in that case, it does appear as if using the same txn for that is broken. i'm not sure if not using the txn is the behaviour we always want, though - i.e. does it break anything if we use a fresh transaction for this?

@andreimatei
Contributor Author

Oliver, don't worry about it if you've got nothing to do with this. I thought you had introduced that fallback since I saw your name on the blame. I'll bring it to the attention of @ajwerner when I get back to the office; he seems to want to own this zoo. The fallback is only peripheral to this issue anyway.

@andreimatei
Contributor Author

I've figured out why the sequence doesn't always deadlock. If the writes to the namespace table happen to run into the tscache, then the respective intents will be written at a pushed timestamp.
Resolving names, acquiring leases and all that, however, always uses the triggering txn's read timestamp for its reads, so, when there's no uncertainty in play, it'll ignore the intent.
When running things by hand it's easy to run into the ts cache due to the 3s closed timestamp target.
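The timestamp interaction can be illustrated with a toy check (simplified; uncertainty intervals ignored): an MVCC read at a given read timestamp only has to wait for intents at or below that timestamp, so an intent that got pushed above the reader's timestamp is simply skipped.

```python
# Toy illustration of why the repro doesn't always deadlock: if the tscache
# pushed the create's namespace intent to a higher timestamp, the name
# resolution read (done at the triggering txn's read timestamp) skips it
# instead of blocking.

def read_must_wait(read_ts, intent_ts):
    """An intent is only visible to (and blocks) reads at or above its timestamp."""
    return intent_ts <= read_ts

same_ts = read_must_wait(read_ts=10, intent_ts=10)    # intent visible -> deadlock
pushed = read_must_wait(read_ts=10, intent_ts=13)     # pushed intent -> no deadlock
print(same_ts, pushed)  # True False
```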

I've implemented the fix for the deadlock that we were talking about: having the connExecutor transparently switch the KV txn to a new one in certain cases. It's not much code, but it's kinda unfortunate and there's no great place to put it. It makes me a bit unhappy. I'm now thinking that perhaps it'd be better to deal with this problem in KV and have a feature there that says that locks on a particular span of keys must be released when restarting or rolling back to savepoints. I want to discuss it with my peeps.

andreimatei added a commit to andreimatei/cockroach that referenced this issue Mar 16, 2020
This patch makes reads of the system.namespace table and of table
descriptors use high-priority transactions. These transactions will push
locks owned by schema changes out of their way. The idea is that regular
SQL transactions reading schema don't want to wait for the transactions
performing DDL statements. Instead, those DDLs will be pushed and forced
to refresh.

Besides the benefit to regular transactions, this patch also prevents
deadlocks for the transactions performing DDL. Before this patch, the
final select in the following sequence would deadlock:

begin; savepoint s; create table t(x int); rollback to savepoint s;
select * from t;

The select is reading the namespace table, using a different txn from
its own. That read would block on the intent laid down by the prior
create. With this patch, the transaction effectively pushes itself, but
gets to otherwise run.

Fixes cockroachdb#24885

Release justification: Fix for an old bug that became more prominent
when we introduced SAVEPOINTs recently.

Release note: A rare bug causing transactions that have performed schema
changes to deadlock after they restart has been fixed.
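The pushing behavior the patch above describes can be modeled with a toy conflict rule (invented names, not KV code): a high-priority reader does not queue behind a lower-priority writer's intent; it pushes the writer's timestamp above its own read timestamp, forcing the writer to refresh while the reader proceeds.

```python
# Toy model of priority-based conflict resolution: instead of the schema
# read waiting on a DDL intent, the higher-priority reader pushes the
# writer's timestamp past its read timestamp and carries on.

def resolve_conflict(reader_priority, writer_priority, writer_ts, read_ts):
    """Return (reader_waits, new_writer_ts)."""
    if reader_priority > writer_priority:
        # High-priority reader pushes the intent above its read timestamp.
        return False, max(writer_ts, read_ts + 1)
    # Otherwise the reader queues behind the intent as usual.
    return True, writer_ts

waits, pushed_ts = resolve_conflict(reader_priority=2, writer_priority=1,
                                    writer_ts=10, read_ts=10)
print(waits, pushed_ts)  # False 11
```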
craig bot pushed a commit that referenced this issue Mar 17, 2020
46170: sql: resolve schema in high-priority transactions r=andreimatei a=andreimatei

This patch makes reads of the system.namespace table and of table
descriptors use high-priority transactions. These transactions will push
locks owned by schema changes out of their way. The idea is that regular
SQL transactions reading schema don't want to wait for the transactions
performing DDL statements. Instead, those DDLs will be pushed and forced
to refresh.

Besides the benefit to regular transactions, this patch also prevents
deadlocks for the transactions performing DDL. Before this patch, the
final select in the following sequence would deadlock:

begin; savepoint s; create table t(x int); rollback to savepoint s;
select * from t;

The select is reading the namespace table, using a different txn from
its own. That read would block on the intent laid down by the prior
create. With this patch, the transaction effectively pushes itself, but
gets to otherwise run.

Fixes #24885

Release justification: Fix for an old bug that became more prominent
when we introduced SAVEPOINTs recently.

Release note: A rare bug causing transactions that have performed schema
changes to deadlock after they restart has been fixed.

Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
@craig craig bot closed this as completed in 8654b02 Mar 17, 2020
@otan
Contributor

otan commented Mar 17, 2020

interesting, i can still repro #44385 pretty easily with this change in -- now i don't even have to run the "SHOW TABLES" query to get the deadlock.

repro:

# screen 1
./cockroach start-single-node --listen-addr=localhost:26257 --insecure
# screen 2
./cockroach sql --watch '1ms' -e 'DROP DATABASE IF EXISTS db0;CREATE DATABASE db0;USE db0;CREATE TABLE t0(c0 INT);' --insecure

edit: nvm they just take a minute now:


CREATE TABLE

Time: 46.067ms

CREATE TABLE

Time: 68.928ms

CREATE TABLE

Time: 43.123171s

CREATE TABLE

Time: 1m0.149811s

CREATE TABLE

Time: 1m0.026241s

CREATE TABLE

Time: 47.636ms

CREATE TABLE

Time: 47.969ms

CREATE TABLE

Time: 59.962868s

@andreimatei
Contributor Author

I don't know what's up with the delay. Mind filing it separately for Lucy or Werner?

ajwerner pushed a commit to ajwerner/cockroach that referenced this issue Apr 2, 2020
This patch makes reads of the system.namespace table and of table
descriptors use high-priority transactions. These transactions will push
locks owned by schema changes out of their way. The idea is that regular
SQL transactions reading schema don't want to wait for the transactions
performing DDL statements. Instead, those DDLs will be pushed and forced
to refresh.

Besides the benefit to regular transactions, this patch also prevents
deadlocks for the transactions performing DDL. Before this patch, the
final select in the following sequence would deadlock:

begin; savepoint s; create table t(x int); rollback to savepoint s;
select * from t;

The select is reading the namespace table, using a different txn from
its own. That read would block on the intent laid down by the prior
create. With this patch, the transaction effectively pushes itself, but
gets to otherwise run.

Fixes cockroachdb#24885

Release justification: Fix for an old bug that became more prominent
when we introduced SAVEPOINTs recently.

Release note: A rare bug causing transactions that have performed schema
changes to deadlock after they restart has been fixed.
7 participants