
sql: rollback to savepoint after schema change can lead to deadlock #24885

Closed
andreimatei opened this issue Apr 17, 2018 · 30 comments · Fixed by #46170
Assignees
Labels
A-schema-changes C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-2 Medium-high impact: many users impacted, risks of availability and difficult-to-fix data errors

Comments

@andreimatei
Contributor

The following deadlocks:

begin; savepoint cockroach_restart; create table t(x int); rollback to savepoint cockroach_restart;
select * from t;

The reason it deadlocks is that the create left an intent on something (probably a table name record); the intent will never commit, because the txn's epoch is incremented by the rollback to savepoint, but it still exists. The select then tries to resolve that name in order to get a lease on the table, and that name resolution process uses a different txn, which gets blocked on the intent.
Funnily enough, if you do another create after the rollback, that works - probably because all reads/writes are done using the main txn in that case.
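A toy Python model (not CockroachDB code; all names invented) of the mechanism described above: the rollback bumps the epoch but the intent survives, so a reader in a *different* txn (the lease/name-resolution txn) blocks on it, while reads by the session's own txn would not.

```python
# Toy model of the deadlock: rollback-to-savepoint bumps the writer's epoch
# but does not remove intents written at the old epoch. A separate txn doing
# name resolution must wait for the intent's writer, which never finishes
# because it is the same session that is now waiting on the resolver.

class Txn:
    def __init__(self, txn_id):
        self.txn_id = txn_id
        self.epoch = 0
        self.status = "PENDING"

    def rollback_to_savepoint(self):
        # The epoch bump logically invalidates the old writes...
        self.epoch += 1
        # ...but the intents themselves stay until the txn finalizes.

class Store:
    def __init__(self):
        self.intents = {}  # key -> (txn_id, epoch)

    def write(self, key, txn):
        self.intents[key] = (txn.txn_id, txn.epoch)

    def read_blocks_on_intent(self, key, reader_txn_id, writer):
        """A reader from a different txn must wait while the writer is pending."""
        if key not in self.intents:
            return False
        owner_id, _ = self.intents[key]
        return owner_id != reader_txn_id and writer.status == "PENDING"

store = Store()
schema_txn = Txn("sql-session-txn")
store.write("/namespace/t", schema_txn)   # create table t
schema_txn.rollback_to_savepoint()        # epoch bump; intent survives

# Lease acquisition uses its own txn, so it blocks on the stale intent:
lease_txn_blocks = store.read_blocks_on_intent("/namespace/t", "lease-txn", schema_txn)
# A read by the session's own txn would not block:
own_read_blocks = store.read_blocks_on_intent("/namespace/t", "sql-session-txn", schema_txn)
print(lease_txn_blocks, own_read_blocks)  # True False
```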

Found while investigating #24785

One way to fix this is for transactions that perform schema changes to employ a policy that says that they don't get to keep intents from old epochs - or rather, whenever we would bump the epoch, we'd create a new txn instead. This would mean that they don't get to benefit from the "reservation" effect of leaving intents around, but who cares. The moral justification is that these transactions touched "special" keys.
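The first option could be sketched like this (a hypothetical model with invented names, not the actual connExecutor code): a session that performed DDL swaps in a brand-new txn instead of bumping the epoch, so no intents from the old epoch survive the restart.

```python
# Sketch of the "fresh txn instead of epoch bump" policy for sessions that
# performed schema changes. All names are invented for illustration.
import itertools

_ids = itertools.count(1)

class KVTxn:
    def __init__(self):
        self.txn_id = next(_ids)
        self.epoch = 0

class Session:
    def __init__(self):
        self.txn = KVTxn()
        self.performed_ddl = False
        self.intents = set()

    def ddl(self, key):
        self.performed_ddl = True
        self.intents.add(key)

    def restart(self):
        if self.performed_ddl:
            # Abort the old txn: its intents get cleaned up eagerly, so
            # nothing is left behind to block other name resolutions.
            self.intents.clear()
            self.txn = KVTxn()
            self.performed_ddl = False
        else:
            # Normal path: keep the txn (and the "reservation" effect of
            # its intents) and just bump the epoch.
            self.txn.epoch += 1

s = Session()
s.ddl("/namespace/t")
old_id = s.txn.txn_id
s.restart()
print(s.txn.txn_id != old_id, len(s.intents))  # True 0
```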
Another way is to do all lease acquisitions and name resolution and such by using the transaction running the statement that caused the lease. But I don't like that at all; it goes against divorcing the table descriptor caching layer from SQL transactions which I think is generally a great thing.

Opinions?

cc @vivekmenezes @bdarnell

@andreimatei andreimatei added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Apr 17, 2018
@andreimatei andreimatei added this to the 2.1 milestone Apr 17, 2018
@andreimatei andreimatei self-assigned this Apr 17, 2018
andreimatei added a commit to andreimatei/cockroach that referenced this issue Apr 17, 2018
…epoint

Prior to this patch, the following sequence would write a row that's
then inaccessible:
begin; savepoint cockroach_restart; create table t(x int primary key); rollback to savepoint cockroach_restart;
insert into t(x) values(1); release savepoint cockroach_restart; commit;

The insert manages to resolve the table because we weren't clearing the
"uncommitted descriptors" set that the session maintains. Since the
table descriptor doesn't end up being committed, the row will be
essentially unreachable.

This patch is original work on the 1.1 branch. This bug is not present
in the 2.0 release, where we're better about resetting state on
transaction retries.
I believe on 1.1 there's still a similar issue present when doing
automatic retries (as opposed to user-directed ones through rollback to
savepoint). I think fixing those would be more involved because it's
less clear where to stick the cleanup done in this patch; I'd rather not
do anything at this point.

Also note that, with the fix, the sequence above doesn't actually work.
Instead, it deadlocks, just like it does on 2.0, because of cockroachdb#24885.
However, the following works:
begin; savepoint cockroach_restart; create table t(x int primary key); rollback to savepoint cockroach_restart;
create table t(x int primary key); insert into t(x) values(1); release savepoint cockroach_restart; commit;

Fixes cockroachdb#24785
@vivekmenezes
Contributor

A similar deadlock happens when you create a table in an open transaction and run some commands that require that schema in other transactions.

@andreimatei
Contributor Author

Note to self: #24888 added a test that (when ported to 2.0) can be simplified if we fix this issue.

@andreimatei
Contributor Author

@vivekmenezes that's not a deadlock, is it? That's normal reader-blocks-on-writer behavior.

@vivekmenezes
Contributor

@andreimatei yeah you're right that's just requests blocking on an uncommitted intent. Please add the test in #24888 to 2.0 so we don't regress on it. Thanks!

craig bot pushed a commit that referenced this issue Apr 17, 2018
24888: sql: Fix "dangling rows" left over by schema change + rollback to savepoint r=andreimatei a=andreimatei

Prior to this patch, the following sequence would write a row that's
then inaccessible:
begin; savepoint cockroach_restart; create table t(x int primary key); rollback to savepoint cockroach_restart;
insert into t(x) values(1); release savepoint cockroach_restart; commit;

The insert manages to resolve the table because we weren't clearing the
"uncommitted descriptors" set that the session maintains. Since the
table descriptor doesn't end up being committed, the row will be
essentially unreachable.

This patch is original work on the 1.1 branch. This bug is not present
in the 2.0 release, where we're better about resetting state on
transaction retries.
I believe on 1.1 there's still a similar issue present when doing
automatic retries (as opposed to user-directed ones through rollback to
savepoint). I think fixing those would be more involved because it's
less clear where to stick the cleanup done in this patch; I'd rather not
do anything at this point.

Also note that, with the fix, the sequence above doesn't actually work.
Instead, it deadlocks, just like it does on 2.0, because of #24885.
However, the following works:
begin; savepoint cockroach_restart; create table t(x int primary key); rollback to savepoint cockroach_restart;
create table t(x int primary key); insert into t(x) values(1); release savepoint cockroach_restart; commit;

Fixes #24785

cc @cockroachdb/release  @gpaul 

Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
@jordanlewis
Member

@andreimatei this isn't 2.1, is it?

@andreimatei
Contributor Author

It's not the worst thing in the world. In any case I'll pass it along to @vivekmenezes , thanks for the ping.
Rereading this, I now don't understand why

begin; savepoint cockroach_restart; create table t(x int); rollback to savepoint cockroach_restart;
select * from t;

deadlocks, but

begin; savepoint cockroach_restart; create table t(x int); rollback to savepoint cockroach_restart;
create;

doesn't.

@vivekmenezes
Contributor

@andreimatei I've not studied this issue. It is my intention to move all schema change writes to commit time, which will also fix #20526

I've been holding off on that because of the worry about leases that keep renewing themselves pushing the schema change writer. But I do plan on doing it as soon as epoch-based table leases are merged, because epoch-based table leases are going to considerably reduce the read demands on the table descriptors

@nstewart nstewart added the S-2-temp-unavailability Temp crashes or other availability problems. Can be worked around or resolved by restarting. label Sep 18, 2018
@vivekmenezes vivekmenezes removed this from the 2.1 milestone Feb 11, 2019
@vivekmenezes
Contributor

vivekmenezes commented Feb 11, 2019

moving this to the backlog. This is still an extant issue

@vivekmenezes vivekmenezes removed the S-2-temp-unavailability Temp crashes or other availability problems. Can be worked around or resolved by restarting. label Feb 11, 2019
@jordanlewis
Member

Hey @andreimatei, would it be possible to make this happen as part of your SAVEPOINT work this release? I talked with @ajwerner, who suggests that the right way to move forward is to have DDL-containing transactions that get their epoch bumped use a new transaction (which functionally drops all of their outstanding intents) in the conn executor state machine.

@andreimatei
Contributor Author

I talked with @ajwerner, who suggests that the right way to move forward is to have DDL-containing transactions that get their epoch bumped use a new transaction (which functionally drops all of their outstanding intents) in the conn executor state machine.

That's what the original message in this thread suggests too. But how is it related to savepoints? Nice try, but I'd rather let someone else do it. I can review the PR, though.

@jordanlewis
Member

I thought it was related enough because you are already working heavily in the conn executor state machine - I figured it wouldn't be too much extra trouble to add this too. But maybe I figured wrong.

@andreimatei
Contributor Author

I played around a bit. There's a couple of weird things going on.
The first, which I find most troubling, is that we somehow seem to be leaving some lock behind even after we close the connection whose transaction I believe held that lock.
Repro:
client 1:

begin; savepoint a; create table t(); rollback to a;
select * from t; -> blocks

client 2:
select * from t; -> blocks

close client 1. Client 2 is still blocked! Any new client will also block. Only restarting the server seems to fix it. Closing the connection is supposed to roll back the respective txn, and it generally does.
@nvanbenschoten , this couldn't be that lock leak bug you found last night, could it?

The 2nd interesting thing is that depending on exactly how I issue the operations (or their timing?), I don't always get the deadlock.
For example, begin; savepoint a; create table t(); rollback to a; select * from t; deadlocks, but issuing every statement separately doesn't - the final select says "relation not found".

When the thing deadlocks, there's a TableCollection.getTableVersion() which is deadlocked. The stack looks like this

When the final select doesn't deadlock, its trace looks like this:

sql/resolver.go:300  [n1,client=127.0.0.1:47540,user=root] !!! planner.LookupObject: defaultdb.public.t
sql/logical_schema_accessors.go:82  [n1,client=127.0.0.1:47540,user=root] !!! LogicalSchemaAccessor: defaultdb.public.t
sql/physical_schema_accessors.go:310  [n1,client=127.0.0.1:47540,user=root] !!! CachedPhysicalAccessor: defaultdb.public.t
sql/table.go:285  [n1,client=127.0.0.1:47540,user=root] !!! TableCollection.getTableVersion: defaultdb.public.t
sql/physical_schema_accessors.go:176  [n1,client=127.0.0.1:47540,user=root] !!! UncachedPhysicalAccessor: defaultdb.public.t
sql/physical_schema_accessors.go:208  [n1,client=127.0.0.1:47540,user=root] !!! UncachedPhysical couldn't find name: defaultdb.public.t
sql/table.go:287  [n1,client=127.0.0.1:47540,user=root] !!! TableCollection.getTableVersion: defaultdb.public.t - done
sql/resolver.go:143  [n1,client=127.0.0.1:47540,user=root] !!! resolveExistingObjectImpl couldn't find table

I'll try to understand what's going on.

@nvanbenschoten
Member

@nvanbenschoten , this couldn't be that lock leak bug you found last night, could it?

I don't see any evidence that points to that. The lock issue was related to lease transfers, which don't seem to be at play here. Also, I tested this with the fix and could still reproduce. Doesn't it seem more likely that whatever is deadlocking the single transaction on itself is also blocking the other one?

@andreimatei
Contributor Author

Doesn't it seem more likely that whatever is deadlocking the single transaction on itself is also blocking the other one?

Yes, but my concern is with the rollback that's supposed to happen on connection close. The canceling of the connection is supposed to get out of the deadlock by canceling various contexts, and then everything is supposed to be rolled back - and so other clients are supposed to be unblocked.
I think I've figured this part out, though. I think the crux of the matter is that, if the connection closing happens when the SQL txn is in the Aborted state (in this particular case, the canceling of some context causes a lease acquisition to fail, which moves the txn into the Aborted state), then we no longer properly roll back everything. We're supposed to be doing that by sending a nonRetriableError{IsCommit:True} event, which was supposed to cause a rollback from any state (by virtue of the IsCommit attribute), but we're not doing the right thing in the Aborted state - we're not handling the IsCommit part correctly.

@andreimatei
Contributor Author

Fixing the connection close bug in #45835

@andreimatei
Contributor Author

Back to why we sometimes deadlock and sometimes don't, the difference seems to come down to LeaseManager.resolveName() blocking or not blocking on an intent (presumably on system.namespace). I don't understand why it doesn't always block yet.

@andreimatei
Contributor Author

When we don't deadlock because reading system.namespace doesn't block, the final select tries to resolve the table name twice (using LookupObjectID). The first time it does it through LeaseManager.resolveName(), which uses its own txn. I'm not sure yet why that doesn't block.
The 2nd time it attempts to resolve the name through UncachedPhysicalAccessor. That call is using the select's txn, and I wonder if that's all right. The call is here:

return readTableFromStore()

@otan , could you please tell me what the deal is around here? I thought that the way things work is that once a txn has created some tables, those tables are resolved locally by the tableCollection without needing to go to the store. Is that not the case?

@otan
Contributor

otan commented Mar 8, 2020

Not quite sure, I haven't done too much with table collections. I can try to uncover it tomorrow.

@otan
Contributor

otan commented Mar 9, 2020

I'm trying to understand

cockroach/pkg/sql/lease.go

Lines 1476 to 1485 in f5fe582

// Known limitation: AcquireByName() calls Acquire() and therefore suffers
// from the same limitation as Acquire (See Acquire). AcquireByName() is
// unable to function correctly on a timestamp less than the timestamp
// of a transaction with a DROP/TRUNCATE on a table. The limitation in
// the face of a DROP follows directly from the limitation on Acquire().
// A TRUNCATE is implemented by changing the name -> id mapping for a table
// and by dropping the descriptor with the old id. While AcquireByName
// can use the timestamp and get the correct name->id mapping at a
// timestamp, it uses Acquire() to get a descriptor with the corresponding
// id and fails because the id has been dropped by the TRUNCATE.

which seems to be the reason we fall back to

cockroach/pkg/sql/table.go

Lines 376 to 381 in 1007293

// Read the descriptor from the store in the face of some specific errors
// because of a known limitation of AcquireByName. See the known
// limitations of AcquireByName for details.
if _, ok := err.(inactiveTableError); ok || err == sqlbase.ErrDescriptorNotFound {
return readTableFromStore()
}

it seems we have to fall back to the uncached accessor (with no txn context) as p.Tables() makes it look like it's disappeared, but we want to see if it's actually gone. in that case, it does appear as if using the same txn for that is broken. i'm not sure if not using the txn is the behaviour we always want, though - i.e. does it break anything if we use a fresh transaction for this?

@andreimatei
Contributor Author

Oliver, don't worry about it if you've got nothing to do with this. I thought you had introduced that fallback since I saw your name on the blame. I'll bring it to the attention of @ajwerner when I get back to the office; he seems to want to own this zoo. The fallback is only peripheral to this issue anyway.

@andreimatei
Contributor Author

I've figured out why the sequence doesn't always deadlock. If the writes to the namespace table happen to run into the tscache, then the respective intents will be written at a pushed timestamp.
Resolving names, acquiring leases and all that, however, always uses the triggering txn's read timestamp for its reads, so, when there's no uncertainty in play, it'll ignore the intent.
When running things by hand it's easy to run into the ts cache due to the 3s closed timestamp target.
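The timestamp interaction can be illustrated with a toy check (simplified; uncertainty intervals ignored): an MVCC read at a given read timestamp only has to wait for intents at or below that timestamp, so an intent that got pushed above the reader's timestamp is simply skipped.

```python
# Toy illustration of why the repro doesn't always deadlock: if the tscache
# pushed the create's namespace intent to a higher timestamp, the name
# resolution read (done at the triggering txn's read timestamp) skips it
# instead of blocking.

def read_must_wait(read_ts, intent_ts):
    """An intent is only visible to (and blocks) reads at or above its timestamp."""
    return intent_ts <= read_ts

same_ts = read_must_wait(read_ts=10, intent_ts=10)    # intent visible -> deadlock
pushed = read_must_wait(read_ts=10, intent_ts=13)     # pushed intent -> no deadlock
print(same_ts, pushed)  # True False
```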

I've implemented the fix for the deadlock that we were talking about: having the connExecutor transparently switch the KV txn to a new one in certain cases. It's not much code, but it's kinda unfortunate and there's no great place to put it. It makes me a bit unhappy. I'm now thinking that perhaps it'd be better to deal with this problem in KV and have a feature there that says that locks on a particular span of keys must be released when restarting or rolling back to savepoints. I want to discuss it with my peeps.

andreimatei added a commit to andreimatei/cockroach that referenced this issue Mar 16, 2020
This patch makes reads of the system.namespace table and of table
descriptors use high-priority transactions. These transactions will push
locks owned by schema changes out of their way. The idea is that regular
SQL transactions reading schema don't want to wait for the transactions
performing DDL statements. Instead, those DDLs will be pushed and forced
to refresh.

Besides the benefit to regular transactions, this patch also prevents
deadlocks for the transactions performing DDL. Before this patch, the
final select in the following sequence would deadlock:

begin; savepoint s; create table t(x int); rollback to savepoint s;
select * from t;

The select is reading the namespace table, using a different txn from
its own. That read would block on the intent laid down by the prior
create. With this patch, the transaction effectively pushes itself, but
gets to otherwise run.

Fixes cockroachdb#24885

Release justification: Fix for an old bug that became more prominent
when we introduced SAVEPOINTs recently.

Release note: A rare bug causing transactions that have performed schema
changes to deadlock after they restart has been fixed.
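The pushing behavior the patch above describes can be modeled with a toy conflict rule (invented names, not KV code): a high-priority reader does not queue behind a lower-priority writer's intent; it pushes the writer's timestamp above its own read timestamp, forcing the writer to refresh while the reader proceeds.

```python
# Toy model of priority-based conflict resolution: instead of the schema
# read waiting on a DDL intent, the higher-priority reader pushes the
# writer's timestamp past its read timestamp and carries on.

def resolve_conflict(reader_priority, writer_priority, writer_ts, read_ts):
    """Return (reader_waits, new_writer_ts)."""
    if reader_priority > writer_priority:
        # High-priority reader pushes the intent above its read timestamp.
        return False, max(writer_ts, read_ts + 1)
    # Otherwise the reader queues behind the intent as usual.
    return True, writer_ts

waits, pushed_ts = resolve_conflict(reader_priority=2, writer_priority=1,
                                    writer_ts=10, read_ts=10)
print(waits, pushed_ts)  # False 11
```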
craig bot pushed a commit that referenced this issue Mar 17, 2020
46170: sql: resolve schema in high-priority transactions r=andreimatei a=andreimatei

This patch makes reads of the system.namespace table and of table
descriptors use high-priority transactions. These transactions will push
locks owned by schema changes out of their way. The idea is that regular
SQL transactions reading schema don't want to wait for the transactions
performing DDL statements. Instead, those DDLs will be pushed and forced
to refresh.

Besides the benefit to regular transactions, this patch also prevents
deadlocks for the transactions performing DDL. Before this patch, the
final select in the following sequence would deadlock:

begin; savepoint s; create table t(x int); rollback to savepoint s;
select * from t;

The select is reading the namespace table, using a different txn from
its own. That read would block on the intent laid down by the prior
create. With this patch, the transaction effectively pushes itself, but
gets to otherwise run.

Fixes #24885

Release justification: Fix for an old bug that became more prominent
when we introduced SAVEPOINTs recently.

Release note: A rare bug causing transactions that have performed schema
changes to deadlock after they restart has been fixed.

Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
@craig craig bot closed this as completed in 8654b02 Mar 17, 2020
@otan
Contributor

otan commented Mar 17, 2020

interesting, i can still repro #44385 pretty easily with this change in -- now i don't even have to run the "SHOW TABLES" query to get the deadlock.

repro:

# screen 1
./cockroach start-single-node --listen-addr=localhost:26257 --insecure
# screen 2
./cockroach sql --watch '1ms' -e 'DROP DATABASE IF EXISTS db0;CREATE DATABASE db0;USE db0;CREATE TABLE t0(c0 INT);' --insecure

edit: nvm they just take a minute now:


CREATE TABLE

Time: 46.067ms

CREATE TABLE

Time: 68.928ms

CREATE TABLE

Time: 43.123171s

CREATE TABLE

Time: 1m0.149811s

CREATE TABLE

Time: 1m0.026241s

CREATE TABLE

Time: 47.636ms

CREATE TABLE

Time: 47.969ms

CREATE TABLE

Time: 59.962868s

@andreimatei
Contributor Author

I don't know what's up with the delay. Mind filing it separately for Lucy or Werner?

ajwerner pushed a commit to ajwerner/cockroach that referenced this issue Apr 2, 2020
This patch makes reads of the system.namespace table and of table
descriptors use high-priority transactions. These transactions will push
locks owned by schema changes out of their way. The idea is that regular
SQL transactions reading schema don't want to wait for the transactions
performing DDL statements. Instead, those DDLs will be pushed and forced
to refresh.

Besides the benefit to regular transactions, this patch also prevents
deadlocks for the transactions performing DDL. Before this patch, the
final select in the following sequence would deadlock:

begin; savepoint s; create table t(x int); rollback to savepoint s;
select * from t;

The select is reading the namespace table, using a different txn from
its own. That read would block on the intent laid down by the prior
create. With this patch, the transaction effectively pushes itself, but
gets to otherwise run.

Fixes cockroachdb#24885

Release justification: Fix for an old bug that became more prominent
when we introduced SAVEPOINTs recently.

Release note: A rare bug causing transactions that have performed schema
changes to deadlock after they restart has been fixed.
7 participants