Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: DDL statements are non-transactional #24785

Closed
gpaul opened this issue Apr 13, 2018 · 21 comments
Closed

sql: DDL statements are non-transactional #24785

gpaul opened this issue Apr 13, 2018 · 21 comments
Assignees
Milestone

Comments

@gpaul
Copy link

gpaul commented Apr 13, 2018

Is this a question, feature request, or bug report?

BUG REPORT

  1. Please supply the header (i.e. the first few lines) of your most recent
    log file for each node in your cluster. On most unix-based systems
    running with defaults, this boils down to the output of

    grep -F '[config]' cockroach-data/logs/cockroach.log

I180413 16:22:56.601315 1 util/log/clog.go:1041  [config] file created at: 2018/04/13 16:22:56
I180413 16:22:56.601315 1 util/log/clog.go:1041  [config] running on machine: gpaul-dell
I180413 16:22:56.601315 1 util/log/clog.go:1041  [config] binary: CockroachDB CCL v1.1.7 (linux amd64, built 2018/03/26 15:56:41, go1.8.3)
I180413 16:22:56.601315 1 util/log/clog.go:1041  [config] arguments: [./cockroach start --insecure]

  1. Please describe the issue you observed:
  • What did you do?

Start a single node cluster:

./cockroach start --insecure

Create a database called 'testdb':

./cockroach sql --insecure -e 'create database testdb'

Execute the following commands one by one (note the ROLLBACK SAVEPOINT halfway through followed by re-issuing of the transaction's contents):

BEGIN;

SAVEPOINT cockroach_restart;

CREATE TABLE groups (
id INTEGER DEFAULT unique_rowid() NOT NULL,
gid VARCHAR,
description VARCHAR,
PRIMARY KEY (id),
UNIQUE (gid)
);

CREATE TABLE resources (
id INTEGER DEFAULT unique_rowid() NOT NULL,
rid VARCHAR,
description VARCHAR,
PRIMARY KEY (id),
UNIQUE (rid)
);

CREATE TABLE users (
id INTEGER DEFAULT unique_rowid() NOT NULL,
uid VARCHAR,
passwordhash VARCHAR,
utype VARCHAR(7),
description VARCHAR,
is_remote BOOLEAN,
PRIMARY KEY (id),
UNIQUE (uid),
CONSTRAINT usertype CHECK (utype IN ('regular', 'service'))
);

CREATE TABLE configs (
id INTEGER DEFAULT unique_rowid() NOT NULL,
key VARCHAR,
value VARCHAR,
PRIMARY KEY (id),
UNIQUE (key)
);

CREATE TABLE user_groups (
user_id INTEGER NOT NULL,
group_id INTEGER NOT NULL,
PRIMARY KEY (user_id, group_id),
UNIQUE (user_id, group_id)
);

CREATE TABLE aces (
id INTEGER DEFAULT unique_rowid() NOT NULL,
user_id INTEGER,
group_id INTEGER,
resource_id INTEGER,
rid VARCHAR,
resource_description VARCHAR,
gid VARCHAR,
actions TEXT,
PRIMARY KEY (id),
CONSTRAINT user_resource_unique UNIQUE (user_id, resource_id),
CONSTRAINT group_resource_unique UNIQUE (group_id, resource_id)
);

CREATE INDEX aces_resource_id_asc ON aces (resource_id ASC);

CREATE INDEX aces_user_id_rid_actions_resource_description_asc ON aces (user_id ASC, rid ASC, actions ASC, resource_description ASC);

CREATE INDEX aces_group_id_asc_resource_id_asc_actions_asc ON aces (group_id ASC, resource_id ASC, actions ASC);

CREATE INDEX aces_group_id_rid_actions_resource_description_gid_asc ON aces (group_id ASC, rid ASC, actions ASC, resource_description ASC, gid ASC);

CREATE INDEX aces_user_id_asc_resource_id_asc_actions_asc ON aces (user_id ASC, resource_id ASC, actions ASC);

CREATE TABLE alembic_version (
version_num VARCHAR(32) NOT NULL,
CONSTRAINT alembic_version_pkc PRIMARY KEY (version_num)
);

INSERT INTO alembic_version (version_num) VALUES ('d3d471b75e78') RETURNING alembic_version.version_num;

INSERT INTO users (uid, passwordhash, utype, description, is_remote) VALUES ('bootstrapuser', '$6$rounds=656000$Ddm1dA4kl0J7gsdN$wmBHwxTBCMHxgYDZ8hSFifdOyQWVmPxgAQPUojKC1uFdR3kVXFFLmXlxIhk.vlwrJ7KqHX5qHOSwOebP9VwBK/', 'regular', 'bootstrapuser', 'false') RETURNING users.id;

ROLLBACK TO SAVEPOINT cockroach_restart;

CREATE TABLE groups (
id INTEGER DEFAULT unique_rowid() NOT NULL,
gid VARCHAR,
description VARCHAR,
PRIMARY KEY (id),
UNIQUE (gid)
);

CREATE TABLE resources (
id INTEGER DEFAULT unique_rowid() NOT NULL,
rid VARCHAR,
description VARCHAR,
PRIMARY KEY (id),
UNIQUE (rid)
);

CREATE TABLE users (
id INTEGER DEFAULT unique_rowid() NOT NULL,
uid VARCHAR,
passwordhash VARCHAR,
utype VARCHAR(7),
description VARCHAR,
is_remote BOOLEAN,
PRIMARY KEY (id),
UNIQUE (uid),
CONSTRAINT usertype CHECK (utype IN ('regular', 'service'))
);

CREATE TABLE configs (
id INTEGER DEFAULT unique_rowid() NOT NULL,
key VARCHAR,
value VARCHAR,
PRIMARY KEY (id),
UNIQUE (key)
);

CREATE TABLE user_groups (
user_id INTEGER NOT NULL,
group_id INTEGER NOT NULL,
PRIMARY KEY (user_id, group_id),
UNIQUE (user_id, group_id)
);

CREATE TABLE aces (
id INTEGER DEFAULT unique_rowid() NOT NULL,
user_id INTEGER,
group_id INTEGER,
resource_id INTEGER,
rid VARCHAR,
resource_description VARCHAR,
gid VARCHAR,
actions TEXT,
PRIMARY KEY (id),
CONSTRAINT user_resource_unique UNIQUE (user_id, resource_id),
CONSTRAINT group_resource_unique UNIQUE (group_id, resource_id)
);

CREATE INDEX aces_resource_id_asc ON aces (resource_id ASC);

CREATE INDEX aces_user_id_rid_actions_resource_description_asc ON aces (user_id ASC, rid ASC, actions ASC, resource_description ASC);

CREATE INDEX aces_group_id_asc_resource_id_asc_actions_asc ON aces (group_id ASC, resource_id ASC, actions ASC);

CREATE INDEX aces_group_id_rid_actions_resource_description_gid_asc ON aces (group_id ASC, rid ASC, actions ASC, resource_description ASC, gid ASC);

CREATE INDEX aces_user_id_asc_resource_id_asc_actions_asc ON aces (user_id ASC, resource_id ASC, actions ASC);

CREATE TABLE alembic_version (
version_num VARCHAR(32) NOT NULL,
CONSTRAINT alembic_version_pkc PRIMARY KEY (version_num)
);

INSERT INTO alembic_version (version_num) VALUES ('d3d471b75e78') RETURNING alembic_version.version_num;

INSERT INTO users (uid, passwordhash, utype, description, is_remote) VALUES ('bootstrapuser', '$6$rounds=656000$Ddm1dA4kl0J7gsdN$wmBHwxTBCMHxgYDZ8hSFifdOyQWVmPxgAQPUojKC1uFdR3kVXFFLmXlxIhk.vlwrJ7KqHX5qHOSwOebP9VwBK/', 'regular', 'bootstrapuser', 'false') RETURNING users.id;

RELEASE SAVEPOINT cockroach_restart;

COMMIT;
  • What did you expect to see?
    I expect a user in the users table if it exists, or no users and no tables:
./cockroach sql --insecure --database=testdb -e "select * from users"
# Server version: CockroachDB CCL v1.1.7 (linux amd64, built 2018/03/26 15:56:41, go1.8.3) (same version as client)
# Cluster ID: 69f52c38-6ee2-4455-99ff-240b00620ed3
+--------------------+---------------+--------------------------------------------------------------------------------------------------------------------------+---------+---------------+-----------+
|         id         |      uid      |                                                       passwordhash                                                       |  utype  |  description  | is_remote |
+--------------------+---------------+--------------------------------------------------------------------------------------------------------------------------+---------+---------------+-----------+
| 339363531251482625 | bootstrapuser | $6$rounds=656000$Ddm1dA4kl0J7gsdN$wmBHwxTBCMHxgYDZ8hSFifdOyQWVmPxgAQPUojKC1uFdR3kVXFFLmXlxIhk.vlwrJ7KqHX5qHOSwOebP9VwBK/ | regular | bootstrapuser | false     |
+--------------------+---------------+--------------------------------------------------------------------------------------------------------------------------+---------+---------------+-----------+
(1 row)
  • What did you see instead?
    No user in the database, but tables are present:
./cockroach sql --insecure --database=testdb -e "select * from users"
# Server version: CockroachDB CCL v1.1.7 (linux amd64, built 2018/03/26 15:56:41, go1.8.3) (same version as client)
# Cluster ID: ad6ab311-a538-400e-a479-9b47122d8dc4
+----+-----+--------------+-------+-------------+-----------+
| id | uid | passwordhash | utype | description | is_remote |
+----+-----+--------------+-------+-------------+-----------+
+----+-----+--------------+-------+-------------+-----------+
(0 rows)

./cockroach sql --insecure --database=testdb -e "show tables"
# Server version: CockroachDB CCL v1.1.7 (linux amd64, built 2018/03/26 15:56:41, go1.8.3) (same version as client)
# Cluster ID: ad6ab311-a538-400e-a479-9b47122d8dc4
+-----------------+
|      Table      |
+-----------------+
| aces            |
| alembic_version |
| configs         |
| groups          |
| resources       |
| user_groups     |
| users           |
+-----------------+
(7 rows)

@justinj
Copy link
Contributor

justinj commented Apr 13, 2018

Hi @gpaul,

Unfortunately, our DDL statements are not transactional today. This is definitely confusing as you've seen here, and it's something we'd like to address in the future. We have to allow issuing them within transactions for Postgres compatibility, since many tools issue them that way, even though they don't respect the bounds of said transaction. I'm not sure if there's a larger tracking issue for transactional DDL somewhere, I wasn't able to find one, @vivekmenezes, do you know if we have one?

@justinj justinj changed the title Transaction not atomic (or i'm using savepoints wrong) sql: DDL statements are non-transactional Apr 13, 2018
@vivekmenezes
Copy link
Contributor

this looks like the same issue as #24626. Please reopen if you think differently. Thanks!

@gpaul
Copy link
Author

gpaul commented Apr 13, 2018

Indeed that is confusing. Does that mean half of the CREATE TABLE statements may execute successfully while others fail silently? The part that confuses me is the silent failure. I issued INSERT statements followed immediately by RELEASE SAVEPOINT and COMMIT with no error returned, but the records don't appear in the database.

That is quite different from #24626 where the user got an error "pq: foreign key requires an existing index on columns ("parent_id")". In my case, the transaction succeeds successfully.

@gpaul
Copy link
Author

gpaul commented Apr 13, 2018

I can also confirm that if I remove the ROLLBACK TO SAVEPOINT statement then everything works as expected: the schema is correctly created and the data correctly inserted.

@petermattis petermattis reopened this Apr 13, 2018
@petermattis
Copy link
Collaborator

@vivekmenezes I think there is something different going on than in #24626. Inserting into a table in the same transaction the table is created in should work. For example:

root@localhost:26257/t> begin; create table t (k INT); insert into t values (1); commit; select * from t;
+---+
| k |
+---+
| 1 |
+---+
(1 row)

I imagine the ROLLBACK TO SAVEPOINT is fouling something up.

@petermattis petermattis added this to the 2.0.x milestone Apr 13, 2018
@vivekmenezes
Copy link
Contributor

I can confirm that this problem is not on 2.0

@vivekmenezes
Copy link
Contributor

and I can confirm that this problem is present on 1.1.7

@vivekmenezes
Copy link
Contributor

@gpaul would you like us to figure out what the problem with 1.1 is regarding this issue or are you comfortable that it's not an issue in 2.0+ ? Thanks!

vivekmenezes added a commit to vivekmenezes/cockroach that referenced this issue Apr 16, 2018
@gpaul
Copy link
Author

gpaul commented Apr 16, 2018

Responded in DM

@jgehrcke
Copy link

Hey @vivekmenezes! We would certainly appreciate to have this issue addressed in the 1.1 release branch as soon as possible. Given the number of issues we ran into in the 1.0 and 1.1 branches we are not comfortable updating to 2.0 just like that -- it appears to be an entirely new thing and it will take us a while to build up trust in its reliability.

For now, we would like to continue to use 1.1.x to not throw away what we've learned from using it in production.

@vivekmenezes
Copy link
Contributor

@andreimatei assigning to you as the sql executor guru

@gpaul
Copy link
Author

gpaul commented Apr 17, 2018

This does not appear to affect v1.0.6

@gpaul
Copy link
Author

gpaul commented Apr 17, 2018

It also affects v1.1.4

@andreimatei
Copy link
Contributor

smaller repro on 1.1 in our logictest language:

 LogicTest: default

statement ok
BEGIN;

statement ok
SAVEPOINT cockroach_restart;

statement ok
CREATE TABLE users (
uid VARCHAR PRIMARY KEY
);

statement ok
INSERT INTO users (uid) VALUES ('foo');

query T
select uid from users
----
foo

statement ok
ROLLBACK TO SAVEPOINT cockroach_restart;

statement ok
CREATE TABLE users (
uid VARCHAR PRIMARY KEY
);

statement ok
INSERT INTO users (uid) VALUES ('bar');

statement ok
RELEASE SAVEPOINT cockroach_restart;

statement ok
COMMIT;

query T
select uid from users
----
bar

@andreimatei
Copy link
Contributor

The issue is that, even after the ROLLBACK TO SAVEPOINT, we keep inserting into the wrong table (the one created before the ROLLBACK).
Investigating...

@andreimatei
Copy link
Contributor

andreimatei commented Apr 17, 2018

Even smaller repro; the following commits just fine and leaves a dangling row.

begin; savepoint cockroach_restart; create table t(x int primary key); rollback to savepoint cockroach_restart; 
insert into t(x) values(1); release savepoint cockroach_restart; commit;

I believe the problem is with the logic for resolving "uncommitted" tables - in 1.1 we didn't reset a session's set of uncommitted tables on transaction restart. Should have a fix soon.

@andreimatei
Copy link
Contributor

Well, here's one... I can fix the "dangling row" bug, but while testing that I've found another issue that's also present at master. The following sequence deadlocks:

begin; savepoint cockroach_restart; create table t(x int); rollback to savepoint cockroach_restart;
select * from t;

The reason why it deadlocks is that the create left an intent on something (probably a table name record); the intent will never commit because txn's epoch is incremented by the rollback to savepoint, but it still exists. The select then tries to resolve that name in order to get a lease on that table - and that name resolution process uses a different txn which gets blocked on the intent.
Funny enough, if you do another create after the rollback, that works - probably because all reads/writes are done using the main txn in that case.

I guess the deadlock is better than the "dangling row" state of affairs, so I guess I'll bring 1.1 up to par with 2.0 and open a different issue for the deadlock.

andreimatei added a commit to andreimatei/cockroach that referenced this issue Apr 17, 2018
…epoint

Prior to this patch, the following sequence would write a row that's
then unaccessible:
begin; savepoint cockroach_restart; create table t(x int primary key); rollback to savepoint cockroach_restart;
insert into t(x) values(1); release savepoint cockroach_restart; commit;

The insert manages to resolve the table because we weren't clearing the
"uncommitted descriptors" set that the session maintains. Since the
table descriptor doesn't end up being committed, the row will be
essentially unreachable.

This patch is original work on the 1.1 branch. This bug is not present
in the 2.0 release, where we're better about reseting state on
transaction retries.
I believe on 1.1 there's still a similar issue present when doing
automatic retries (as opposed to user-directed ones through rollback to
savepoint). I think fixing those would be more involved because it's
less clear where to stick the cleanup done in this patch; I'd rather not
do anything this point.

Also note that, with the fix, the sequence above doesn't actually work.
Instead, it deadlocks, just like it does on 2.0, because of cockroachdb#24885.
However, the following works:
begin; savepoint cockroach_restart; create table t(x int primary key); rollback to savepoint cockroach_restart;
create table t(x int primary key); insert into t(x) values(1); release savepoint cockroach_restart; commit;

Fixes cockroachdb#24785
craig bot pushed a commit that referenced this issue Apr 17, 2018
24888: sql: Fix "dangling rows" left over by schema change + rollback to savepoint r=andreimatei a=andreimatei

Prior to this patch, the following sequence would write a row that's
then unaccessible:
begin; savepoint cockroach_restart; create table t(x int primary key); rollback to savepoint cockroach_restart;
insert into t(x) values(1); release savepoint cockroach_restart; commit;

The insert manages to resolve the table because we weren't clearing the
"uncommitted descriptors" set that the session maintains. Since the
table descriptor doesn't end up being committed, the row will be
essentially unreachable.

This patch is original work on the 1.1 branch. This bug is not present
in the 2.0 release, where we're better about reseting state on
transaction retries.
I believe on 1.1 there's still a similar issue present when doing
automatic retries (as opposed to user-directed ones through rollback to
savepoint). I think fixing those would be more involved because it's
less clear where to stick the cleanup done in this patch; I'd rather not
do anything this point.

Also note that, with the fix, the sequence above doesn't actually work.
Instead, it deadlocks, just like it does on 2.0, because of #24885.
However, the following works:
begin; savepoint cockroach_restart; create table t(x int primary key); rollback to savepoint cockroach_restart;
create table t(x int primary key); insert into t(x) values(1); release savepoint cockroach_restart; commit;

Fixes #24785

cc @cockroachdb/release  @gpaul 

Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
@jgehrcke
Copy link

Thanks for the great work.

I've found another issue that's also present at master

Does that mean it's present in 1.1.7 or only on master?

If it is present in 1.1.7: how likely are we going to run into that after the 'dangling row' fix?

@jordanlewis
Copy link
Member

@andreimatei is this closed by #24888?

@andreimatei
Copy link
Contributor

@jgehrcke , as we were talking, #24885 is present on both 1.1.x and 2.0.x. But you don't run into it if either you do the same schema changes after a rollback to savepoint as you did before, or if someone else managed to do the equivalent schema changes (e.g. create the same table) - which would be the reason why one might not run the same schema changes.

@jordanlewis yes. A commit in that PR had "Fixes #24785"... I wonder why this was not closed? Does a bug only get closed when master gets the respective commit?

@bdarnell
Copy link
Contributor

A commit in that PR had "Fixes #24785"... I wonder why this was not closed? Does a bug only get closed when master gets the respective commit?

Yes. Only commits that get merged into master are scanned for Fixes #... lines.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants