release-20.2: sql: failure to upgrade FK representation during table validation produces spurious errors and makes table unavailable #57032
Comments
This affects only the foreign keys that were created originally in the database when the database version was 19.2.0. All foreign keys created in subsequent upgrades work perfectly fine. I tried to manually add the constraint and drop the constraint with no luck:
|
Hi @halverneus, thanks for the report. The detail about 19.2 is extra helpful. This is symptomatic of "descriptor corruption", a known issue caused by validation for table metadata becoming more strict in 20.2 and thus uncovering pre-existing inconsistencies. Your data is safe, but we'll need to repair the corrupted descriptors to recover the data. Could you run Also, please send us a debug zip. Note that debug zips can have sensitive information, so please send it to us via email at sql-schema-team@cockroachlabs.com |
I ran it from inside one of the containers (I don't have an exposed CRDB port outside of the Docker Swarm cluster). This is what I got:
|
The debug.zip has just been sent with the reference to this issue number in the subject and the body. |
Hi @halverneus, we had a Google group misconfiguration before, but it's resolved now. Could you please re-send your email? |
It has been resent. Thanks! |
The issue turns out to be that in 20.2, we introduced a bug in transforming on-disk descriptors to a new representation that had been introduced in 19.2. We're passing in a cockroach/pkg/sql/alter_table.go Line 578 in 150c591
Since the To fix this for 20.2 in a minimally invasive way, the plan is to extend |
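For readers trying to interpret the "missing fk back reference" and "missing fk forward reference" errors in this thread, here is a minimal Go sketch of the kind of cross-reference check that produces them. It is not the CockroachDB implementation: the `ForeignKey` and `Table` types and the `checkCrossReferences` function are hypothetical simplifications, and only the error wording is taken from the output posted in this issue. The assumption is that each foreign key is recorded twice, once on the referencing table and once on the referenced table, and that validation fails when one side is missing.

```go
package main

import "fmt"

// ForeignKey is a hypothetical, simplified stand-in for a foreign key
// entry stored in a table descriptor.
type ForeignKey struct {
	Name            string
	OriginTable     string // table that holds the referencing column
	ReferencedTable string // table being pointed at
}

// Table is a hypothetical, simplified table descriptor.
type Table struct {
	Name        string
	OutboundFKs []ForeignKey // forward references, kept on the origin table
	InboundFKs  []ForeignKey // back references, kept on the referenced table
}

// hasFK reports whether fks contains an entry equal to want.
func hasFK(fks []ForeignKey, want ForeignKey) bool {
	for _, fk := range fks {
		if fk == want {
			return true
		}
	}
	return false
}

// checkCrossReferences returns one message per foreign key on t whose
// counterpart entry is missing from the table on the other side.
func checkCrossReferences(t Table, byName map[string]Table) []string {
	var problems []string
	for _, fk := range t.OutboundFKs {
		ref, ok := byName[fk.ReferencedTable]
		if !ok || !hasFK(ref.InboundFKs, fk) {
			problems = append(problems, fmt.Sprintf(
				"missing fk back reference %q to %q from %q",
				fk.Name, t.Name, fk.ReferencedTable))
		}
	}
	for _, fk := range t.InboundFKs {
		origin, ok := byName[fk.OriginTable]
		if !ok || !hasFK(origin.OutboundFKs, fk) {
			problems = append(problems, fmt.Sprintf(
				"missing fk forward reference %q to %q from %q",
				fk.Name, t.Name, fk.OriginTable))
		}
	}
	return problems
}

func main() {
	fk := ForeignKey{
		Name:            "fk_machine_type_ref_machine_type",
		OriginTable:     "machine",
		ReferencedTable: "machine_type",
	}
	// Simulate a one-sided entry: machine records the forward reference,
	// but machine_type has no matching back reference.
	tables := map[string]Table{
		"machine":      {Name: "machine", OutboundFKs: []ForeignKey{fk}},
		"machine_type": {Name: "machine_type"},
	}
	for _, p := range checkCrossReferences(tables["machine"], tables) {
		fmt.Println(p)
	}
}
```

In this simplified model, a foreign key that is visible from only one side makes the table fail validation even though the row data itself is untouched, which is consistent with the "your data is safe" note earlier in the thread.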
@halverneus We'll get back to you within a day or so about fixing this for your cluster specifically. |
I'm afraid I didn't. I've just been monitoring releases going to Docker Hub and changing the version in my Docker Swarm Compose files (and testing on another stack), so I wasn't aware there were any instructions before now. That is unfortunate, on my part. |
Hi @halverneus, we've merged a fix for this issue into our master branch and into our 20.2 release branch #57083. There will be a nightly build available tonight, and I'll update this issue when it's available. I highly recommend waiting for that nightly build. |
Thanks! I'll look forward to updating in the morning. I'm not seeing a nightly build tag on Docker Hub, so I'm guessing I'll need to build my own Docker image for the time being. I think I can manage that just fine. |
FYI, we decided to include this critical fix in the official 20.2.2 version, which will very likely be released tomorrow. We'll include a link to the release binary, along with its Docker image, when it's available. |
Awesome! Even better! Thank you so very much! |
I just deployed the new release. Some things have started working, but some of the foreign keys are giving me a slightly different error (perhaps I just need to wait for everything to flush?).
|
I've just run a SELECT on every table. The only tables that are still affected (if you still have my debug.zip) produce 'missing fk forward reference' errors, never 'back reference' now. Only the first half of the first schema upgrade appears to be affected, so I don't understand the specific cause in this case. The following are the only remaining errors I have:
I can SELECT on all other tables from all other schema versions without any issues. Thoughts? Is there any way to manually upgrade an 'un-upgraded foreign key'? |
Changes made to a Docker image that was emailed directly to me fixed the issue. |
Thanks, @halverneus. To close the loop, the PR #57133 corrected this issue. This resolution will be available in v20.2.3. |
If anyone finds this later and needs an urgent resolution before v20.2.3 is released, please see the binaries in this forum post: https://forum.cockroachlabs.com/t/issue-whenever-a-requests-comes-in-on-cockroachdb-20-2/4116/5 |
In case someone needs a patched Docker image 😊
FROM cockroachdb/cockroach:v20.2.2
RUN microdnf install tar gzip \
&& mkdir -p /tmp/patch \
&& cd /tmp/patch/ \
&& curl -O https://cockroach-builds.s3.amazonaws.com/cockroach-v20.2.2-2-gb91c2e7506.linux-amd64.tgz \
&& tar --strip-components=1 -xzf *.tgz \
&& microdnf remove gzip tar \
&& cp /tmp/patch/cockroach /cockroach/ \
&& rm -rf /tmp/patch |
I'm not sure if this is part of the upgrade to 20.2.0 or some other reason that correlates in time, but I seem to have suddenly lost all foreign key references on our production servers.
Please describe the issue you observed, and any steps we can take to reproduce it:
Make a database using an older version and upgrade. This was noticed some time after the upgrade.
What did you do? Describe in your own words.
If possible, provide steps to reproduce the behavior:
Expected behavior
Return results from query like normal.
Additional data / screenshots
root@:26257/fini> SELECT * from backup
-> ;
ERROR: internal error: missing fk back reference "fk_machine_id_ref_machine" to "backup" from "machine"
SQLSTATE: XX000
DETAIL: stack trace:
/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/tabledesc/structured.go:1500: validateCrossReferences()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/tabledesc/structured.go:1440: Validate()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/catalogkv/catalogkv.go:298: unwrapDescriptor()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/catalogkv/catalogkv.go:160: GetDescriptorByID()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/lease.go:215: func1()
/go/src/github.com/cockroachdb/cockroach/pkg/kv/db.go:707: func1()
/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn.go:811: exec()
/go/src/github.com/cockroachdb/cockroach/pkg/kv/db.go:706: Txn()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/lease.go:193: acquire()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/lease.go:858: func1()
/go/src/github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight/singleflight.go:128: doCall()
/usr/local/go/src/runtime/asm_amd64.s:1357: goexit()
HINT: You have encountered an unexpected error.
Please check the public issue tracker to check whether this problem is
already tracked. If you cannot find it there, please report the error
with details by creating a new issue.
If you would rather not post publicly, please contact us directly
using the support form.
We appreciate your feedback.
root@:26257/fini> SELECT * FROM machine;
ERROR: internal error: missing fk back reference "fk_machine_type_ref_machine_type" to "machine" from "machine_type"
SQLSTATE: XX000
DETAIL: stack trace:
/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/tabledesc/structured.go:1500: validateCrossReferences()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/tabledesc/structured.go:1440: Validate()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/catalogkv/catalogkv.go:298: unwrapDescriptor()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/catalogkv/catalogkv.go:160: GetDescriptorByID()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/lease.go:215: func1()
/go/src/github.com/cockroachdb/cockroach/pkg/kv/db.go:707: func1()
/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn.go:811: exec()
/go/src/github.com/cockroachdb/cockroach/pkg/kv/db.go:706: Txn()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/lease.go:193: acquire()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/lease.go:858: func1()
/go/src/github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight/singleflight.go:128: doCall()
/usr/local/go/src/runtime/asm_amd64.s:1357: goexit()
HINT: You have encountered an unexpected error.
Please check the public issue tracker to check whether this problem is
already tracked. If you cannot find it there, please report the error
with details by creating a new issue.
If you would rather not post publicly, please contact us directly
using the support form.
We appreciate your feedback.
root@:26257/fini> SELECT * FROM machine_type;
ERROR: internal error: missing fk forward reference "fk_machine_type_ref_machine_type" to "machine_type" from "machine"
SQLSTATE: XX000
DETAIL: stack trace:
/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/tabledesc/structured.go:1519: validateCrossReferences()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/tabledesc/structured.go:1440: Validate()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/catalogkv/catalogkv.go:298: unwrapDescriptor()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/catalogkv/catalogkv.go:160: GetDescriptorByID()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/lease.go:215: func1()
/go/src/github.com/cockroachdb/cockroach/pkg/kv/db.go:707: func1()
/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn.go:811: exec()
/go/src/github.com/cockroachdb/cockroach/pkg/kv/db.go:706: Txn()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/lease.go:193: acquire()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/lease.go:858: func1()
/go/src/github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight/singleflight.go:128: doCall()
/usr/local/go/src/runtime/asm_amd64.s:1357: goexit()
HINT: You have encountered an unexpected error.
Please check the public issue tracker to check whether this problem is
already tracked. If you cannot find it there, please report the error
with details by creating a new issue.
If you would rather not post publicly, please contact us directly
using the support form.
We appreciate your feedback.
If a node in your cluster encountered a fatal error, supply the contents of the
log directories (at minimum of the affected node(s), but preferably all nodes).
Note that log files can contain confidential information. Please continue
creating this issue, but contact support@cockroachlabs.com to submit the log
files in private.
If applicable, add screenshots to help explain your problem.
Environment:
cockroach sql
Additional context
What was the impact?
I'm unable to run queries against our production servers.