
May unexpectedly run into storage.schema_version > tiflash_instance.schema_version #5397

Closed
JaySon-Huang opened this issue Jul 18, 2022 · 1 comment · Fixed by #5506

JaySon-Huang commented Jul 18, 2022

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

After #5245, we may run into a corner case where storage.schema_version > TiDBSchemaSyncer.cur_version.

Background:
After the concurrent DDL feature, TiDB updates the schema version before setting the schema diff, and the two are done in separate transactions.
So TiFlash may see a schema version X but no schema diff X, meaning that the transaction writing schema diff X has not been committed yet or has been aborted.
However, TiDB guarantees that if we get a schema version X, then schema diff X-1 must exist; otherwise the transaction writing schema diff X-1 was aborted and we can safely ignore it.
Since TiDB cannot guarantee that the schema diff of the latest schema version X is non-empty, in this situation we should set cur_version to X-1 and try to fetch schema diff X next time.
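A minimal sketch of this version-resolution rule, assuming a getter whose getSchemaDiff returns std::optional<SchemaDiff> as in the snippets below (resolveUsableVersion is a hypothetical helper name, not actual TiFlash code):

// Sketch only: decide which version the syncer can safely advance to.
template <typename Getter>
Int64 resolveUsableVersion(Getter & getter, Int64 latest_version)
{
    // The schema diff of the latest version may not be committed yet.
    // In that case only advance to latest_version - 1 and retry the
    // missing diff on the next sync round.
    if (!getter.getSchemaDiff(latest_version))
        return latest_version - 1;
    return latest_version;
}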

However, even if the last schema diff X is empty, the schema builder is currently still constructed with X rather than X-1. So we will update table.schema_version to X but TiDBSchemaSyncer.cur_version to X-1, which makes some queries fail unexpectedly. The relevant snippets from tryLoadSchemaDiffs and loadAllSchema:

SchemaBuilder<Getter, NameMapper> builder(getter, context, databases, latest_version); // Note: constructed with `latest_version` before the diffs are checked
Int64 used_version = cur_version;
// First get all schema diff from `cur_version` to `latest_version`. Only apply the schema diff(s) if we fetch all
// schema diff without any exception.
std::vector<std::optional<SchemaDiff>> diffs;
while (used_version < latest_version)
{
    used_version++;
    diffs.push_back(getter.getSchemaDiff(used_version));
}
LOG_FMT_DEBUG(log, "End load schema diffs with total {} entries.", diffs.size());

void loadAllSchema(Getter & getter, Int64 version, Context & context)
{
    SchemaBuilder<Getter, NameMapper> builder(getter, context, databases, version);
    builder.syncAllSchema();
}

Code snippets for updating storage.table_info.schema_version

SchemaBuilder(Getter & getter_, Context & context_, std::unordered_map<DB::DatabaseID, TiDB::DBInfoPtr> & dbs_, Int64 version)
    : getter(getter_)
    , context(context_)
    , databases(dbs_)
    , target_version(version)
    , log(&Poco::Logger::get("SchemaBuilder"))
{}

orig_table_info.schema_version = target_version;
auto alter_lock = storage->lockForAlter(getThreadName());

/// Update schema version.
table_info->schema_version = target_version;

2. What did you expect to see? (Required)

  • For tryLoadSchemaDiffs
    We should first fetch the schema diffs and check whether the schema diff of X is empty before creating the SchemaBuilder instance. If the schema diff of X is empty, we should set SchemaBuilder.target_version to X-1 (see the sketch below).
  • For loadAllSchema
    We should check the diff via getter.checkSchemaDiffExists before creating the SchemaBuilder instance, rather than doing the check after loadAllSchema is done.

We should also add some unit test cases.
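
A rough sketch of the expected change, assuming the Getter/SchemaBuilder interfaces quoted above and a getter.checkSchemaDiffExists(version) helper as mentioned in the loadAllSchema item (exact names and signatures may differ from the final fix):

// tryLoadSchemaDiffs (sketch): fetch all diffs first, then pick the builder's target version.
std::vector<std::optional<SchemaDiff>> diffs;
for (Int64 v = cur_version + 1; v <= latest_version; ++v)
    diffs.push_back(getter.getSchemaDiff(v));

Int64 used_version = latest_version;
if (!diffs.empty() && !diffs.back())
{
    // The diff of the latest version X is not committed yet; only apply up to X-1.
    used_version = latest_version - 1;
    diffs.pop_back();
}
// Construct the builder only after the target version is known, so storage.schema_version
// can never get ahead of TiDBSchemaSyncer.cur_version.
SchemaBuilder<Getter, NameMapper> builder(getter, context, databases, used_version);

// loadAllSchema (sketch): check the diff before constructing the builder,
// not after syncAllSchema is done.
void loadAllSchema(Getter & getter, Int64 version, Context & context)
{
    if (!getter.checkSchemaDiffExists(version))
        version = version - 1;
    SchemaBuilder<Getter, NameMapper> builder(getter, context, databases, version);
    builder.syncAllSchema();
}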

3. What did you see instead (Required)

4. What is your TiFlash version? (Required)

master

JaySon-Huang added the type/bug (The issue is confirmed as a bug.) and component/storage labels on Jul 18, 2022

JaySon-Huang commented Jul 18, 2022

As the time span between updating the schema version and setting the schema diff is relatively small (usually less than 50 ms), and users can work around this bug by creating an empty table, this bug is marked as moderate.
