PXB-3034 Reduce the time the instance remains under lock #1530

Open: wants to merge 46 commits into base reducedlock-trunk
dec6cab
PXB-3034 - Make --lock-ddl option an ENUM
altmannmarcelo Nov 6, 2023
464c7b6
PXB-3034 - Add DDL tracking to xtrabackup
altmannmarcelo Nov 6, 2023
15d6004
PXB-3034 - Handle prepare
altmannmarcelo Nov 28, 2023
c3e0dd6
PXB-3034 - Second phase copy Multi thread
altmannmarcelo Nov 30, 2023
3d34e05
PXB-3034 - Adding test cases
altmannmarcelo Nov 6, 2023
756795d
PXB-3034 - adjust fil_open_for_xtrabackup
altmannmarcelo Dec 20, 2023
56a5adb
modifications
aybek Feb 15, 2024
0f5401d
handle renames during scan
aybek Feb 28, 2024
bedc607
We want to note that table will be copied only after it has been open…
aybek Mar 6, 2024
00ecb25
PXB-3113 : Improve debug sync framework to allow PXB to pause and res…
satya-bodapati Apr 2, 2024
7a81fd0
PXB-3252 : Xtrabackup failed to read page after 10 retries. File ./my…
satya-bodapati Apr 5, 2024
4e50fe9
PXB-3246 : Assertion failure: log0recv.cc:2141:!page || fil_page_type…
satya-bodapati Apr 5, 2024
3925f5b
PXB-3253 : [ERROR] [MY-012592] [InnoDB] Operating system error number…
satya-bodapati Apr 18, 2024
121b023
PXB-3223 : PXB must not allow --lock-ddl=REDUCED when pagetracking is…
satya-bodapati Apr 19, 2024
e71924a
PXB-3120 : Assertion failure: Dir_Walker::is_directory
satya-bodapati Apr 22, 2024
80ea4bb
PXB-3278 : Wrong parsing of MLOG_FILE_ redo log records with lock-ddl…
satya-bodapati Apr 25, 2024
1d1607f
Merge pull request #1552 from satya-bodapati/dev-reducedlock
satya-bodapati Apr 25, 2024
d60a02c
PXB-3281 : With lock-ddl=REDUCED, STL containers used by reduced code…
satya-bodapati Apr 25, 2024
b5f4017
Merge pull request #1553 from satya-bodapati/dev-reducedlock
satya-bodapati Apr 25, 2024
17121fe
PXB-3241 : Assertion failure: os0file.cc:3416:!exists while taking ba…
satya-bodapati Apr 26, 2024
36f8cbf
Merge pull request #1554 from satya-bodapati/dev-reducedlock
satya-bodapati Apr 26, 2024
d5a4245
PXB-3245 : Assertion failure: fil0fil.cc:2545:err == DB_SUCCESS found…
satya-bodapati Apr 26, 2024
1a6b48a
Merge pull request #1555 from satya-bodapati/dev-reducedlock
satya-bodapati Apr 26, 2024
594da1d
PXB-3280 : undo log truncation causes assertion failure with reduced …
satya-bodapati May 2, 2024
c0302e8
Merge pull request #1556 from satya-bodapati/dev-reducedlock
satya-bodapati May 2, 2024
e95cb93
PXB-3248 Multiple files found for the same tablespace ID
aybek May 15, 2024
4b9f897
Merge pull request #1557 from aybek/dev-reducedlock-trunk
aybek May 15, 2024
caa4040
PXB-3248 Multiple files found for the same tablespace ID
aybek May 15, 2024
23136ed
Merge pull request #1558 from aybek/dev-reducedlock-trunk
aybek May 15, 2024
42cf484
PXB-3248 - Multiple files found for the same tablespace ID
aybek May 29, 2024
0108546
Merge pull request #1566 from aybek/dev-reducedlock-trunk2
aybek May 29, 2024
b8373d5
PXB-3034: Bring back UNIV_DEBUG on debug-sync-thread.
satya-bodapati Jun 7, 2024
d86332a
Merge pull request #1570 from satya-bodapati/dev-reducedlock
satya-bodapati Jun 9, 2024
a8eb932
PXB-3320 : prepare_handle_del_files() fails to delete the .meta and .…
satya-bodapati Jul 5, 2024
fc66022
Merge pull request #1584 from satya-bodapati/dev-reducedlock
satya-bodapati Jul 15, 2024
c7cd826
PXB-3318 : prepare_handle_ren_files(): failed to handle .ren files
satya-bodapati Jul 16, 2024
1ad8fd9
Merge pull request #1585 from satya-bodapati/dev-reducedlock
satya-bodapati Jul 16, 2024
620cd29
PXB-3295 : Undo tablespaces are not tracked properly with lock-ddl=RE…
satya-bodapati Jul 17, 2024
49c8a34
PXB-3295: fix testcase
satya-bodapati Jul 18, 2024
2fe6a4d
Merge pull request #1586 from satya-bodapati/dev-reducedlock
satya-bodapati Jul 18, 2024
f55da87
Follow up fix for PXB-3318: handle rename source and destination as s…
satya-bodapati Jul 18, 2024
596352f
Merge pull request #1587 from satya-bodapati/dev-reducedlock
satya-bodapati Jul 18, 2024
3099f59
PXB-3221 : Assertion failure: page0cur.cc:1177:ib::fatal triggered du…
satya-bodapati Jul 18, 2024
d525604
Merge pull request #1588 from satya-bodapati/dev-reducedlock
satya-bodapati Jul 19, 2024
686c80c
PXB-3331 : Assertion failure: fil0fil.cc:6422:success
satya-bodapati Jul 22, 2024
3b41be9
Merge pull request #1590 from satya-bodapati/dev-reducedlock
satya-bodapati Jul 22, 2024
PXB-3252 : Xtrabackup failed to read page after 10 retries. File ./mysql.ibd seems to be corrupted.

https://perconadev.atlassian.net/browse/PXB-3252

Problem:
--------
With lock-ddl=REDUCED, an ALTER with ENCRYPTION='Y'/'N' can happen during the backup. On general tablespaces, this is done in place,
i.e. the space_id of the tablespace does not change and the pages are encrypted or decrypted.

For file-per-table tablespaces, a new tablespace is created with the encryption key and data is copied from
the old tablespace to the new tablespace.

In xtrabackup, the files are discovered first and copied later. Between these two operations, the encryption of a
tablespace can change. For example, PXB sees that ts1.ibd is encrypted with key1 and loads it into the cache.

Then the server does ENCRYPTION='N' and then ENCRYPTION='Y' again, so the tablespace is now encrypted with a different key.

Now a PXB copy thread tries to copy this tablespace and cannot decrypt a page. Page 0 is always unencrypted, so the
problem is typically detected at page 1, but it can happen on any page.

Since PXB cannot decrypt the page, it reports corruption and aborts the backup.

Fix:
----
On decryption errors, we track such tablespaces in a separate corrupted list. We also add them to the recopy tables list.
Under lock, these tablespaces are copied again. A .new extension is used.
Then we process the corrupted list under lock and create .corrupt files for the tablespaces in the corrupted list.
For example, if the encrypted tablespace is ts1.ibd, the file will be ts1.ibd.corrupt.
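
The backup-side bookkeeping described above can be sketched roughly as follows. This is a minimal standalone illustration under stated assumptions, not the code from this commit: `corruption_tracker`, `report_corrupted`, `marker_files` and `needs_recopy` are hypothetical names, `space_id_t` is a stand-in typedef, and recording the recopy entry in the same call is a simplification made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for InnoDB's space_id_t.
using space_id_t = std::uint32_t;

// Simplified, hypothetical tracker (not the real ddl_tracker_t).
class corruption_tracker {
 public:
  // Called by any copy thread when a page cannot be decrypted.
  // The mutex makes concurrent reports from several threads safe.
  void report_corrupted(space_id_t space_id, const std::string &path) {
    std::lock_guard<std::mutex> guard(m_mutex);
    m_corrupted[space_id] = path;  // reporting the same id twice is harmless
    m_recopy.insert(space_id);     // assumption: recopy is recorded here too
  }

  // Names of the marker files to create once the copy threads are done,
  // e.g. "./ts1.ibd" becomes "./ts1.ibd.corrupt". Sorted for stable output.
  std::vector<std::string> marker_files() const {
    std::lock_guard<std::mutex> guard(m_mutex);
    std::vector<std::string> out;
    for (const auto &entry : m_corrupted) {
      out.push_back(entry.second + ".corrupt");
    }
    std::sort(out.begin(), out.end());
    return out;
  }

  // True if the tablespace has to be copied again under lock.
  bool needs_recopy(space_id_t space_id) const {
    std::lock_guard<std::mutex> guard(m_mutex);
    return m_recopy.count(space_id) != 0;
  }

 private:
  mutable std::mutex m_mutex;
  std::unordered_map<space_id_t, std::string> m_corrupted;
  std::unordered_set<space_id_t> m_recopy;
};
```

Unlike the loop in the diff, which appends the extension to the stored path in place, this sketch leaves the stored paths untouched and builds the marker names on demand.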

On prepare, we delete the corresponding ts1.ibd if ts1.ibd.corrupt is present. This has to be done before the
*.ibd scan because tablespace loading aborts when it processes such half-written tablespaces.
If the .corrupt file is present in the incremental directory, we delete the ts1.ibd.meta and ts1.ibd.delta files from the incremental
backup directory.
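
The prepare-side rule can be illustrated with a small pure function that only computes which files would be removed for a given marker file. This is a sketch under assumptions: `files_to_delete` is a hypothetical helper that performs no I/O, and the suffix check is an addition for the example that the handler in the diff does not make.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: given a marker path such as "./ts1.ibd.corrupt",
// return the files a prepare step would delete, in order.
// Returns an empty list if the path does not end in ".corrupt".
std::vector<std::string> files_to_delete(const std::string &marker_path,
                                         bool incremental) {
  static const std::string ext = ".corrupt";
  std::vector<std::string> out;

  // Only act on marker files (assumption: callers may pass anything).
  if (marker_path.size() <= ext.size() ||
      marker_path.compare(marker_path.size() - ext.size(), ext.size(), ext) !=
          0) {
    return out;
  }

  // Trim ".corrupt" to get the tablespace file name.
  const std::string source =
      marker_path.substr(0, marker_path.size() - ext.size());

  if (incremental) {
    // Incremental directory: drop the delta and its metadata.
    out.push_back(source + ".delta");
    out.push_back(source + ".meta");
  } else {
    // Full backup directory: drop the half-written tablespace itself.
    out.push_back(source);
  }

  // The marker is no longer needed after cleanup.
  out.push_back(marker_path);
  return out;
}
```

For `./ts1.ibd.corrupt` in a full backup this yields `./ts1.ibd` followed by the marker. With `incremental` set it yields `./ts1.ibd.delta`, `./ts1.ibd.meta` and then the marker.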
satya-bodapati committed Apr 23, 2024
commit 7a81fd0933b7dc8e8617ce73a11b0a0c55c41188
25 changes: 22 additions & 3 deletions storage/innobase/xtrabackup/src/ddl_tracker.cc
@@ -96,12 +96,12 @@ void ddl_tracker_t::backup_file_op(uint32_t space_id, mlog_id_t type,
<< " Name: " << new_space_name;
break;
case MLOG_INDEX_LOAD:
-recopy_tables.insert(space_id);
+add_to_recopy_tables(space_id);
xb::info() << "DDL tracking : LSN: " << start_lsn
<< " direct write on table ID: " << space_id;
break;
case MLOG_WRITE_STRING:
-recopy_tables.insert(space_id);
+add_to_recopy_tables(space_id);
xb::info() << "DDL tracking : LSN: " << start_lsn
<< " encryption operation on table ID: " << space_id;
break;
@@ -121,6 +121,18 @@ void ddl_tracker_t::add_table(const space_id_t &space_id, std::string name) {
tables_in_backup[space_id] = name;
}

void ddl_tracker_t::add_corrupted_tablespace(const space_id_t space_id,
const std::string &path) {
std::lock_guard<std::mutex> lock(m_ddl_tracker_mutex);

corrupted_tablespaces[space_id] = path;
}

void ddl_tracker_t::add_to_recopy_tables(space_id_t space_id) {
std::lock_guard<std::mutex> lock(m_ddl_tracker_mutex);
recopy_tables.insert(space_id);
}

void ddl_tracker_t::add_missing_table(std::string path) {
Fil_path::normalize(path);
if (Fil_path::has_prefix(path, Fil_path::DOT_SLASH)) {
@@ -205,13 +217,20 @@ void ddl_tracker_t::handle_ddl_operations() {
xb::info() << "DDL tracking : handling DDL operations";

if (new_tables.empty() && renames.empty() && drops.empty() &&
-recopy_tables.empty()) {
+recopy_tables.empty() && corrupted_tablespaces.empty()) {
xb::info()
<< "DDL tracking : Finished handling DDL operations - No changes";
return;
}
dberr_t err;

for (auto &tablespace : corrupted_tablespaces) {
/* Create .corrupt file extension with the filename. Prepare should delete
the corresponding .ibd, before doing *.ibd scan */
std::string &path = tablespace.second.append(".corrupt");
backup_file_printf(path.c_str(), "%s", "");
}

/* Some tables might get to the new list if the DDL happen in between
* redo_mgr.start and xb_load_tablespaces. This causes we ending up with two
* tablespaces with the same spaceID. Remove them from new tables */
22 changes: 22 additions & 0 deletions storage/innobase/xtrabackup/src/ddl_tracker.h
@@ -41,12 +41,34 @@ class ddl_tracker_t {
* name */
space_id_to_name_t renamed_during_scan;

private:
/** Tables that cannot be decrypted during backup because of encryption
changes. Copy threads that cannot decrypt page, considers them as corrupted
page. Can happen only on general tablespaces and mysql.ibd */
std::unordered_map<space_id_t, std::string> corrupted_tablespaces;
/** Multiple copy threads can add entries to corrupted_tablespaces and
recopy_tables concurrently */
std::mutex m_ddl_tracker_mutex;

public:
/** Add a new table in the DDL tracker table list.
@param[in] space_id tablespace identifier
@param[in] name tablespace name */
void add_table(const space_id_t &space_id, std::string name);

/** Add a table to the corrupted tablespace list. The list is later
converted to tablespacename.ibd.corrupt files on disk
@param[in] space_id Tablespace id
@param[in] path Tablespace path */
void add_corrupted_tablespace(const space_id_t space_id,
const std::string &path);

/** Add a table to the recopy list. These tables are
1. had ADD INDEX while the backup is in progress
2. tablespace encryption change from 'y' to 'n' or viceversa
@param[in] space_id Tablespace id */
void add_to_recopy_tables(space_id_t space_id);

/** Report an operation to create, delete, or rename a file during backup.
@param[in] space_id tablespace identifier
@param[in] type redo log file operation type
2 changes: 1 addition & 1 deletion storage/innobase/xtrabackup/src/fil_cur.cc
@@ -374,7 +374,7 @@ xb_fil_cur_result_t xb_fil_cur_read_from_offset(xb_fil_cur_t *cursor,
if (retry_count == 0) {
xb::error() << "failed to read page after 10 retries. File "
<< cursor->abs_path << " seems to be corrupted.";
-ret = XB_FIL_CUR_ERROR;
+ret = XB_FIL_CUR_CORRUPTED;
break;
}
xb::info() << "Database page corruption detected at page " << page_no
3 changes: 2 additions & 1 deletion storage/innobase/xtrabackup/src/file_utils.h
@@ -120,7 +120,8 @@ typedef enum {
XB_FIL_CUR_SUCCESS,
XB_FIL_CUR_SKIP,
XB_FIL_CUR_ERROR,
-XB_FIL_CUR_EOF
+XB_FIL_CUR_EOF,
+XB_FIL_CUR_CORRUPTED
} xb_fil_cur_result_t;

/* Holds the state needed to copy single data file. */
51 changes: 50 additions & 1 deletion storage/innobase/xtrabackup/src/xtrabackup.cc
@@ -3254,7 +3254,9 @@ bool xtrabackup_copy_datafile(fil_node_t *node, uint thread_n,
}
}

-if (res == XB_FIL_CUR_ERROR) {
+if (res == XB_FIL_CUR_ERROR ||
+(res == XB_FIL_CUR_CORRUPTED &&
+(ddl_tracker == nullptr || opt_lock_ddl != LOCK_DDL_REDUCED))) {
goto error;
}

@@ -3278,6 +3280,13 @@ bool xtrabackup_copy_datafile(fil_node_t *node, uint thread_n,
if (write_filter && write_filter->deinit) {
write_filter->deinit(&write_filt_ctxt);
}

if (res == XB_FIL_CUR_CORRUPTED) {
if (ddl_tracker != nullptr) {
ddl_tracker->add_corrupted_tablespace(cursor.space_id, cursor.node->name);
}
}

return (rc);

error:
@@ -5665,6 +5674,39 @@ static bool prepare_handle_ren_files(
return false;
}

/** Handle .corrupt files. These files should be removed before we do *.ibd scan
@return true on success */
static bool prepare_handle_corrupt_files(
const datadir_entry_t &entry, /*!<in: datadir entry */
void * /*data*/) {
if (entry.is_empty_dir) return true;

std::string corrupt_path = entry.path;
Fil_path::normalize(corrupt_path);
// trim .corrupt
std::string ext = ".corrupt";
std::string source_path =
corrupt_path.substr(0, corrupt_path.length() - ext.length());

if (xtrabackup_incremental) {
std::string delta_file = source_path + ".delta";
xb::info() << "prepare_handle_corrupt_files: deleting " << delta_file;
os_file_delete_if_exists_func(delta_file.c_str(), nullptr);

std::string meta_file = source_path + ".meta";
xb::info() << "prepare_handle_corrupt_files: deleting " << meta_file;
os_file_delete_if_exists_func(meta_file.c_str(), nullptr);
} else {
xb::info() << "prepare_handle_corrupt_files: deleting " << source_path;
os_file_delete_if_exists_func(source_path.c_str(), nullptr);
}

// delete the .corrupt file, we don't need it anymore
os_file_delete_if_exists_func(corrupt_path.c_str(), nullptr);

return true;
}

/**
Handle DDL for deleted files
example input: test/10.ibd.del file
@@ -6872,6 +6914,13 @@ static void xtrabackup_prepare_func(int argc, char **argv) {

xb_normalize_init_values();

if (!xb_process_datadir(
xtrabackup_incremental_dir ? xtrabackup_incremental_dir : ".",
".corrupt", prepare_handle_corrupt_files, NULL)) {
xb_data_files_close();
goto error_cleanup;
}

Tablespace_map::instance().deserialize("./");

/* Handle `RENAME/DELETE` DDL files produced by DDL tracking during backup */
@@ -0,0 +1,105 @@
KEYRING_TYPE="component"
. inc/keyring_common.sh
. inc/keyring_file.sh

require_debug_pxb_version
require_debug_sync_thread

function run_test() {
ALL_TABLES_IN_BACKUP=$1

vlog "Running test with ALL_TABLES_IN_BACKUP=$ALL_TABLES_IN_BACKUP"

XB_ERROR_LOG=$topdir/backup_with_enc_general_tablespace.log
echo $XB_ERROR_LOG

configure_server_with_component
$MYSQL $MYSQL_ARGS -Ns -e "CREATE TABLESPACE ts1 ADD DATAFILE 'ts1.ibd' Engine=InnoDB ENCRYPTION = 'Y';CREATE TABLE test.enc_table (id INT PRIMARY KEY AUTO_INCREMENT) TABLESPACE ts1 ENCRYPTION='Y'; INSERT INTO test.enc_table VALUES (), (), (), ();" test
innodb_wait_for_flush_all

xtrabackup_background --backup --target-dir=$topdir/backup_enc_general_tablespace --debug-sync-thread="before_file_copy" --lock-ddl=REDUCED

job_pid=$XB_PID

wait_for_debug_sync_thread "before_file_copy"

echo "Now pause redo thread"
echo "xtrabackup_copy_logfile_pause" > $topdir/backup_enc_general_tablespace/xb_debug_sync_thread
kill -SIGUSR1 $job_pid

wait_for_debug_sync_thread "xtrabackup_copy_logfile_pause"

echo "Now re-encrypt the tablespace Y->N-Y"
$MYSQL $MYSQL_ARGS -Ns -e "ALTER TABLESPACE ts1 ENCRYPTION='N';ALTER TABLESPACE ts1 ENCRYPTION='Y';" test

echo "Now resume copying thread"
resume_debug_sync_thread "before_file_copy" $topdir/backup_enc_general_tablespace

echo "Now resume redo copy thread"
resume_debug_sync_thread "xtrabackup_copy_logfile_pause" $topdir/backup_enc_general_tablespace

wait $XB_PID
exit_status=$?

# Check the exit status and take appropriate action
if [ $exit_status -eq 0 ]; then
echo "xtrabackup_background exited successfully."
else
echo "xtrabackup_background exited with an error (exit status: $exit_status)."
exit 1
fi

$MYSQL $MYSQL_ARGS -Ns -e "INSERT INTO test.enc_table VALUES (), (), (), ();" test
$MYSQL $MYSQL_ARGS -Ns -e "CREATE TABLE t2(a INT); INSERT INTO t2 VALUES (1),(2),(3),(4),(5);" test
$MYSQL $MYSQL_ARGS -Ns -e "ALTER TABLESPACE mysql ENCRYPTION='Y';" test

XB_ERROR_LOG=$topdir/backup_inc.log
BACKUP_DIR=$topdir/backup_inc
xtrabackup_background --backup --target-dir=$topdir/backup_inc --incremental-basedir=$topdir/backup_enc_general_tablespace --lock-ddl=REDUCED --debug-sync-thread="before_file_copy"

wait_for_debug_sync_thread "before_file_copy"

echo "Now pause redo thread"
echo "xtrabackup_copy_logfile_pause" > $BACKUP_DIR/xb_debug_sync_thread
kill -SIGUSR1 $XB_PID

wait_for_debug_sync_thread "xtrabackup_copy_logfile_pause"

echo "Now re-encrypt the tablespace Y->N-Y"
$MYSQL $MYSQL_ARGS -Ns -e "ALTER TABLESPACE mysql ENCRYPTION='N';ALTER TABLESPACE mysql ENCRYPTION='Y';" test

echo "Now resume copying thread"
resume_debug_sync_thread "before_file_copy" $BACKUP_DIR

echo "Now resume redo copy thread"
resume_debug_sync_thread "xtrabackup_copy_logfile_pause" $BACKUP_DIR

echo "################reached waitpid#######################"
wait $XB_PID
exit_status=$?

# Check the exit status and take appropriate action
if [ $exit_status -eq 0 ]; then
echo "xtrabackup_background exited successfully."
else
echo "xtrabackup_background exited with an error (exit status: $exit_status)."
exit 1
fi

record_db_state test
stop_server
xtrabackup --prepare --apply-log-only --target-dir=$topdir/backup_enc_general_tablespace --xtrabackup-plugin-dir=${plugin_dir} ${keyring_args}
xtrabackup --prepare --target-dir=$topdir/backup_enc_general_tablespace --incremental-dir=$topdir/backup_inc --xtrabackup-plugin-dir=${plugin_dir} ${keyring_args}

rm -rf $mysql_datadir/*
xtrabackup --copy-back --target-dir=$topdir/backup_enc_general_tablespace --xtrabackup-plugin-dir=${plugin_dir} ${keyring_args}
cp ${instance_local_manifest} $mysql_datadir
cp ${keyring_component_cnf} $mysql_datadir

start_server
verify_db_state test
stop_server
rm -rf $mysql_datadir $topdir/backup_with_enc_general_tablespace.log $topdir/backup_enc_general_tablespace
}

run_test true