Benchmark against mydumper #4

Open
kennytm opened this issue Dec 8, 2019 · 2 comments

kennytm commented Dec 8, 2019

Compare the speed of Dumpling against Mydumper on a large (> 10 GB) database.

Score

  • 450

kennytm commented May 28, 2020

We have already run several dump comparisons against Mydumper, but we never collected any concrete data. In any case, the timing is similar or faster.


dveeden commented Oct 14, 2021

Scope

Test the performance of TiDB Dumpling and similar tools for dumping a large table from a MySQL 8.0 server.

Setup

For this I used two AWS machines: one for running MySQL and one for the dump process.

I used the c5d.xlarge instance type and provisioned it with the CentOS 7 AMI, following the Linux OS version requirements in the docs.

The security group I created for this had three rules for inbound traffic (a CLI sketch follows the list):

  • SSH from my workstation
  • MySQL Classic protocol (3306/tcp) from the dump host
  • MySQL X protocol (33060/tcp) from the dump host
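
For reproducibility, the three rules could be created with the AWS CLI along these lines (a sketch; the security group ID and the workstation CIDR are placeholders):

aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 22 --cidr 203.0.113.10/32      # SSH from my workstation
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 3306 --cidr 172.31.0.201/32    # MySQL classic protocol from the dump host
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 33060 --cidr 172.31.0.201/32   # MySQL X protocol from the dump host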

Machine setup

MySQL host

The first thing to do is to set up the NVMe device as storage for MySQL.

fdisk /dev/nvme1n1          # Create partition
mkfs.xfs /dev/nvme1n1p1     # Create filesystem
blkid                       # Get UUID
vi /etc/fstab               # Add entry for /var/lib/mysql
mkdir /var/lib/mysql
mount -a                    # Mount all filesystems

Entry to add for /etc/fstab:

UUID=8d67a32e-211b-4d72-be3f-f13f567a2ea6 /var/lib/mysql	xfs	defaults	0 0
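
A quick way to confirm the filesystem ended up where MySQL expects it (a check I would add, not in the original):

df -h /var/lib/mysql        # should show /dev/nvme1n1p1 mounted on /var/lib/mysql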

Then we need to install MySQL Server:

yum install https://dev.mysql.com/get/mysql80-community-release-el7-3.noarch.rpm
yum install -y mysql-community-server mysql-shell unzip perf sysstat
systemctl enable --now mysqld.service

MySQL sets up a random root password during installation. This password needs to be changed before the account is usable. Get the temporary password from /var/log/mysqld.log and then change it:

mysql -p
ALTER USER 'root'@'localhost' IDENTIFIED BY 'pashysVotubOcirgA-0';

And create ~/.my.cnf with this:

[client]
user=root
password="pashysVotubOcirgA-0"

[mysql]
prompt="\u@\h [\d] > "

Run mysqlsh --sql mysqlx://root@localhost/ and answer yes when asked to save passwords.
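
With ~/.my.cnf in place, the classic client should connect without prompting; a quick check (not in the original):

mysql -e "SELECT CURRENT_USER()"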

To configure MySQL:

SELECT 6.5*1024*1024*1024; -- scratch calculation (value in bytes)
SET PERSIST local_infile=ON;
SET PERSIST innodb_buffer_pool_size=5*1024*1024*1024;
SET PERSIST innodb_io_capacity_max=200000;
SET PERSIST innodb_io_capacity=20000;
SET PERSIST_ONLY innodb_log_file_size=4*1024*1024*1024;
RESTART;
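
SET PERSIST writes these values to mysqld-auto.cnf and RESTART restarts the server in place, so they can be verified afterwards (a sanity check, not in the original):

mysql -e "SELECT @@innodb_buffer_pool_size, @@innodb_io_capacity, @@innodb_log_file_size"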

Now follow https://docs.pingcap.com/tidb/stable/import-example-data to:

  • Create the bikeshare.trips table.
  • Download the test data (scripted in the sketch below)
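
The download step can be scripted; a sketch assuming the yearly ZIP naming of the Capital Bikeshare public bucket (the exact URLs are in the linked docs):

# Yearly archives; each one contains the quarterly CSV files imported below
wget https://s3.amazonaws.com/capitalbikeshare-data/{2010..2017}-capitalbikeshare-tripdata.zip
unzip '*.zip'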

To import the data we will use MySQL Shell's util.importTable.

mysqlsh mysql://root@localhost

importConfig = {
    "columns": [
        "duration",
        "start_date",
        "end_date",
        "start_station_number",
        "start_station",
        "end_station_number",
        "end_station",
        "bike_number",
        "member_type"
    ],
    "dialect": "csv",
    "schema": "bikeshare",
    "showprogress": true,
    "skipRows": 1,
    "table": "trips"
}
for (let f of ["2010-capitalbikeshare-tripdata.csv",
        "2013Q2-capitalbikeshare-tripdata.csv",
        "2015Q1-capitalbikeshare-tripdata.csv",
        "2016Q4-capitalbikeshare-tripdata.csv",
        "2011-capitalbikeshare-tripdata.csv",
        "2013Q3-capitalbikeshare-tripdata.csv",
        "2015Q2-capitalbikeshare-tripdata.csv",
        "2017Q1-capitalbikeshare-tripdata.csv",
        "2012Q1-capitalbikeshare-tripdata.csv",
        "2013Q4-capitalbikeshare-tripdata.csv",
        "2015Q3-capitalbikeshare-tripdata.csv",
        "2017Q2-capitalbikeshare-tripdata.csv",
        "2012Q2-capitalbikeshare-tripdata.csv",
        "2014Q1-capitalbikeshare-tripdata.csv",
        "2015Q4-capitalbikeshare-tripdata.csv",
        "2017Q3-capitalbikeshare-tripdata.csv",
        "2012Q3-capitalbikeshare-tripdata.csv",
        "2014Q2-capitalbikeshare-tripdata.csv",
        "2016Q1-capitalbikeshare-tripdata.csv",
        "2017Q4-capitalbikeshare-tripdata.csv",
        "2012Q4-capitalbikeshare-tripdata.csv",
        "2014Q3-capitalbikeshare-tripdata.csv",
        "2016Q2-capitalbikeshare-tripdata.csv",
        "2013Q1-capitalbikeshare-tripdata.csv",
        "2014Q4-capitalbikeshare-tripdata.csv",
        "2016Q3-capitalbikeshare-tripdata.csv"
    ]) {
    util.importTable(f, importConfig)
}
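
The same loop can also be run non-interactively by saving it to a file and passing it to mysqlsh (import_trips.js is a hypothetical filename):

# Execute the import script in batch mode instead of an interactive session
mysqlsh --js mysql://root@localhost -f import_trips.js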

This results in a 2.52 GiB trips table:

root@localhost [bikeshare] > analyze table trips;
+-----------------+---------+----------+----------+
| Table           | Op      | Msg_type | Msg_text |
+-----------------+---------+----------+----------+
| bikeshare.trips | analyze | status   | OK       |
+-----------------+---------+----------+----------+
1 row in set (0.04 sec)

root@localhost [bikeshare] > SELECT table_name, engine, row_format, create_options, table_rows, format_bytes(data_length) 'Data size', format_bytes(index_length) 'Index size' FROM information_schema.tables WHERE table_name='trips';
+------------+--------+------------+----------------+------------+-----------+------------+
| TABLE_NAME | ENGINE | ROW_FORMAT | CREATE_OPTIONS | TABLE_ROWS | Data size | Index size |
+------------+--------+------------+----------------+------------+-----------+------------+
| trips      | InnoDB | Dynamic    |                |   18335533 | 2.52 GiB  |    0 bytes |
+------------+--------+------------+----------------+------------+-----------+------------+
1 row in set (0.00 sec)

Set up a dump user and restrict access to the IP of the other host:

create user 'dump'@'172.31.0.201' identified by 'lotGiSluAfvoff-0';
grant all on bikeshare.* to 'dump'@'172.31.0.201';
grant reload, replication client on *.* to 'dump'@'172.31.0.201';
grant select on mysql.* to 'dump'@'172.31.0.201'; -- needed by util.DumpTables
grant BACKUP_ADMIN on *.* to 'dump'@'172.31.0.201'; -- needed by util.DumpTables
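
Once the dump host is set up (next section), the grants can be verified from there before running any dumps (a quick check, not in the original):

# Run on the dump host; uses the /root/.my.cnf created there
mysql -e "SHOW GRANTS"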

Dump host

This is the machine running Dumpling, MyDumper, mysqldump, etc.

fdisk /dev/nvme1n1          # Create partition
mkfs.xfs /dev/nvme1n1p1     # Create filesystem
blkid                       # Get UUID
vi /etc/fstab               # Add entry for /data
mkdir /data
mount -a                    # Mount all filesystems

Entry to add for /etc/fstab:

UUID=b13be231-0893-479f-b898-0795a81a3a97 /data xfs defaults 0 0

Then install the MySQL client and MySQL Shell:

yum install https://dev.mysql.com/get/mysql80-community-release-el7-3.noarch.rpm
yum install -y mysql-community-client mysql-shell sysstat perf

Create a /root/.my.cnf:

[client]
host=172.31.3.136
user=dump
password="lotGiSluAfvoff-0"

[mysql]
prompt="\u@\h [\d] > "

Install tiup and dumpling

curl --proto '=https' --tlsv1.2 -sSf https://tiup-mirrors.pingcap.com/install.sh | sh 
source /root/.bash_profile
tiup install dumpling

Install mydumper

yum install https://github.com/maxbube/mydumper/releases/download/v0.10.7-2/mydumper-0.10.7-2.el7.x86_64.rpm

Test

mysqldump

cd /data
time mysqldump --no-tablespaces --single-transaction bikeshare trips > trips.sql
real	0m40.570s
user	0m29.411s
sys	0m3.291s

TiDB Dumpling

cd /data
time tiup dumpling -- -h 172.31.3.136 -u dump -P 3306 -p "lotGiSluAfvoff-0"
real	0m31.061s
user	0m33.734s
sys	0m4.027s

The query it is running is this:

SELECT * FROM `bikeshare`.`trips` ORDER BY `trip_id`

It doesn't look like it does anything in parallel by default.

After adding -r 1000000 it looks like this:

| 260 | dump            | ip-172-31-0-201.eu-central-1.compute.internal:59740 | NULL | Query   |     3 | executing              | SELECT * FROM `bikeshare`.`trips` WHERE (`trip_id` >= 16495697 AND `trip_id` < 17673961) ORDER BY `t |
| 261 | dump            | ip-172-31-0-201.eu-central-1.compute.internal:59742 | NULL | Query   |     1 | Sending to client      | SELECT * FROM `bikeshare`.`trips` WHERE (`trip_id` >= 18852225 AND `trip_id` < 20030489) ORDER BY `t |
| 262 | dump            | ip-172-31-0-201.eu-central-1.compute.internal:59744 | NULL | Query   |     0 | Sending to client      | SELECT * FROM `bikeshare`.`trips` WHERE (`trip_id` >= 20030489 AND `trip_id` < 21208753) ORDER BY `t |
| 263 | dump            | ip-172-31-0-201.eu-central-1.compute.internal:59746 | NULL | Query   |     3 | executing              | SELECT * FROM `bikeshare`.`trips` WHERE (`trip_id` >= 17673961 AND `trip_id` < 18852225) ORDER BY `t |

And the time looks like this:

real	0m15.837s
user	0m51.461s
sys	0m4.007s
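
For reference, the chunked run is presumably the previous command with -r appended (reconstructed, not shown verbatim above); Dumpling's -t/--threads defaults to 4, which matches the four sessions in the process list:

cd /data
time tiup dumpling -- -h 172.31.3.136 -u dump -P 3306 -p "lotGiSluAfvoff-0" -r 1000000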

MySQL Shell util.dumpTables

time mysqlsh mysqlx://dump@172.31.3.136/bikeshare -- util dump-tables bikeshare trips --outputUrl=/data/dump1
real	0m9.876s
user	0m34.692s
sys	0m3.489s

Here we can see multiple threads:

| 179 | dump            | ip-172-31-0-201.eu-central-1.compute.internal:59570 | bikeshare | Query   |     1 | executing              | SELECT SQL_NO_CACHE `trip_id`,`duration`,`start_date`,`end_date`,`start_station_number`,`start_station`,`end_station_number`,`end_station`,`bike_number`,`member_type` FROM `bikeshare`.`trips` WHERE (`trip_id` BETWEEN 9057855 AND 9310338) ORDER BY `trip_id` /* mysqlsh dumpTables, dumping table `bikeshare`.`trips`, chunk ID: 37 */ |
| 180 | dump            | ip-172-31-0-201.eu-central-1.compute.internal:59572 | bikeshare | Query   |     0 | executing              | SELECT SQL_NO_CACHE `trip_id`,`duration`,`start_date`,`end_date`,`start_station_number`,`start_station`,`end_station_number`,`end_station`,`bike_number`,`member_type` FROM `bikeshare`.`trips` WHERE (`trip_id` BETWEEN 9499702 AND 9815306) ORDER BY `trip_id` /* mysqlsh dumpTables, dumping table `bikeshare`.`trips`, chunk ID: 39 */ |
| 181 | dump            | ip-172-31-0-201.eu-central-1.compute.internal:59574 | bikeshare | Query   |     1 | executing              | SELECT SQL_NO_CACHE `trip_id`,`duration`,`start_date`,`end_date`,`start_station_number`,`start_station`,`end_station_number`,`end_station`,`bike_number`,`member_type` FROM `bikeshare`.`trips` WHERE (`trip_id` BETWEEN 8836932 AND 9057854) ORDER BY `trip_id` /* mysqlsh dumpTables, dumping table `bikeshare`.`trips`, chunk ID: 36 */ |
| 182 | dump            | ip-172-31-0-201.eu-central-1.compute.internal:59576 | bikeshare | Query   |     1 | executing              | SELECT SQL_NO_CACHE `trip_id`,`duration`,`start_date`,`end_date`,`start_station_number`,`start_station`,`end_station_number`,`end_station`,`bike_number`,`member_type` FROM `bikeshare`.`trips` WHERE (`trip_id` BETWEEN 9310339 AND 9499701) ORDER BY `trip_id` /* mysqlsh dumpTables, dumping table `bikeshare`.`trips`, chunk ID: 38 */ |

The output is compressed with zstd by default; uncompressed output can be selected by adding --compression=none.

Without compression:

real	0m8.918s
user	0m22.080s
sys	0m6.500s

mydumper

time mydumper -B bikeshare -h 172.31.3.136 -u dump -p "lotGiSluAfvoff-0" -v 3
SELECT /*!40001 SQL_NO_CACHE */ * FROM `bikeshare`.`trips`
real	0m16.987s
user	0m13.057s
sys	0m2.857s

After adding -r 1000000 it shows:

| 253 | dump            | ip-172-31-0-201.eu-central-1.compute.internal:59720 | bikeshare | Query   |     9 | Sending to client      | SELECT /*!40001 SQL_NO_CACHE */ * FROM `bikeshare`.`trips` WHERE (`trip_id` >= 7069582 AND `trip_id` |
| 254 | dump            | ip-172-31-0-201.eu-central-1.compute.internal:59722 | NULL      | Query   |     9 | Sending to client      | SELECT /*!40001 SQL_NO_CACHE */ * FROM `bikeshare`.`trips` WHERE (`trip_id` >= 4713055 AND `trip_id` |
| 255 | dump            | ip-172-31-0-201.eu-central-1.compute.internal:59724 | NULL      | Query   |     9 | Sending to client      | SELECT /*!40001 SQL_NO_CACHE */ * FROM `bikeshare`.`trips` WHERE `trip_id` IS NULL OR (`trip_id` >=  |
| 256 | dump            | ip-172-31-0-201.eu-central-1.compute.internal:59726 | NULL      | Query   |     9 | Sending to client      | SELECT /*!40001 SQL_NO_CACHE */ * FROM `bikeshare`.`trips` WHERE (`trip_id` >= 2356528 AND `trip_id` |

and:

real	0m31.230s
user	0m13.871s
sys	0m2.147s
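
mydumper likewise defaults to 4 threads (-t/--threads); a run with more threads would look like this (hypothetical, not benchmarked here):

time mydumper -B bikeshare -h 172.31.3.136 -u dump -p "lotGiSluAfvoff-0" -v 3 -r 1000000 -t 8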

Test with more data

I generated more data by running this multiple times:

select max(trip_id) from trips into @offset;
insert into trips
  select @offset+trip_id, duration, start_date, end_date,start_station_number,
    start_station,end_station_number,end_station,bike_number,member_type from trips
    limit 100000;
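
"Multiple times" is easy to script from the shell; a sketch (the iteration count is illustrative, not from the original):

# Each pass appends 100k rows with trip_id shifted past the current maximum
for i in $(seq 1 50); do
  mysql bikeshare -e "select max(trip_id) from trips into @offset;
    insert into trips
      select @offset+trip_id, duration, start_date, end_date, start_station_number,
        start_station, end_station_number, end_station, bike_number, member_type
      from trips limit 100000;"
done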

This resulted in a 13G /var/lib/mysql/bikeshare/trips.ibd file.

root@localhost [(none)] > analyze table bikeshare.trips;
+-----------------+---------+----------+----------+
| Table           | Op      | Msg_type | Msg_text |
+-----------------+---------+----------+----------+
| bikeshare.trips | analyze | status   | OK       |
+-----------------+---------+----------+----------+
1 row in set (0.09 sec)

root@localhost [(none)] > SELECT table_name, engine, row_format, create_options, table_rows, format_bytes(data_length) 'Data size', format_bytes(index_length) 'Index size' FROM information_schema.tables WHERE table_name='trips';
+------------+--------+------------+----------------+------------+-----------+------------+
| TABLE_NAME | ENGINE | ROW_FORMAT | CREATE_OPTIONS | TABLE_ROWS | Data size | Index size |
+------------+--------+------------+----------------+------------+-----------+------------+
| trips      | InnoDB | Dynamic    |                |  103256322 | 12.22 GiB |    0 bytes |
+------------+--------+------------+----------------+------------+-----------+------------+
1 row in set (0.00 sec)

With this dataset, Dumpling drove the disk on the dump host to 100% utilization, so I also tried adding --compress gzip.
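
Utilization is easy to confirm with the sysstat tools installed earlier, e.g.:

iostat -x 1                 # watch %util on the NVMe device while the dump runs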

With compression (CPU bound):

real	3m3.515s
user	11m48.139s
sys	0m18.111s

Without compression (I/O bound):

real	2m45.606s
user	3m31.813s
sys	0m19.199s

Testing mydumper:

time mydumper -B bikeshare -h 172.31.3.136 -u dump -p "lotGiSluAfvoff-0" -v 3 -r 1000000
real	2m42.499s
user	1m21.695s
sys	0m11.315s

And with --compress added:

real	2m45.243s
user	10m29.829s
sys	0m14.879s

And with time mysqlsh mysqlx://dump@172.31.3.136/bikeshare -- util dump-tables bikeshare trips --outputUrl=/data/dump1:

real	1m0.518s
user	2m54.040s
sys	0m18.658s

And with time mysqlsh mysqlx://dump@172.31.3.136/bikeshare -- util dump-tables bikeshare trips --outputUrl=/data/dump2 --compression=none:

real	2m22.727s
user	1m25.874s
sys	0m19.391s

Versions

[root@ip-172-31-0-201 ~]# tiup -v
1.6.0 tiup
Go Version: go1.17.1
Git Ref: v1.6.0
GitHash: 0295c073463edc68696542f4ec9c9f5b4c07c5c1
[root@ip-172-31-0-201 ~]# tiup dumpling -V
Starting component `dumpling`: /root/.tiup/components/dumpling/v5.2.1/dumpling -V
Release version: v5.2.1
Git commit hash: 4e012e54ce89fab9551e7ff58ebcffe57590244a
Git branch:      heads/refs/tags/v5.2.1
Build timestamp: 2021-09-08 02:31:43Z
Go version:      go version go1.16.4 linux/amd64

[root@ip-172-31-0-201 ~]# mydumper -V
mydumper 0.10.7, built against MySQL 5.7.34-37
[root@ip-172-31-0-201 ~]# mysqlsh --version
mysqlsh   Ver 8.0.26 for Linux on x86_64 - for MySQL 8.0.26 (MySQL Community Server (GPL))
[root@ip-172-31-3-136 ~]# mysqld --version
/usr/sbin/mysqld  Ver 8.0.26 for Linux on x86_64 (MySQL Community Server - GPL)

Follow-ups
