diff --git a/.gitignore b/.gitignore index 728e6000141c9..069892e5fd170 100644 --- a/.gitignore +++ b/.gitignore @@ -7,3 +7,4 @@ *.iml out gen +.DS_Store diff --git a/FAQ.md b/FAQ.md index 6d3248f995a46..7ed57319ce837 100644 --- a/FAQ.md +++ b/FAQ.md @@ -1,5 +1,6 @@ --- title: TiDB FAQ +summary: Learn about the most frequently asked questions (FAQs) relating to TiDB. category: faq --- @@ -13,11 +14,11 @@ This document lists the Most Frequently Asked Questions about TiDB. #### What is TiDB? -TiDB is a distributed SQL database that features in horizontal scalability, high availability and consistent distributed transactions. It also enables you to use MySQL’s SQL syntax and protocol to manage and retrieve data. +TiDB is a distributed SQL database that features in horizontal scalability, high availability and consistent distributed transactions. It also enables you to use MySQL's SQL syntax and protocol to manage and retrieve data. #### What is TiDB's architecture? -The TiDB cluster has three components: the TiDB server, the PD (Placement Driver) server, and the TiKV server. For more details, see [TiDB architecture](overview/#tidb-architecture). +The TiDB cluster has three components: the TiDB server, the PD (Placement Driver) server, and the TiKV server. For more details, see [TiDB architecture](architecture.md). #### Is TiDB based on MySQL? @@ -39,7 +40,7 @@ Currently, TiDB supports the majority of MySQL 5.7 syntax, but does not support #### How is TiDB highly available? -TiDB is self-healing. All of the three components, TiDB, TiKV and PD, can tolerate failures of some of their instances. With its strong consistency guarantee, whether it’s data machine failures or even downtime of an entire data center, your data can be recovered automatically. For more information, see [High availability](overview.md#high-availability). +TiDB is self-healing. All of the three components, TiDB, TiKV and PD, can tolerate failures of some of their instances. With its strong consistency guarantee, whether it’s data machine failures or even downtime of an entire data center, your data can be recovered automatically. For more information, see [TiDB architecture](architecture.md). #### How is TiDB strongly consistent? @@ -55,11 +56,11 @@ Any language supported by MySQL client or driver. #### Can I use other Key-Value storage engines with TiDB? -Yes. Besides TiKV, TiDB supports many popular standalone storage engines, such as GolevelDB and BoltDB. If the storage engine is a KV engine that supports transactions and it provides a client that meets the interface requirement of TiDB, then it can connect to TiDB. +Yes. TiKV and TiDB support many popular standalone storage engines, such as GolevelDB and BoltDB. If the storage engine is a KV engine that supports transactions and it provides a client that meets the interface requirement of TiDB, then it can connect to TiDB. #### What's the recommended solution for the deployment of three geo-distributed data centers? -The architecture of TiDB guarantees that it fully supports geo-distribution and multi-activeness. Your data and applications are always-on. All the outages are transparent to your applications and your data can recover automatically. The operation depends on the network latency and stability. It is recommended to keep the latency within 5ms. Currently, we already have similar use cases. For details, contact info@pingcap.com. +The architecture of TiDB guarantees that it fully supports geo-distribution and multi-activeness. 
Your data and applications are always-on. All the outages are transparent to your applications and your data can recover automatically. The operation depends on the network latency and stability. It is recommended to keep the latency within 5ms. Currently, we already have similar use cases. For details, contact info@pingcap.com.

#### Does TiDB provide any other knowledge resource besides the documentation?

@@ -85,6 +86,46 @@ The character sets of TiDB use UTF-8 by default and currently only support UTF-8

5000 at most.

+#### Does TiDB support XA?
+
+No. The JDBC driver of TiDB is MySQL JDBC (Connector/J). When using Atomikos, set the data source to `type="com.mysql.jdbc.jdbc2.optional.MysqlXADataSource"`. TiDB does not support the connection with MySQL JDBC XADataSource. MySQL JDBC XADataSource only works for MySQL (for example, using DML to modify the `redo` log).
+
+After you configure the two data sources of Atomikos, set the JDBC drivers to XA. When Atomikos operates TM and RM (DB), Atomikos sends the command including XA to the JDBC layer. Taking MySQL as an example, when XA is enabled in the JDBC layer, JDBC will send a series of XA logic operations to InnoDB, including using DML to change the `redo` log. This is the operation of the two-phase commit. The current TiDB version does not support the upper application layer JTA/XA and does not parse XA operations sent by Atomikos.
+
+As a standalone database, MySQL can only implement cross-database transactions using XA; while TiDB supports distributed transactions using the Google Percolator transaction model and its performance and stability are better than XA, so TiDB does not support XA and there is no need for TiDB to support XA.
+
+#### Does `show processlist` display the system process ID?
+
+The display content of TiDB `show processlist` is almost the same as that of MySQL `show processlist`. TiDB `show processlist` does not display the system process ID. The ID that it displays is the current session ID. The differences between TiDB `show processlist` and MySQL `show processlist` are as follows:
+
+- As TiDB is a distributed database, the `tidb-server` instance is a stateless engine for parsing and executing the SQL statements (for details, see [TiDB architecture](architecture.md)). `show processlist` displays the session list executed in the `tidb-server` instance that the user logs in to from the MySQL client, not the list of all the sessions running in the cluster. But MySQL is a standalone database and its `show processlist` displays all the SQL statements executed in MySQL.
+- TiDB `show processlist` displays the estimated memory usage (unit: Byte) of the current session, which is not displayed in MySQL `show processlist`.
+
+#### How to modify the user password and privilege?
+
+To modify the user password in TiDB, it is recommended to use `set password for 'root'@'%' = '0101001';` or `alter`, not `update mysql.user`, which might cause the password in other nodes not to be refreshed in time.
+
+It is recommended to use the official standard statements when modifying the user password and privilege. For details, see [TiDB user account management](sql/user-account-management.md).
+
+#### Why is the auto-increment ID of the later inserted data smaller than that of the earlier inserted data in TiDB?
+
+The auto-increment ID feature in TiDB is only guaranteed to be automatically incremental and unique but is not guaranteed to be allocated sequentially. Currently, TiDB is allocating IDs in batches.
If data is inserted into multiple TiDB servers simultaneously, the allocated IDs are not sequential. When multiple threads concurrently insert data to multiple `tidb-server` instances, the auto-increment ID of the later inserted data may be smaller. TiDB allows specifying `AUTO_INCREMENT` for the integer field, but allows only one `AUTO_INCREMENT` field in a single table. For details, see [DDL](sql/ddl.md).
+
+#### How to modify the `sql_mode` in TiDB other than using the `set` command?
+
+The configuration method of TiDB `sql_mode` is different from that of MySQL `sql_mode`. TiDB does not support using the configuration file to configure `sql_mode` of the database; it only supports using the `set` command to configure `sql_mode` of the database. You can use `set @@global.sql_mode = 'STRICT_TRANS_TABLES';` to configure it.
+
+#### What authentication protocols does TiDB support? What's the process?
+
+- Like MySQL, TiDB supports the SASL protocol for user login authentication and password processing.
+
+- When the client connects to TiDB, the challenge-response authentication mode starts. The process is as follows:
+
+    1. The client connects to the server.
+    2. The server sends a random string challenge to the client.
+    3. The client sends the username and response to the server.
+    4. The server verifies the response.
+
### TiDB techniques

#### TiKV for data storage

@@ -144,7 +185,15 @@ As a distributed cluster, TiDB has a high demand on time, especially for PD, bec

##### Is it feasible if we don't use RAID for SSD?

-If the resources are adequate, it is recommended to use RAID for SSD. If the resources are inadequate, it is acceptable not to use RAID for SSD.
+If the resources are adequate, it is recommended to use RAID 10 for SSD. If the resources are inadequate, it is acceptable not to use RAID for SSD.
+
+##### What's the recommended configuration of TiDB components?
+
+- TiDB has a high requirement on CPU and memory. If you need to enable binlog, the local disk space should be increased based on the service volume estimation and the time requirement for the GC operation. But an SSD disk is not a must.
+- PD stores the cluster metadata and has frequent Read and Write requests. It demands a high I/O disk. A disk of low performance will affect the performance of the whole cluster. It is recommended to use SSD disks. In addition, a larger number of Regions has a higher requirement on CPU and memory.
+- TiKV has a high requirement on CPU, memory and disk. It is required to use SSD.
+
+For details, see [TiDB software and hardware requirements](op-guide/recommendation.md).

### Install and deploy

@@ -158,7 +207,7 @@ You need to set the `--config` parameter in TiKV/PD to make the `toml` configura

##### Should I deploy the TiDB monitoring framework (Prometheus + Grafana) on a standalone machine or on multiple machines? What is the recommended CPU and memory?

-The monitoring machine is recommended to use standalone deployment. It is recommended to use a 8 core CPU with 16 GB+ memory and a 500 GB+ hard disk.
+The monitoring machine is recommended to use standalone deployment. It is recommended to use an 8 core CPU with 16 GB+ memory and a 500 GB+ hard disk.

##### Why the monitor cannot display all metrics?
@@ -184,7 +233,7 @@ Check the time difference between the machine time of the monitor and the time w | enable_firewalld | to enable the firewall, closed by default | | enable_ntpd | to monitor the NTP service of the managed node, True by default; do not close it | | machine_benchmark | to monitor the disk IOPS of the managed node, True by default; do not close it | -| set_hostname | to edit the hostname of the mananged node based on the IP, False by default | +| set_hostname | to edit the hostname of the managed node based on the IP, False by default | | enable_binlog | whether to deploy Pump and enable the binlog, False by default, dependent on the Kafka cluster; see the `zookeeper_addrs` variable | | zookeeper_addrs | the ZooKeeper address of the binlog Kafka cluster | | enable_slow_query_log | to record the slow query log of TiDB into a single file: ({{ deploy_dir }}/log/tidb_slow_query.log). False by default, to record it into the TiDB log | @@ -194,12 +243,57 @@ Check the time difference between the machine time of the monitor and the time w It is not recommended to deploy TiDB offline using Ansible. If the Control Machine has no access to external network, you can deploy TiDB offline using Ansible. For details, see [Offline Deployment Using Ansible](op-guide/offline-ansible-deployment.md). +#### How to deploy TiDB quickly using Docker Compose on a single machine? + +You can use Docker Compose to build a TiDB cluster locally, including the cluster monitoring components. You can also customize the version and number of instances for each component. The configuration file can also be customized. You can only use this deployment method for testing and development environment. For details, see [Building the Cluster Using Docker Compose](op-guide/docker-compose.md). + +#### How to separately record the slow query log in TiDB? How to locate the slow query SQL statement? + +1. The slow query definition for TiDB is in the `conf/tidb.yml` configuration file of `tidb-ansible`. The `slow-threshold: 300` parameter is used to configure the threshold value of the slow query (unit: millisecond). + + The slow query log is recorded in `tidb.log` by default. If you want to generate a slow query log file separately, set `enable_slow_query_log` in the `inventory.ini` configuration file to `True`. + + Then run `ansible-playbook rolling_update.yml --tags=tidb` to perform a rolling update on the `tidb-server` instance. After the update is finished, the `tidb-server` instance will record the slow query log in `tidb_slow_query.log`. + +2. If a slow query occurs, you can locate the `tidb-server` instance where the slow query is and the slow query time point using Grafana and find the SQL statement information recorded in the log on the corresponding node. + +#### How to add the `label` configuration if `label` of TiKV was not configured when I deployed the TiDB cluster for the first time? + +The configuration of TiDB `label` is related to the cluster deployment architecture. It is important and is the basis for PD to execute global management and scheduling. If you did not configure `label` when deploying the cluster previously, you should adjust the deployment structure by manually adding the `location-labels` information using the PD management tool `pd-ctl`, for example, `config set location-labels "zone, rack, host"` (you should configure it based on the practical `label` level name). + +For the usage of `pd-ctl`, see [PD Control Instruction](tools/pd-control.md). 
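As a concrete sketch (assuming the example PD address `172.16.10.1:2379` used elsewhere in these docs and a hypothetical `zone,rack,host` hierarchy), the single-command mode of `pd-ctl` can be used roughly as follows; the exact flags may vary between versions, and each TiKV instance must also be started with matching labels (for example, through its `--labels` startup option):

```
./pd-ctl -u "http://172.16.10.1:2379" -d config set location-labels "zone,rack,host"
```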
+ +#### Why does the `dd` command for the disk test use the `oflag=direct` option? + +The Direct mode wraps the Write request into the I/O command and sends this command to the disk to bypass the file system cache and directly test the real I/O Read/Write performance of the disk. + +#### How to use the `fio` command to test the disk performance of the TiKV instance? + +- Random Read test: + + ``` + ./fio -ioengine=libaio -bs=32k -direct=1 -thread -rw=randread -size=10G -filename=fio_randread_test.txt -name='PingCAP' -iodepth=4 -runtime=60 + ``` + +- The mix test of sequential Write and random Read: + + ``` + ./fio -ioengine=libaio -bs=32k -direct=1 -thread -rw=randrw -percentage_random=100,0 -size=10G -filename=fio_randr_write_test.txt -name='PingCAP' -iodepth=4 -runtime=60 + ``` + +#### Error `UNREACHABLE! "msg": "Failed to connect to the host via ssh: " ` when deploying TiDB using TiDB-Ansible + +Two possible reasons and solutions: + +- The SSH mutual trust is not configured as required. It’s recommended to follow [the steps described in the official document](op-guide/ansible-deployment.md/#step-5-configure-the-ssh-mutual-trust-and-sudo-rules-on-the-control-machine) and check whether it is successfully configured using `ansible -i inventory.ini all -m shell -a 'whoami' -b`. +- If it involves the scenario where a single server is assigned multiple roles, for example, the mixed deployment of multiple components or multiple TiKV instances are deployed on a single server, this error might be caused by the SSH reuse mechanism. You can use the option of `ansible … -f 1` to avoid this error. + ### Upgrade #### How to perform rolling updates using Ansible? - Apply rolling updates to the TiKV node (only update the TiKV service). - + ``` ansible-playbook rolling_update.yml --tags=tikv ``` @@ -210,9 +304,9 @@ It is not recommended to deploy TiDB offline using Ansible. If the Control Machi ansible-playbook rolling_update.yml ``` -#### What is the effect of rolling udpates? +#### How are the rolling updates done? -When you apply rolling updates to TiDB services, the running application is not affected. You need to configure the minimum cluster topology (TiDB * 2, PD * 3, TiKV * 3). If the Pump/Drainer service is involved in the cluster, it is recommended to stop Drainer before rolling updates. When you update TiDB, Pump is also updated. +When you apply rolling updates to the TiDB services, the running application is not affected. You need to configure the minimum cluster topology (TiDB * 2, PD * 3, TiKV * 3). If the Pump/Drainer service is involved in the cluster, it is recommended to stop Drainer before rolling updates. When you update TiDB, Pump is also updated. #### How to upgrade when I deploy TiDB using Binary? @@ -302,9 +396,9 @@ Take `Release Version: v1.0.3-1-ga80e796` as an example of version number descri #### What's the difference between various TiDB master versions? How to avoid using the wrong TiDB-Ansible version? -The TiDB community is highly active. After the GA release, the engineers have been keeping optimizing and fixing bugs. Therefore, the TiDB version is updated quite fast. If you want to keep informed of the latest version, see [TiDB Weekly update](https://pingcap.com/weekly/). +The TiDB community is highly active. After the 1.0 GA release, the engineers have been keeping optimizing and fixing bugs. Therefore, the TiDB version is updated quite fast. If you want to keep informed of the latest version, see [TiDB Weekly update](https://pingcap.com/weekly/). 
-It is recommended to deploy the TiDB cluster using the latest version of TiDB-Ansible, which will also be updated along with the TiDB version. Besides, TiDB has a unified management of the version number after GA release. You can view the version number using the following two methods: +It is recommended to deploy the TiDB cluster using the latest version of TiDB-Ansible, which will also be updated along with the TiDB version. TiDB has a unified management of the version number after the 1.0 GA release. You can view the version number using the following two methods: - `select tidb_version()` - `tidb-server -V` @@ -380,19 +474,33 @@ The offline node usually indicates the TiKV node. You can determine whether the 2. Delete the `node_exporter` data of the corresponding node from the Prometheus configuration file. 3. Delete the data of the corresponding node from Ansible `inventory.ini`. +#### Why couldn't I connect to the PD server using `127.0.0.1` when I was using the PD Control? + +If your TiDB cluster is deployed using TiDB-Ansible, the PD external service port is not bound to `127.0.0.1`, so PD Control does not recognize `127.0.0.1` and you can only connect to it using the local IP address. + ### Manage the TiDB server #### How to set the `lease` parameter in TiDB? The lease parameter (`--lease=60`) is set from the command line when starting a TiDB server. The value of the lease parameter impacts the Database Schema Changes (DDL) speed of the current session. In the testing environments, you can set the value to 1s for to speed up the testing cycle. But in the production environments, it is recommended to set the value to minutes (for example, 60) to ensure the DDL safety. +#### What is the processing time of a DDL operation? + +The processing time is different for different scenarios. Generally, you can consider the following three scenarios: + +1. The `Add Index` operation with a relatively small number of rows in the corresponding data table: about 3s +2. The `Add Index` operation with a relatively large number of rows in the corresponding data table: the processing time depends on the specific number of rows and the QPS at that time (the `Add Index` operation has a lower priority than ordinary SQL operations) +3. Other DDL operations: about 1s + +If the TiDB server instance that receives the DDL request is the same TiDB server instance that the DDL owner is in, the first and third scenarios above may cost only dozens to hundreds of milliseconds. + #### Why it is very slow to run DDL statements sometimes? Possible reasons: - If you run multiple DDL statements together, the last few DDL statements might run slowly. This is because the DDL statements are executed serially in the TiDB cluster. - After you start the cluster successfully, the first DDL operation may take a longer time to run, usually around 30s. This is because the TiDB cluster is electing the leader that processes DDL statements. -- In rolling updates or shutdown updates, the processing time of DDL statements in the first ten minutes after starting TiDB is affected by the server stop sequence (stopping PD -> TiDB), and the condition where TiDB does not clean up the registration data in time because TiDB is stopped using the `kill -9` command. When you run DDL statements during this period, for the state change of each DDL, you need to wait for 2 * lease (lease = 10s). 
+- The processing time of DDL statements in the first ten minutes after starting TiDB would be much longer than the normal case if you meet the following conditions: 1) TiDB cannot communicate with PD as usual when you are stopping TiDB (including the case of power failure); 2) TiDB fails to clean up the registration data from PD in time because TiDB is stopped by the `kill -9` command. If you run DDL statements during this period, for the state change of each DDL, you need to wait for 2 * lease (lease = 45s). - If a communication issue occurs between a TiDB server and a PD server in the cluster, the TiDB server cannot get or update the version information from the PD server in time. In this case, you need to wait for 2 * lease for the state processing of each DDL. #### Can I use S3 as the backend storage engine in TiDB? @@ -417,6 +525,46 @@ The current TiDB version has no limit for the maximum number of concurrent conne The `create_time` of tables in the `information_schema` is the creation time. +#### What is the meaning of `EXPENSIVE_QUERY` in the TiDB log? + +When TiDB is executing a SQL statement, the query will be `EXPENSIVE_QUERY` if each operator is estimated to process over 10000 pieces of data. You can modify the `tidb-server` configuration parameter to adjust the threshold and then restart the `tidb-server`. + +#### How to control or change the execution priority of SQL commits? + +TiDB supports changing the priority on a [per-session](sql/tidb-specific.md#tidb_force_priority), [global](sql/server-command-option.md#force-priority) or [individual statement basis](sql/dml.md). Priority has the following meaning: + +- HIGH_PRIORITY: this statement has a high priority, that is, TiDB gives priority to this statement and executes it first. + +- LOW_PRIORITY: this statement has a low priority, that is, TiDB reduces the priority of this statement during the execution period. + +You can combine the above two parameters with the DML of TiDB to use them. For usage details, see [TiDB DML](sql/dml.md). For example: + +1. Adjust the priority by writing SQL statements in the database: + + ```sql + select HIGH_PRIORITY | LOW_PRIORITY count(*) from table_name; + insert HIGH_PRIORITY | LOW_PRIORITY into table_name insert_values; + delete HIGH_PRIORITY | LOW_PRIORITY from table_name; + update HIGH_PRIORITY | LOW_PRIORITY table_reference set assignment_list where where_condition; + replace HIGH_PRIORITY | LOW_PRIORITY into table_name; + ``` + +2. The full table scan statement automatically adjusts itself to a low priority. `analyze` has a low priority by default. + +#### What's the trigger strategy for `auto analyze` in TiDB? + +Trigger strategy: `auto analyze` is automatically triggered when the number of pieces of data in a new table reaches 1000 and this table has no write operation within one minute. + +When the modified number or the current total row number is larger than `tidb_auto_analyze_ratio`, the `analyze` statement is automatically triggered. The default value of `tidb_auto_analyze_ratio` is 0, indicating that this feature is disabled. To ensure safety, its minimum value is 0.3 when the feature is enabled, and it must be smaller than `pseudo-estimate-ratio` whose default value is 0.7, otherwise pseudo statistics will be used for a period of time. It is recommended to set `tidb_auto_analyze_ratio` to 0.5. + +#### How to use a specific index with hint in a SQL statement? 
+
+Its usage is similar to MySQL:
+
+```sql
+select column_name from table_name use index(index_name) where where_condition;
+```
+
### Manage the TiKV server

#### What is the recommended number of replicas in the TiKV cluster? Is it better to keep the minimum number for high availability?

@@ -443,25 +591,24 @@ Currently, some files of TiKV master have a higher compression rate, which depen

TiKV implements the Column Family (CF) feature of RocksDB. By default, the KV data is eventually stored in the 3 CFs (default, write and lock) within RocksDB.

-- The default CF stores real data and the corresponding parameter is in [rocksdb.defaultcf]. The write CF stores the data version information (MVCC) and index-related data, and the corresponding parameter is in `[rocksdb.writecf]`. The lock CF stores the lock information and the system uses the default parameter.
+- The default CF stores real data and the corresponding parameter is in `[rocksdb.defaultcf]`. The write CF stores the data version information (MVCC) and index-related data, and the corresponding parameter is in `[rocksdb.writecf]`. The lock CF stores the lock information and the system uses the default parameter.
- The Raft RocksDB instance stores Raft logs. The default CF mainly stores Raft logs and the corresponding parameter is in `[raftdb.defaultcf]`.
- Each CF has an individual block-cache to cache data blocks and improve RocksDB read speed. The size of block-cache is controlled by the `block-cache-size` parameter. A larger value of the parameter means more hot data can be cached and is more favorable to read operation. At the same time, it consumes more system memory.
- Each CF has an individual write-buffer and the size is controlled by the `write-buffer-size` parameter.

#### Why it occurs that "TiKV channel full"?

-- The Raftstore thread is too slow. You can view the CPU usage status of Raftstore.
-TiKV is too busy (read, write, disk I/O, etc.) and cannot manage to handle it.
+- The Raftstore thread is too slow or blocked by I/O. You can view the CPU usage status of Raftstore.
+- TiKV is too busy (CPU, disk I/O, etc.) and cannot keep up with the requests.

#### Why does TiKV frequently switch Region leader?

-- Network problem leads to the failure of communication between nodes. You can view the monitoring information of Report failures.
-- The original main leader node fails, and cannot send information to the follower in time.
-- The Raftstore thread fails.
+- Leaders cannot reach the followers, for example, because of a network problem or node failure.
+- PD balances the leaders, for example, when PD transfers leaders from a hotspot node to other nodes.

-#### If the leader node is down, will the service be affected? How long?
+#### If a node is down, will the service be affected? How long?

-TiDB uses Raft to synchronize data among multiple replicas and guarantees the strong consistency of data. If one replica goes wrong, the other replicas can guarantee data security. The default number of replicas in each Region is 3.
Based on the Raft protocol, a leader is elected in each Region, and if a single leader fails, a follower is soon elected as Region leader after a maximum of 2 * lease time (lease time is 10 seconds). #### What are the TiKV scenarios that take up high I/O, memory, CPU, and exceed the parameter configuration? @@ -469,7 +616,7 @@ Writing or reading a large volume of data in TiKV takes up high I/O, memory and #### Does TiKV support SAS/SATA disks or mixed deployment of SSD/SAS disks? -No. For OLTP scenarios, TiDB requires high I/O disks for data access and operation. As a distributed database with strong consistency, TiDB has some write amplification such as replica replication and bottom layer storage compaction. Therefore, it is recommended to use NVMe SSD as the storage disks in TiDB best practices. Besides, the mixed deployment of TiKV and PD is not supported. +No. For OLTP scenarios, TiDB requires high I/O disks for data access and operation. As a distributed database with strong consistency, TiDB has some write amplification such as replica replication and bottom layer storage compaction. Therefore, it is recommended to use NVMe SSD as the storage disks in TiDB best practices. Mixed deployment of TiKV and PD is not supported. #### Is the Range of the Key data table divided before data access? @@ -477,7 +624,7 @@ No. It differs from the table splitting rules of MySQL. In TiKV, the table Range #### How does Region split? -Region is not divided in advance, but it follows a Region split mechanism. When the Region size exceeds the value of the `region_split_size` parameter, split is triggered. After the split, the information is reported to PD. +Region is not divided in advance, but it follows a Region split mechanism. When the Region size exceeds the value of the `region_split_size` or `region-split-keys` parameters, split is triggered. After the split, the information is reported to PD. #### Does TiKV have the `innodb_flush_log_trx_commit` parameter like MySQL, to guarantee the security of data? @@ -514,6 +661,10 @@ TiKV supports calling the interface separately. Theoretically, you can take an i - Reduce the data transmission between TiDB and TiKV - Make full use of the distributed computing resources of TiKV to execute computing pushdown +#### The error message `IO error: No space left on device While appending to file` is displayed. + +This is because the disk space is not enough. You need to add nodes or enlarge the disk space. + ### TiDB test #### What is the performance test result for TiDB using Sysbench? @@ -533,13 +684,13 @@ At the beginning, many users tend to do a benchmark test or a comparison test be TiDB is designed for scenarios where sharding is used because the capacity of a MySQL standalone is limited, and where strong consistency and complete distributed transactions are required. One of the advantages of TiDB is pushing down computing to the storage nodes to execute concurrent computing. -TiDB is not suitable for tables of small size (such as below ten million level), because its strength in concurrency cannot be showed with small size data and limited Region. A typical example is the counter table, in which records of a few lines are updated high frequently. In TiDB, these lines become several Key-Value pairs in the storage engine, and then settle into a Region located on a single node. The overhead of background replication to guarantee strong consistency and operations from TiDB to TiKV leads to a poorer performance than a MySQL standalone. 
+TiDB is not suitable for tables of small size (such as below ten million level), because its strength in concurrency cannot be shown with a small size of data and limited Regions. A typical example is the counter table, in which records of a few lines are updated high frequently. In TiDB, these lines become several Key-Value pairs in the storage engine, and then settle into a Region located on a single node. The overhead of background replication to guarantee strong consistency and operations from TiDB to TiKV leads to a poorer performance than a MySQL standalone. ### Backup and restore #### How to back up data in TiDB? -Currently, the major way of backing up data in TiDB is using `mydumper`. For details, see [mydumper repository](https://github.com/maxbube/mydumper). Although the official MySQL tool `mysqldump` is also supported in TiDB to back up and restore data, its performance is poorer than `mydumper`/`loader` and it needs much more time to back up and restore large volumes of data. Therefore, it is not recommended to use `mysqldump`. +Currently, the preferred method for backup is using the [PingCAP fork of mydumper](tools/mydumper.md). Although the official MySQL tool `mysqldump` is also supported in TiDB to back up and restore data, its performance is poorer than [`mydumper`](tools/mydumper.md)/[`loader`](tools/loader.md) and it needs much more time to back up and restore large volumes of data. Keep the size of the data file exported from `mydumper` as small as possible. It is recommended to keep the size within 64M. You can set value of the `-F` parameter to 64. @@ -550,13 +701,11 @@ You can edit the `t` parameter of `loader` based on the number of TiKV instances ### Full data export and import #### Mydumper - -See the [mydumper repository](https://github.com/maxbube/mydumper). +See [mydumper Instructions](tools/mydumper.md). #### Loader - See [Loader Instructions](tools/loader.md). - + #### How to migrate an application running on MySQL to TiDB? Because TiDB supports most MySQL syntax, generally you can migrate your applications to TiDB without changing a single line of code in most cases. You can use [checker](https://github.com/pingcap/tidb-tools/tree/master/checker) to check whether the Schema in MySQL is compatible with TiDB. @@ -618,9 +767,30 @@ To migrate all the data or migrate incrementally from DB2 or Oracle to TiDB, see Currently, it is recommended to use OGG. -### Migrate the data incrementally +#### Error: `java.sql.BatchUpdateExecption:statement count 5001 exceeds the transaction limitation` while using Sqoop to write data into TiDB in batches + +In Sqoop, `--batch` means committing 100 `statement`s in each batch, but by default each `statement` contains 100 SQL statements. So, 100 * 100 = 10000 SQL statements, which exceeds 5000, the maximum number of statements allowed in a single TiDB transaction. + +Two solutions: + +- Add the `-Dsqoop.export.records.per.statement=10` option as follows: + + ``` + sqoop export \ + -Dsqoop.export.records.per.statement=10 \ + --connect jdbc:mysql://mysql.example.com/sqoop \ + --username sqoop ${user} \ + --password ${passwd} \ + --table ${tab_name} \ + --export-dir ${dir} \ + --batch + ``` + +- You can also increase the limited number of statements in a single TiDB transaction, but this will consume more memory. + +### Migrate the data online -#### Syncer +#### Syncer ##### Syncer user guide @@ -642,15 +812,28 @@ Restart Prometheus. No. Currently, the data synchronization depends on the application itself. 
-#### Wormhole +##### Does Syncer support synchronizing only some of the tables when Syncer is synchronizing data? -Wormhole is a data synchronization service, which enables the user to easily synchronize all the data or synchronize incrementally using Web console. It supports multiple types of data migration, such as from MySQL to TiDB, and from MongoDB to TiDB. +Yes. For details, see [Syncer User Guide](tools/syncer.md) + +##### Do frequent DDL operations affect the synchronization speed of Syncer? + +Frequent DDL operations may affect the synchronization speed. For Sycner, DDL operations are executed serially. When DDL operations are executed during data synchronization, data will be synchronized serially and thus the synchronization speed will be slowed down. + +##### If the machine that Syncer is in is broken and the directory of the `syncer.meta` file is lost, what should I do? + +When you synchronize data using Syncer GTID, the `syncer.meta` file is constantly updated during the synchronization process. The current version of Syncer does not contain the design for high availability. The `syncer.meta` configuration file of Syncer is directly stored on the hard disks, which is similar to other tools in the MySQL ecosystem, such as mydumper. + +Two solutions: + +- Put the `syncer.meta` file in a relatively secure disk. For example, use disks with RAID 1. +- Restore the location information of history synchronization according to the monitoring data that Syncer reports to Prometheus regularly. But the location information might be inaccurate due to the delay when a large amount of data is synchronized. ### Migrate the traffic #### How to migrate the traffic quickly? -It is recommended to build a multi-source MySQL, MongoDB -> TiDB real-time synchronization environment using Syncer or Wormhole. You can migrate the read and write traffic in batches by editing the network configuration as needed. Deploy a stable network LB (HAproxy, LVS, F5, DNS, etc.) on the upper layer, in order to implement seamless migration by directly editing the network configuration. +It is recommended to build a multi-source MySQL -> TiDB real-time synchronization environment using Syncer tool. You can migrate the read and write traffic in batches by editing the network configuration as needed. Deploy a stable network LB (HAproxy, LVS, F5, DNS, etc.) on the upper layer, in order to implement seamless migration by directly editing the network configuration. #### Is there a limit for the total write and read capacity in TiDB? @@ -676,7 +859,7 @@ There are [similar limits](https://cloud.google.com/spanner/docs/limits) on Goog #### Does TiDB release space immediately after deleting data? -`DELETE`, `TRUNCATE` and `DROP` do not release space immediately. For `TRUNCATE` and `DROP` operations, TiDB deletes the data and releases the space after reaching the GC (garbage collection) time (10 minutes by default). For the `DELETE` operation, TiDB deletes the data and does not release the space based on the GC mechanism, but reuses the space when subsequent data is committed to RocksDB and compacted. +None of the `DELETE`, `TRUNCATE` and `DROP` operations release data immediately. For the `TRUNCATE` and `DROP` operations, after the TiDB GC (Garbage Collection) time (10 minutes by default), the data is deleted and the space is released. For the `DELETE` operation, the data is deleted but the space is not released according to TiDB GC. When subsequent data is written into RocksDB and executes `COMPACT`, the space is reused. 
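For reference, the GC parameters that control this behavior are stored in the `mysql.tidb` table (the same table used in the `GC life time` error section later in this FAQ). A minimal sketch for inspecting and adjusting them:

```sql
-- View the current GC related settings
select variable_name, variable_value from mysql.tidb where variable_name like 'tikv_gc%';
-- Example: extend the GC life time to 30 minutes so that long transactions can still read the old data
update mysql.tidb set variable_value='30m' where variable_name='tikv_gc_life_time';
```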
#### Can I execute DDL operations on the target table when loading data? @@ -686,10 +869,6 @@ No. None of the DDL operations can be executed on the target table when you load Yes. But the `load data` does not support the `replace into` syntax. -#### How long does it take to reclaim disk space after deleting data? - -None of the `Delete`, `Truncate` and `Drop` operations releases data immediately. For the `Truncate` and `Drop` operations, after the TiDB GC (Garbage Collection) time (10 minutes by default), the data is deleted and the space is released. For the `Delete` operation, the data is deleted but the space is not released according to TiDB GC. When data is written into RocksDB and executes `Compact`, the space is reused. - #### Why does the query speed getting slow after deleting data? Deleting a large amount of data leaves a lot of useless keys, affecting the query efficiency. Currently the Region Merge feature is in development, which is expected to solve this problem. For details, see the [deleting data section in TiDB Best Practices](https://pingcap.com/blog/2017-07-24-tidbbestpractice/#write). @@ -746,12 +925,24 @@ Use `admin show ddl` to view the current job of adding an index. #### Does TiDB support CBO (Cost-Based Optimization)? If yes, to what extent? -Yes. TiDB uses the cost-based optimizer. The cost model and statistics are constantly optimized. Besides, TiDB also supports correlation algorithms like hash join and soft merge. +Yes. TiDB uses the cost-based optimizer. The cost model and statistics are constantly optimized. TiDB also supports correlation algorithms like hash join and soft merge. #### How to determine whether I need to execute `analyze` on a table? View the `Healthy` field using `show stats_healthy` and generally you need to execute `analyze` on a table when the field value is smaller than 60. +#### What is the ID rule when a query plan is presented as a tree? What is the execution order for this tree? + +No rule exists for these IDs but the IDs are unique. When IDs are generated, a counter works and adds one when one plan is generated. The execution order has nothing to do with the ID. The whole query plan is a tree and the execution process starts from the root node and the data is returned to the upper level continuously. For details about the query plan, see [Understanding the TiDB Query Execution Plan](sql/understanding-the-query-execution-plan.md). + +#### In the TiDB query plan, `cop` tasks are in the same root. Are they executed concurrently? + +Currently the computing tasks of TiDB belong to two different types of tasks: `cop task` and `root task`. + +`cop task` is the computing task which is pushed down to the KV end for distributed execution; `root task` is the computing task for single point execution on the TiDB end. + +Generally the input data of `root task` comes from `cop task`; when `root task` processes data, `cop task` of TiKV can processes data at the same time and waits for the pull of `root task` of TiDB. Therefore, `cop` tasks can be considered as executed concurrently; but their data has an upstream and downstream relationship. During the execution process, they are executed concurrently during some time. For example, the first `cop task` is processing the data in [100, 200] and the second `cop task` is processing the data in [1, 100]. For details, see [Understanding the TiDB Query Plan](sql/understanding-the-query-execution-plan.md). 
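To see this split for a concrete statement, you can read the `task` column of the `EXPLAIN` output, which marks each operator as either `cop` or `root`. A minimal sketch (the table `t` and the column `a` are only placeholders):

```sql
create table t (id int primary key, a int, index idx_a (a));
explain select count(*) from t where a > 100;
-- Operators whose task column is cop are pushed down to TiKV for distributed execution;
-- operators whose task column is root are executed on the TiDB server.
```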
+ ## Database optimization ### TiDB @@ -760,6 +951,10 @@ View the `Healthy` field using `show stats_healthy` and generally you need to ex See [The TiDB Command Options](sql/server-command-option.md). +#### How to scatter the hotspots? + +In TiDB, data is divided into Regions for management. Generally, the TiDB hotspot means the Read/Write hotspot in a Region. In TiDB, for the table whose primary key (PK) is not an integer or which has no PK, you can properly break Regions by configuring `SHARD_ROW_ID_BITS` to scatter the Region hotspots. For details, see the introduction of `SHARD_ROW_ID_BITS` in [TiDB Specific System Variables and Syntax](sql/tidb-specific.md). + ### TiKV #### Tune TiKV performance @@ -784,6 +979,20 @@ The monitoring system of TiDB consists of Prometheus and Grafana. From the dashb Yes. Find the startup script on the machine where Prometheus is started, edit the startup parameter and restart Prometheus. +#### Region Health monitor + +In TiDB 2.0, Region health is monitored in the PD metric monitoring page, in which the `Region Health` monitoring item shows the statistics of all the Region replica status. `miss` means shortage of replicas and `extra` means the extra replica exists. In addition, `Region Health` also shows the isolation level by `label`. `level-1` means the Region replicas are isolated physically in the first `label` level. All the Regions are in `level-0` when `location label` is not configured. + +#### What is the meaning of `selectsimplefull` in Statement Count monitor? + +It means full table scan but the table might be a small system table. + +#### What is the difference between `QPS` and `Statement OPS` in the monitor? + +The `QPS` statisctics is about all the SQL statements, including `use database`, `load data`, `begin`, `commit`, `set`, `show`, `insert` and `select`. + +The `Statement OPS` statistics is only about applications related SQL statements, including `select`, `update` and `insert`, therefore the `Statement OPS` statistics matches the applications better. + ## Troubleshoot ### TiDB custom error messages @@ -808,7 +1017,7 @@ A lock resolving timeout. This usually occurs when a large number of transaction The accessed Region is not available. A Raft Group is not available, with possible reasons like an inadequate number of replicas. This usually occurs when the TiKV server is busy or the TiKV node is shut down. Check the status, monitoring data and log of the TiKV server. -#### ERROR 9006 (HY000): GC Too Early +#### ERROR 9006 (HY000): GC life time is shorter than transaction duration The interval of `GC Life Time` is too short. The data that should have been read by long transactions might be deleted. You can add `GC Life Time` using the following command: @@ -829,3 +1038,13 @@ update mysql.tidb set variable_value='30m' where variable_name='tikv_gc_life_tim #### ERROR 1105 (HY000): other error: unknown error Wire Error(InvalidEnumValue(4004)) This error usually occurs when the version of TiDB does not match with the version of TiKV. To avoid version mismatch, upgrade all components when you upgrade the version. + +#### ERROR 1148 (42000): the used command is not allowed with this TiDB version + +When you execute the `LOAD DATA LOCAL` statement but the MySQL client does not allow executing this statement (the value of the `local_infile` option is 0), this error occurs. + +The solution is to use the `--local-infile=1` option when you start the MySQL client. For example, use command like `mysql --local-infile=1 -u root -h 127.0.0.1 -P 4000`. 
The default value of `local-infile` is different in different versions of MySQL client, therefore you need to configure it in some MySQL clients and do not need to configure it in some others. + +#### ERROR 9001 (HY000): PD server timeout start timestamp may fall behind safe point + +This error occurs when TiDB fails to access PD. A worker in the TiDB background continuously queries the safepoint from PD and this error occurs if it fails to query within 100s. Generally, it is because the disk on PD is slow and busy or the network failed between TiDB and PD. For the details of common errors, see [Error Number and Fault Diagnosis](sql/error.md). diff --git a/QUICKSTART.md b/QUICKSTART.md index c5155d7c2b131..af6208b61aaec 100644 --- a/QUICKSTART.md +++ b/QUICKSTART.md @@ -1,727 +1,34 @@ --- title: TiDB Quick Start Guide +summary: Learn how to quickly start a TiDB cluster. category: quick start --- # TiDB Quick Start Guide -## About TiDB +As an open source distributed scalable HTAP database, TiDB can be deployed on-premise or in-cloud. The following deployment options are officially supported by PingCAP. -TiDB (The pronunciation is: /'taɪdiːbi:/ tai-D-B, etymology: titanium) is an open source distributed scalable Hybrid Transactional and Analytical Processing (HTAP) database built by PingCAP. Inspired by the design of Google F1 and Google Spanner, TiDB features infinite horizontal scalability, strong consistency, and high availability. The goal of TiDB is to serve as a one-stop solution for both OLTP (Online Transactional Processing) and OLAP (Online Analytical Processing). +- [Ansible Deployment](op-guide/ansible-deployment.md): This guide describes how to deploy TiDB using Ansible. It is strongly recommended for production deployment. +- [Ansible Offline Deployment](op-guide/offline-ansible-deployment.md): If your environment has no access to the internet, you can follow this guide to see how to deploy a TiDB cluster offline using Ansible. +- [Docker Deployment](op-guide/docker-deployment.md): This guide describes how to deploy TiDB using Docker. +- [Docker Compose Deployment](op-guide/docker-compose.md): This guide describes how to deploy TiDB using Docker compose. You can follow this guide to quickly deploy a TiDB cluster for testing and development on your local drive. +- [Kubernetes Deployment (beta)](op-guide/kubernetes.md): This guide describes how to deploy TiDB on Kubernetes using [TiDB Operator](https://github.com/pingcap/tidb-operator). You can follow this guide to see how to deploy TiDB on Google Kubernetes Engine or deploy TiDB locally using Docker in Docker. -## About this guide +## Community Provided Blog Posts & Tutorials -This guide outlines how to perform a quick deployment of a TiDB cluster using TiDB-Ansible and walks you through the basic TiDB operations and administrations. +The following list collects deployment guides and tutorials from the community. The content is subject to change by the contributors. -## Deploy the TiDB cluster +- [How To Spin Up an HTAP Database in 5 Minutes with TiDB + TiSpark](https://www.pingcap.com/blog/how_to_spin_up_an_htap_database_in_5_minutes_with_tidb_tispark/) +- [Developer install guide (single machine)](http://www.tocker.ca/this-blog-now-powered-by-wordpress-tidb.html) -This section describes how to deploy a TiDB cluster. A TiDB cluster consists of different components: TiDB servers, TiKV servers, and Placement Driver (PD) servers. +_Your contribution is also welcome! 
Feel free to open a [pull request](https://github.com/pingcap/docs/edit/master/QUICKSTART.md) to add additional links._ -The architecture is as follows: +## Source Code -![TiDB Architecture](media/tidb-architecture.png) +Source code for [all components of the TiDB platform](https://github.com/pingcap) is available on GitHub. -For details of deploying a TiDB cluster, see [Ansible Deployment](op-guide/ansible-deployment.md). - -## Try TiDB - -This section describes some basic CRUD operations in TiDB. - -### Create, show, and drop a database - -- To create a database, use the `CREATE DATABASE` statement. The Syntax is as follows: - - ```sql - CREATE DATABASE db_name [options]; - ``` - - For example, the following statement creates a database with the name `samp_db`: - - ```sql - CREATE DATABASE IF NOT EXISTS samp_db; - ``` - -- To show the databases, use the `SHOW DATABASES` statement: - - ```sql - SHOW DATABASES; - ``` - -- To delete a database, use the `DROP DATABASE` statement. For example: - - ```sql - DROP DATABASE samp_db; - ``` - -### Create, show, and drop a table - -- To create a table, use the `CREATE TABLE` statement. The Syntax is as follows: - - ```sql - CREATE TABLE table_name column_name data_type constraint; - ``` - - For example: - - ```sql - CREATE TABLE person ( - number INT(11), - name VARCHAR(255), - birthday DATE - ); - ``` - - Add `IF NOT EXISTS` to prevent an error if the table exists: - - ```sql - CREATE TABLE IF NOT EXISTS person ( - number INT(11), - name VARCHAR(255), - birthday DATE - ); - ``` - -- To view the statement that creates the table, use the `SHOW CREATE` statement. For example: - - ```sql - SHOW CREATE table person; - ``` - -- To show all the tables in a database, use the `SHOW TABLES` statement. For example: - - ```sql - SHOW TABLES FROM samp_db; - ``` - -- To show the information about all the columns in a table, use the `SHOW FULL COLUMNS` statement. For example: - - ```sql - SHOW FULL COLUMNS FROM person; - ``` - -- To delete a table, use the `DROP TABLE` statement. For example: - - ```sql - DROP TABLE person; - ``` - - or - - ```sql - DROP TABLE IF EXISTS person; - ``` - -### Create, show, and drop an index - -- To create an index for the column whose value is not unique, use the `CREATE INDEX` or `ALTER TABLE` statement. For example: - - ```sql - CREATE INDEX person_num ON person (number); - ``` - - or - - ```sql - ALTER TABLE person ADD INDEX person_num (number); - ``` - -- To create a unique index for the column whose value is unique, use the `CREATE UNIQUE INDEX` or `ALTER TABLE` statement. For example: - - ```sql - CREATE UNIQUE INDEX person_num ON person (number); - ``` - - or - - ```sql - ALTER TABLE person ADD UNIQUE person_num on (number); - ``` - -- To show all the indexes in a table, use the `SHOW INDEX` statement: - - ```sql - SHOW INDEX from person; - ``` - -- To delete an index, use the `DROP INDEX` or `ALTER TABLE` statement. For example: - - ```sql - DROP INDEX person_num ON person; - ALTER TABLE person DROP INDEX person_num; - ``` - -### Insert, select, update, and delete data - -- To insert data into a table, use the `INSERT` statement. For example: - - ```sql - INSERT INTO person VALUES("1","tom","20170912"); - ``` - -- To view the data in a table, use the `SELECT` statement. 
For example: - - ```sql - SELECT * FROM person; - +--------+------+------------+ - | number | name | birthday | - +--------+------+------------+ - | 1 | tom | 2017-09-12 | - +--------+------+------------+ - ``` - -- To update the data in a table, use the `UPDATE` statement. For example: - - ```sql - UPDATE person SET birthday='20171010' WHERE name='tom'; - - SELECT * FROM person; - +--------+------+------------+ - | number | name | birthday | - +--------+------+------------+ - | 1 | tom | 2017-10-10 | - +--------+------+------------+ - ``` - -- To delete the data in a table, use the `DELETE` statement. For example: - - ```sql - DELETE FROM person WHERE number=1; - SELECT * FROM person; - Empty set (0.00 sec) - ``` - -### Create, authorize, and delete a user - -- To create a user, use the `CREATE USER` statement. The following example creates a user named `tiuser` with the password `123456`: - - ```sql - CREATE USER 'tiuser'@'localhost' IDENTIFIED BY '123456'; - ``` - -- To grant `tiuser` the privilege to retrieve the tables in the `samp_db` database: - - ```sql - GRANT SELECT ON samp_db.* TO 'tiuser'@'localhost'; - ``` - -- To check the privileges of `tiuser`: - - ```sql - SHOW GRANTS for tiuser@localhost; - ``` - -- To delete `tiuser`: - - ```sql - DROP USER 'tiuser'@'localhost'; - ``` - -## Monitor the TiDB cluster - -Open a browser to access the monitoring platform: `http://172.16.10.3:3000`. - -The default account and password are: `admin`/`admin`. - -### About the key metrics - -Service | Panel Name | Description | Normal Range ----- | ---------------- | ---------------------------------- | -------------- -PD | Storage Capacity | the total storage capacity of the TiDB cluster | -PD | Current Storage Size | the occupied storage capacity of the TiDB cluster | -PD | Store Status -- up store | the number of TiKV nodes that are up | -PD | Store Status -- down store | the number of TiKV nodes that are down | `0`. If the number is bigger than `0`, it means some node(s) are not down. -PD | Store Status -- offline store | the number of TiKV nodes that are manually offline| -PD | Store Status -- Tombstone store | the number of TiKV nodes that are Tombstone| -PD | Current storage usage | the storage occupancy rate of the TiKV cluster | If it exceeds 80%, you need to consider adding more TiKV nodes. -PD | 99% completed cmds duration seconds | the 99th percentile duration to complete a pd-server request| less than 5ms -PD | average completed cmds duration seconds | the average duration to complete a pd-server request | less than 50ms -PD | leader balance ratio | the leader ratio difference of the nodes with the biggest leader ratio and the smallest leader ratio | It is less than 5% for a balanced situation. It becomes bigger when a node is restarting. -PD | region balance ratio | the region ratio difference of the nodes with the biggest region ratio and the smallest region ratio | It is less than 5% for a balanced situation. It becomes bigger when adding or removing a node. -TiDB | handle requests duration seconds | the response time to get TSO from PD| less than 100ms -TiDB | tidb server QPS | the QPS of the cluster | application specific -TiDB | connection count | the number of connections from application servers to the database | Application specific. If the number of connections hops, you need to find out the reasons. If it drops to 0, you can check if the network is broken; if it surges, you need to check the application. 
-TiDB | statement count | the number of different types of statement within a given time | application specific -TiDB | Query Duration 99th percentile | the 99th percentile query time | -TiKV | 99% & 99.99% scheduler command duration | the 99th percentile and 99.99th percentile scheduler command duration| For 99%, it is less than 50ms; for 99.99%, it is less than 100ms. -TiKV | 95% & 99.99% storage async_request duration | the 95th percentile and 99.99th percentile Raft command duration | For 95%, it is less than 50ms; for 99.99%, it is less than 100ms. -TiKV | server report failure message | There might be an issue with the network or the message might not come from this cluster. | If there are large amount of messages which contains `unreachable`, there might be an issue with the network. If the message contains `store not match`, the message does not come from this cluster. -TiKV | Vote |the frequency of the Raft vote | Usually, the value only changes when there is a split. If the value of Vote remains high for a long time, the system might have a severe issue and some nodes are not working. -TiKV | 95% and 99% coprocessor request duration | the 95th percentile and the 99th percentile coprocessor request duration | Application specific. Usually, the value does not remain high. -TiKV | Pending task | the number of pending tasks | Except for PD worker, it is not normal if the value is too high. -TiKV | stall | RocksDB stall time | If the value is bigger than 0, it means that RocksDB is too busy, and you need to pay attention to IO and CPU usage. -TiKV | channel full | The channel is full and the threads are too busy. | If the value is bigger than 0, the threads are too busy. -TiKV | 95% send message duration seconds | the 95th percentile message sending time | less than 50ms -TiKV | leader/region | the number of leader/region per TiKV server| application specific - -## Scale the TiDB cluster - -The capacity of a TiDB cluster can be increased or decreased without affecting the online services. - -> **Warning:** In decreasing the capacity, if your cluster has a mixed deployment of other services, do not perform the following procedures. The following examples assume that the removed nodes have no mixed deployment of other services. - -Assume that the topology is as follows: - -| Name | Host IP | Services | -| ---- | ------- | -------- | -| node1 | 172.16.10.1 | PD1 | -| node2 | 172.16.10.2 | PD2 | -| node3 | 172.16.10.3 | PD3, Monitor | -| node4 | 172.16.10.4 | TiDB1 | -| node5 | 172.16.10.5 | TiDB2 | -| node6 | 172.16.10.6 | TiKV1 | -| node7 | 172.16.10.7 | TiKV2 | -| node8 | 172.16.10.8 | TiKV3 | -| node9 | 172.16.10.9 | TiKV4 | - -### Increase the capacity of a TiDB/TiKV node - -For example, if you want to add two TiDB nodes (node101, node102) with the IP address `172.16.10.101` and `172.16.10.102`, you can use the following procedures: - -1. 
Edit the `inventory.ini` file and append the node information: - - ```ini - [tidb_servers] - 172.16.10.4 - 172.16.10.5 - 172.16.10.101 - 172.16.10.102 - - [pd_servers] - 172.16.10.1 - 172.16.10.2 - 172.16.10.3 - - [tikv_servers] - 172.16.10.6 - 172.16.10.7 - 172.16.10.8 - 172.16.10.9 - - [monitored_servers] - 172.16.10.1 - 172.16.10.2 - 172.16.10.3 - 172.16.10.4 - 172.16.10.5 - 172.16.10.6 - 172.16.10.7 - 172.16.10.8 - 172.16.10.9 - 172.16.10.101 - 172.16.10.102 - - [monitoring_servers] - 172.16.10.3 - - [grafana_servers] - 172.16.10.3 - ``` - - Now the topology is as follows: - - | Name | Host IP | Services | - | ---- | ------- | -------- | - | node1 | 172.16.10.1 | PD1 | - | node2 | 172.16.10.2 | PD2 | - | node3 | 172.16.10.3 | PD3, Monitor | - | node4 | 172.16.10.4 | TiDB1 | - | node5 | 172.16.10.5 | TiDB2 | - | **node101** | **172.16.10.101**|**TiDB3** | - | **node102** | **172.16.10.102**|**TiDB4** | - | node6 | 172.16.10.6 | TiKV1 | - | node7 | 172.16.10.7 | TiKV2 | - | node8 | 172.16.10.8 | TiKV3 | - | node9 | 172.16.10.9 | TiKV4 | - -2. Initialize the newly added node: - - ``` - ansible-playbook bootstrap.yml -l 172.16.10.101,172.16.10.102 - ``` - -3. Deploy the newly added node: - - ``` - ansible-playbook deploy.yml -l 172.16.10.101,172.16.10.102 - ``` - -4. Start the newly added node: - - ``` - ansible-playbook start.yml -l 172.16.10.101,172.16.10.102 - ``` - -5. Update the Prometheus configuration and restart the cluster: - - ``` - ansible-playbook rolling_update_monitor.yml --tags=prometheus - ``` - -6. Monitor the status of the entire cluster and the newly added node by opening a browser to access the monitoring platform: `http://172.16.10.3:3000`. - -You can use the same procedure to add a TiKV node. But to add a PD node, some configuration files need to be manually updated. - -### Increase the capacity of a PD node - -For example, if you want to add a PD node (node103) with the IP address `172.16.10.103`, you can use the following procedures: - -1. Edit the `inventory.ini` file and append the node information: - - ```ini - [tidb_servers] - 172.16.10.4 - 172.16.10.5 - - [pd_servers] - 172.16.10.1 - 172.16.10.2 - 172.16.10.3 - 172.16.10.103 - - [tikv_servers] - 172.16.10.6 - 172.16.10.7 - 172.16.10.8 - 172.16.10.9 - - [monitored_servers] - 172.16.10.4 - 172.16.10.5 - 172.16.10.1 - 172.16.10.2 - 172.16.10.3 - 172.16.10.103 - 172.16.10.6 - 172.16.10.7 - 172.16.10.8 - 172.16.10.9 - - [monitoring_servers] - 172.16.10.3 - - [grafana_servers] - 172.16.10.3 - ``` - - Now the topology is as follows: - - | Name | Host IP | Services | - | ---- | ------- | -------- | - | node1 | 172.16.10.1 | PD1 | - | node2 | 172.16.10.2 | PD2 | - | node3 | 172.16.10.3 | PD3, Monitor | - | **node103** | **172.16.10.103** | **PD4** | - | node4 | 172.16.10.4 | TiDB1 | - | node5 | 172.16.10.5 | TiDB2 | - | node6 | 172.16.10.6 | TiKV1 | - | node7 | 172.16.10.7 | TiKV2 | - | node8 | 172.16.10.8 | TiKV3 | - | node9 | 172.16.10.9 | TiKV4 | - -2. Initialize the newly added node: - - ``` - ansible-playbook bootstrap.yml -l 172.16.10.103 - ``` - -3. Deploy the newly added node: - - ``` - ansible-playbook deploy.yml -l 172.16.10.103 - ``` - -4. Login the newly added PD node and edit the starting script: - - ``` - {deploy_dir}/scripts/run_pd.sh - ``` - - 1. Remove the `--initial-cluster="xxxx" \` configuration. - 2. Add `--join="http://172.16.10.1:2379" \`. The IP address (`172.16.10.1`) can be any of the existing PD IP address in the cluster. - 3. 
Manually start the PD service in the newly added PD node: - - ``` - {deploy_dir}/scripts/start_pd.sh - ``` - - 4. Use `pd-ctl` to check whether the new node is added successfully: - - ``` - ./pd-ctl -u "http://172.16.10.1:2379" - ``` - - > **Note:** `pd-ctl` is a command used to check the number of PD nodes. - -5. Roll upgrade the entire cluster: - - ``` - ansible-playbook rolling_update.yml - ``` - -6. Update the Prometheus configuration and restart the cluster: - - ``` - ansible-playbook rolling_update_monitor.yml --tags=prometheus - ``` - -7. Monitor the status of the entire cluster and the newly added node by opening a browser to access the monitoring platform: `http://172.16.10.3:3000`. - -### Decrease the capacity of a TiDB node - -For example, if you want to remove a TiDB node (node5) with the IP address `172.16.10.5`, you can use the following procedures: - -1. Stop all services on node5: - - ``` - ansible-playbook stop.yml -l 172.16.10.5 - ``` - -2. Edit the `inventory.ini` file and remove the node information: - - ```ini - [tidb_servers] - 172.16.10.4 - #172.16.10.5 # the removed node - - [pd_servers] - 172.16.10.1 - 172.16.10.2 - 172.16.10.3 - - [tikv_servers] - 172.16.10.6 - 172.16.10.7 - 172.16.10.8 - 172.16.10.9 - - [monitored_servers] - 172.16.10.4 - #172.16.10.5 # the removed node - 172.16.10.1 - 172.16.10.2 - 172.16.10.3 - 172.16.10.6 - 172.16.10.7 - 172.16.10.8 - 172.16.10.9 - - [monitoring_servers] - 172.16.10.3 - - [grafana_servers] - 172.16.10.3 - ``` - - Now the topology is as follows: - - | Name | Host IP | Services | - | ---- | ------- | -------- | - | node1 | 172.16.10.1 | PD1 | - | node2 | 172.16.10.2 | PD2 | - | node3 | 172.16.10.3 | PD3, Monitor | - | node4 | 172.16.10.4 | TiDB1 | - | **node5** | **172.16.10.5** | **TiDB2 removed** | - | node6 | 172.16.10.6 | TiKV1 | - | node7 | 172.16.10.7 | TiKV2 | - | node8 | 172.16.10.8 | TiKV3 | - | node9 | 172.16.10.9 | TiKV4 | - -3. Update the Prometheus configuration and restart the cluster: - - ``` - ansible-playbook rolling_update_monitor.yml --tags=prometheus - ``` - -4. Monitor the status of the entire cluster by opening a browser to access the monitoring platform: `http://172.16.10.3:3000`. - -### Decrease the capacity of a TiKV node - -For example, if you want to remove a TiKV node (node9) with the IP address `172.16.10.9`, you can use the following procedures: - -1. Remove the node from the cluster using `pd-ctl`: - - 1. View the store id of node9: - - ``` - ./pd-ctl -u "http://172.16.10.1:2379" -d store - ``` - - 2. Remove node9 from the cluster, assuming that the store id is 10: - - ``` - ./pd-ctl -u "http://172.16.10.1:2379" -d store delete 10 - ``` - -2. Use Grafana or `pd-ctl` to check whether the node is successfully removed: - - ``` - ./pd-ctl -u "http://172.16.10.1:2379" -d store 10 - ``` - - > **Note:** It takes some time to remove the node. If node9 does not show in the result, the node is successfully removed. - -3. After the node is successfully removed, stop the services on node9: - - ``` - ansible-playbook stop.yml -l 172.16.10.9 - ``` - -4. 
Edit the `inventory.ini` file and remove the node information: - - ```ini - [tidb_servers] - 172.16.10.4 - 172.16.10.5 - - [pd_servers] - 172.16.10.1 - 172.16.10.2 - 172.16.10.3 - - [tikv_servers] - 172.16.10.6 - 172.16.10.7 - 172.16.10.8 - #172.16.10.9 # the removed node - - [monitored_servers] - 172.16.10.4 - 172.16.10.5 - 172.16.10.1 - 172.16.10.2 - 172.16.10.3 - 172.16.10.6 - 172.16.10.7 - 172.16.10.8 - #172.16.10.9 # the removed node - - [monitoring_servers] - 172.16.10.3 - - [grafana_servers] - 172.16.10.3 - ``` - - Now the topology is as follows: - - | Name | Host IP | Services | - | ---- | ------- | -------- | - | node1 | 172.16.10.1 | PD1 | - | node2 | 172.16.10.2 | PD2 | - | node3 | 172.16.10.3 | PD3, Monitor | - | node4 | 172.16.10.4 | TiDB1 | - | node5 | 172.16.10.5 | TiDB2 | - | node6 | 172.16.10.6 | TiKV1 | - | node7 | 172.16.10.7 | TiKV2 | - | node8 | 172.16.10.8 | TiKV3 | - | **node9** | **172.16.10.9** | **TiKV4 removed** | - -5. Update the Prometheus configuration and restart the cluster: - - ``` - ansible-playbook rolling_update_monitor.yml --tags=prometheus - ``` - -6. Monitor the status of the entire cluster by opening a browser to access the monitoring platform: `http://172.16.10.3:3000`. - -### Decrease the capacity of a PD node - -For example, if you want to remove a PD node (node2) with the IP address `172.16.10.2`, you can use the following procedures: - -1. Remove the node from the cluster using `pd-ctl`: - - 1. View the name of node2: - - ``` - ./pd-ctl -u "http://172.16.10.1:2379" -d member - ``` - - 2. Remove node2 from the cluster, assuming that the name is pd2: - - ``` - ./pd-ctl -u "http://172.16.10.1:2379" -d member delete name pd2 - ``` - -2. Use Grafana or `pd-ctl` to check whether the node is successfully removed: - - ``` - ./pd-ctl -u "http://172.16.10.1:2379" -d member - ``` - -3. After the node is successfully removed, stop the services on node2: - - ``` - ansible-playbook stop.yml -l 172.16.10.2 - ``` - -4. Edit the `inventory.ini` file and remove the node information: - - ```ini - [tidb_servers] - 172.16.10.4 - 172.16.10.5 - - [pd_servers] - 172.16.10.1 - #172.16.10.2 # the removed node - 172.16.10.3 - - [tikv_servers] - 172.16.10.6 - 172.16.10.7 - 172.16.10.8 - 172.16.10.9 - - [monitored_servers] - 172.16.10.4 - 172.16.10.5 - 172.16.10.1 - #172.16.10.2 # the removed node - 172.16.10.3 - 172.16.10.6 - 172.16.10.7 - 172.16.10.8 - 172.16.10.9 - - [monitoring_servers] - 172.16.10.3 - - [grafana_servers] - 172.16.10.3 - ``` - - Now the topology is as follows: - - | Name | Host IP | Services | - | ---- | ------- | -------- | - | node1 | 172.16.10.1 | PD1 | - | **node2** | **172.16.10.2** | **PD2 removed** | - | node3 | 172.16.10.3 | PD3, Monitor | - | node4 | 172.16.10.4 | TiDB1 | - | node5 | 172.16.10.5 | TiDB2 | - | node6 | 172.16.10.6 | TiKV1 | - | node7 | 172.16.10.7 | TiKV2 | - | node8 | 172.16.10.8 | TiKV3 | - | node9 | 172.16.10.9 | TiKV4 | - -5. Update the Prometheus configuration and restart the cluster: - - ``` - ansible-playbook rolling_update_monitor.yml --tags=prometheus - ``` - -6. Monitor the status of the entire cluster by opening a browser to access the monitoring platform: `http://172.16.10.3:3000`. 
- -## Destroy the TiDB cluster - -Stop the cluster: - -``` -ansible-playbook stop.yml -``` - -Destroy the cluster: - -``` -ansible-playbook unsafe_cleanup.yml -``` +- [TiDB](https://github.com/pingcap/tidb) +- [TiKV](https://github.com/tikv/tikv) +- [PD](https://github.com/pingcap/pd) +- [TiSpark](https://github.com/pingcap/tispark) +- [TiDB Operator](https://github.com/pingcap/tidb-operator) diff --git a/README.md b/README.md index 8933543da048a..865fcb269d74d 100644 --- a/README.md +++ b/README.md @@ -3,9 +3,12 @@ ## Documentation List + About TiDB - - [TiDB Introduction](overview.md#tidb-introduction) - - [TiDB Architecture](overview.md#tidb-architecture) -- [TiDB Quick Start Guide](QUICKSTART.md) + - [TiDB Introduction](overview.md) + - [TiDB Architecture](architecture.md) ++ Quick Start + - [TiDB Quick Start Guide](QUICKSTART.md) + - [Basic SQL Statements](try-tidb.md) + - [Bikeshare Example Database](bikeshare-example-database.md) + TiDB User Guide + TiDB Server Administration - [The TiDB Server](sql/tidb-server.md) @@ -13,12 +16,13 @@ - [The TiDB Data Directory](sql/tidb-server.md#tidb-data-directory) - [The TiDB System Database](sql/system-database.md) - [The TiDB System Variables](sql/variable.md) - - [The Proprietary System Variables and Syntax in TiDB](sql/tidb-specific.md) + - [The TiDB Specific System Variables](sql/tidb-specific.md) - [The TiDB Server Logs](sql/tidb-server.md#tidb-server-logs) - [The TiDB Access Privilege System](sql/privilege.md) - [TiDB User Account Management](sql/user-account-management.md) - [Use Encrypted Connections](sql/encrypted-connections.md) - + SQL Optimization + + SQL Optimization and Execution + - [SQL Optimization Process](sql/sql-optimizer-overview.md) - [Understand the Query Execution Plan](sql/understanding-the-query-execution-plan.md) - [Introduction to Statistics](sql/statistics.md) + Language Structure @@ -31,7 +35,7 @@ + Globalization - [Character Set Support](sql/character-set-support.md) - [Character Set Configuration](sql/character-set-configuration.md) - - [Time Zone](sql/time-zone.md) + - [Time Zone Support](sql/time-zone.md) + Data Types - [Numeric Types](sql/datatype.md#numeric-types) - [Date and Time Types](sql/datatype.md#date-and-time-types) @@ -64,14 +68,16 @@ - [Prepared SQL Statement Syntax](sql/prepare.md) - [Utility Statements](sql/util.md) - [TiDB SQL Syntax Diagram](https://pingcap.github.io/sqlgram/) - - [JSON Functions and Generated Column](sql/json-functions-generated-column.md) + - [Generated Columns](sql/generated-columns.md) - [Connectors and APIs](sql/connection-and-APIs.md) - [TiDB Transaction Isolation Levels](sql/transaction-isolation.md) - [Error Codes and Troubleshooting](sql/error.md) - [Compatibility with MySQL](sql/mysql-compatibility.md) - [TiDB Memory Control](sql/tidb-memory-control.md) + - [Slow Query Log](sql/slow-query.md) + Advanced Usage - [Read Data From History Versions](op-guide/history-read.md) + - [Garbage Collection (GC)](op-guide/gc.md) + TiDB Operations Guide - [Hardware and Software Requirements](op-guide/recommendation.md) + Deploy @@ -80,9 +86,11 @@ - [Docker Deployment](op-guide/docker-deployment.md) - [Docker Compose Deployment](op-guide/docker-compose.md) - [Cross-Region Deployment](op-guide/location-awareness.md) + - [Kubernetes Deployment](op-guide/kubernetes.md) + Configure - [Configuration Flags](op-guide/configuration.md) - [Configuration File Description](op-guide/tidb-config-file.md) + - [Modify Component Configuration Using 
Ansible](op-guide/ansible-deployment-rolling-update.md#modify-component-configuration) - [Enable TLS Authentication](op-guide/security.md) - [Generate Self-signed Certificates](op-guide/generate-self-signed-certificates.md) + Monitor @@ -91,8 +99,10 @@ - [Monitor a TiDB Cluster](op-guide/monitor.md) + Scale - [Scale a TiDB Cluster](op-guide/horizontal-scale.md) - - [Use Ansible to Scale](QUICKSTART.md#scale-the-tidb-cluster) - - [Upgrade](op-guide/ansible-deployment.md#perform-rolling-update) + - [Scale Using Ansible](op-guide/ansible-deployment-scale.md) + + Upgrade + - [Upgrade the Component Version](op-guide/ansible-deployment-rolling-update.md#upgrade-the-component-version) + - [TiDB 2.0 Upgrade Guide](op-guide/tidb-v2-upgrade-guide.md) - [Tune Performance](op-guide/tune-tikv.md) + Backup and Migrate - [Backup and Restore](op-guide/backup-restore.md) @@ -100,19 +110,37 @@ - [Migration Overview](op-guide/migration-overview.md) - [Migrate All the Data](op-guide/migration.md#use-the-mydumper--loader-tool-to-export-and-import-all-the-data) - [Migrate the Data Incrementally](op-guide/migration.md#use-the-syncer-tool-to-import-data-incrementally-optional) - - [Deploy TiDB Using the Binary](op-guide/binary-deployment.md) + - [TiDB-Ansible Common Operations](op-guide/ansible-operation.md) - [Troubleshoot](trouble-shooting.md) -+ TiDB Utilities - - [Syncer User Guide](tools/syncer.md) - - [Loader User Guide](tools/loader.md) - - [TiDB-Binlog User Guide](tools/tidb-binlog-kafka.md) - - [PD Control User Guide](tools/pd-control.md) -+ The TiDB Connector for Spark ++ TiDB Enterprise Tools + - [Syncer](tools/syncer.md) + - [mydumper](tools/mydumper.md) + - [Loader](tools/loader.md) + - [TiDB-Binlog](tools/tidb-binlog-kafka.md) + - [PD Control](tools/pd-control.md) + - [PD Recover](tools/pd-recover.md) + - [TiKV Control](https://github.com/tikv/tikv/blob/master/docs/tools/tikv-control.md) + - [TiDB Controller](tools/tidb-controller.md) ++ [TiKV Documentation](https://github.com/tikv/tikv/wiki) ++ TiSpark Documentation - [Quick Start Guide](tispark/tispark-quick-start-guide.md) - [User Guide](tispark/tispark-user-guide.md) - [Frequently Asked Questions (FAQ)](FAQ.md) -- [TiDB Best Practices](https://pingcap.github.io/blog/2017/07/24/tidbbestpractice/) +- [TiDB Best Practices](https://pingcap.com/blog/2017-07-24-tidbbestpractice/) + [Releases](releases/rn.md) + - [2.0.8](releases/208.md) + - [2.1 RC3](releases/21rc3.md) + - [2.1 RC2](releases/21rc2.md) + - [2.0.7](releases/207.md) + - [2.1 RC1](releases/21rc1.md) + - [2.0.6](releases/206.md) + - [2.0.5](releases/205.md) + - [2.1 Beta](releases/21beta.md) + - [2.0.4](releases/204.md) + - [2.0.3](releases/203.md) + - [2.0.2](releases/202.md) + - [2.0.1](releases/201.md) + - [2.0](releases/2.0ga.md) - [2.0 RC5](releases/2rc5.md) - [2.0 RC4](releases/2rc4.md) - [2.0 RC3](releases/2rc3.md) @@ -143,7 +171,7 @@ ## TiDB Introduction -TiDB (The pronunciation is: /'taɪdiːbi:/ tai-D-B, etymology: titanium) is an open source distributed scalable Hybrid Transactional and Analytical Processing (HTAP) database built by PingCAP. Inspired by the design of Google F1 and Google Spanner, TiDB features infinite horizontal scalability, strong consistency, and high availability. The goal of TiDB is to serve as a one-stop solution for both OLTP (Online Transactional Processing) and OLAP (Online Analytical Processing). 
+TiDB (The pronunciation is: /'taɪdiːbi:/ tai-D-B, etymology: titanium) is an open-source distributed scalable Hybrid Transactional and Analytical Processing (HTAP) database. It features infinite horizontal scalability, strong consistency, and high availability. TiDB is MySQL compatible and serves as a one-stop data warehouse for both OLTP (Online Transactional Processing) and OLAP (Online Analytical Processing) workloads. - __Horizontal scalability__ @@ -161,7 +189,7 @@ TiDB (The pronunciation is: /'taɪdiːbi:/ tai-D-B, etymology: titanium) is an o TiDB is designed to work in the cloud -- public, private, or hybrid -- making deployment, provisioning, and maintenance drop-dead simple. -- __No more ETL__ +- __Minimize ETL__ ETL (Extract, Transform and Load) is no longer necessary with TiDB's hybrid OLTP/OLAP architecture, enabling you to create new values for your users, easier and faster. diff --git a/ROADMAP.md b/ROADMAP.md index 0927ff786d196..7463f7fe053a5 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -1,5 +1,6 @@ --- title: TiDB Roadmap +summary: Learn about the roadmap of TiDB. category: Roadmap --- @@ -9,63 +10,91 @@ This document defines the roadmap for TiDB development. ## TiDB: -- [ ] Optimizer - - [ ] Refactor Ranger - - [ ] Optimize the statistics info - - [ ] Optimize the cost model -- [ ] Executor - - [ ] Parallel Operators - - [ ] Compact Row Format to reduce memory usage - - [ ] File Sort -- [ ] Support View -- [ ] Support Window Function ++ [ ] Optimizer + - [x] Refactor Ranger + - [x] Optimize the cost model + - [ ] Cascades model planner + - [ ] Join Reorder ++ [ ] Statistics + - [x] Update statistics dynamically according to the query feedback + - [x] Analyze table automatically + - [x] Improve the accuracy of Row Count estimation ++ [ ] Execution Engine + - [ ] Push down the Projection operator to the Coprocessor + - [x] Improve the performance of the HashJoin operator + + [ ] Parallel Operators + - [x] Projection + - [x] Aggregation + - [ ] Sort + - [x] Compact Row Format to reduce memory usage + - [ ] File Sort +- [ ] View +- [ ] Window Function - [ ] Common Table Expression -- [ ] Table Partition -- [ ] Hash time index to resolve the issue with hot regions -- [ ] Reverse Index ++ [ ] Table Partition + - [x] Range Partition + - [ ] Hash Partition - [ ] Cluster Index -- [ ] Improve DDL +- [ ] New storage row format +- [ ] Query Tracing ++ [ ] Improve DDL + - [x] Speed up Add Index operation + - [x] Parallel DDL + - [ ] Support locking table + - [ ] Support modifying the column type + - [ ] Supoort modifying the primary key + - [ ] Support multiple DDL operations in a single statement - [ ] Support `utf8_general_ci` collation ## TiKV: -- [ ] Raft - - [ ] Region merge - - [ ] Local read thread - - [ ] Multi-thread raftstore - - [ ] None voter - - [ ] Pre-vote -- [ ] RocksDB - - [ ] DeleteRange -- [ ] Transaction - - [ ] Optimize transaction conflicts -- [ ] Coprocessor - - [ ] Streaming -- [ ] Tool - - [ ] Import distributed data - - [ ] Export distributed data - - [ ] Disaster Recovery -- [ ] Flow control and degradation ++ Raft + - [x] Region Merge - Merge small Regions together to reduce overhead + - [x] Local Read Thread - Process read requests in a local read thread + - [x] Split Region in Batch - Speed up Region split for large Regions + - [x] Raft Learner - Support Raft learner to smooth the configuration change process + - [x] Raft Pre-vote - Support Raft pre-vote to avoid unnecessary leader election on network isolation + - [ ] Joint Consensus - Change multi 
members safely. + - [ ] Multi-thread Raftstore - Process Region Raft logic in multiple threads + - [ ] Multi-thread apply pool - Apply Region Raft committed entries in multiple threads ++ Engine + - [ ] Titan - Separate large key-values from LSM-Tree + - [ ] Pluggable Engine Interface - Clean up the engine wrapper code and provide more extensibility ++ Storage + - [ ] Flow Control - Do flow control in scheduler to avoid write stall in advance ++ Transaction + - [x] Optimize transaction conflicts + - [ ] Distributed GC - Distribute MVCC garbage collection control to TiKV ++ Coprocessor + - [x] Streaming - Cut large data set into small chunks to optimize memory consumption + - [ ] Chunk Execution - Process data in chunk to improve performance + - [ ] Request Tracing - Provide per-request execution details ++ Tools + - [x] TiKV Importer - Speed up data importing by SST file ingestion ++ Client + - [ ] TiKV client (Rust crate) + - [ ] Batch gRPC Message - Reduce message overhead ## PD: -- [ ] Improve namespace - - [ ] Different replication policies for different namespaces and tables - - [ ] Decentralize scheduling table regions - - [ ] Scheduler supports prioritization to be more controllable +- [x] Improve namespace + - [x] Different replication policies for different namespaces and tables +- [x] Decentralize scheduling table Regions +- [x] Scheduler supports prioritization to be more controllable - [ ] Use machine learning to optimize scheduling +- [ ] Optimize Region metadata - Save Region metadata in detached storage engine ## TiSpark: -- [ ] Limit / Order push-down -- [ ] Access through the DAG interface and deprecate the Select interface +- [ ] Limit/Order push-down +- [x] Access through the DAG interface and deprecate the Select interface - [ ] Index Join and parallel merge join - [ ] Data Federation -## SRE & tools: +## Tools: -- [ ] Kubernetes based intergration for the on-premise version -- [ ] Dashboard UI for the on-premise version -- [ ] The cluster backup and recovery tool -- [ ] The data migration tool (Wormhole V2) -- [ ] Security and system diagnosis +- [X] Tool for automating TiDB deployment +- [X] High-Performance data import tool +- [X] Backup and restore tool (incremental backup supported) +- [ ] Data online migration tool (premium edition of Syncer) +- [ ] Diagnostic tools diff --git a/adopters.md b/adopters.md index aab9d3fdb7d2a..7ad9a87af08a1 100644 --- a/adopters.md +++ b/adopters.md @@ -1,5 +1,6 @@ --- title: TiDB Adopters +summary: Learn about the list of TiDB adopters in various industries. category: adopters --- @@ -7,25 +8,71 @@ category: adopters This is a list of TiDB adopters in various industries. 
-- [Bank of Beijing (Banking)](http://www.bankofbeijing.com.cn/en2011/index.html) -- [Mobike (Ridesharing)](https://mobike.com/global/) -- [Ele.me (Catering)](https://www.crunchbase.com/organization/ele-me) -- [Yiguo.com (E-commerce)](https://www.datanami.com/2018/02/22/hybrid-database-capturing-perishable-insights-yiguo/) -- [Toutiao (Media)](https://www.crunchbase.com/organization/toutiao) -- [Phoenix TV (Media)](http://www.ifeng.com/) -- [LeCloud (MediaTech)](http://www.lecloud.com/en-us/) -- [Mobikok (Marketing)](http://www.kokmobi.com/en/cn/index.asp) -- [Ping++ (Payment)](https://www.crunchbase.com/organization/ping-5) -- [Qunar.com (Travel)](https://www.crunchbase.com/organization/qunar-com) -- [LinkDoc Technology (HealthTech)](https://www.crunchbase.com/organization/linkdoc-technology) -- [Yuanfudao (EdTech)](https://www.crunchbase.com/organization/yuanfudao) -- [ZuoZhu Financial (FinTech)](http://www.zuozh.com/) -- [360 Financial (FinTech)](https://jinrong.360jie.com.cn/) -- [GAEA (Gaming)](http://gaea.com/en) -- [YOOZOO GAMES (Gaming)](http://www.yoozoo.com/en) -- [FUNYOURS JAPAN (Gaming)](http://company.funyours.co.jp/) -- [Hainan eKing Technology (Enterprise Technology)](https://www.crunchbase.com/organization/hainan-eking-technology) -- [2Dfire (FoodTech)](http://www.2dfire.com/) -- [G7 (Internet of Things)](https://www.english.g7.com.cn/) -- [Yimian Data (Big Data)](https://www.yimian.com.cn) -- [Wanda Internet Technology Group (Big Data)](http://www.wanda-tech.cn/en/) \ No newline at end of file +| Company | Industry | Success Story | +| :--- | :--- | :--- | +|[Mobike](https://en.wikipedia.org/wiki/Mobike)|Ridesharing|[English](https://www.pingcap.com/blog/Use-Case-TiDB-in-Mobike/); [Chinese](https://www.pingcap.com/cases-cn/user-case-mobike/)| +|[Jinri Toutiao](https://en.wikipedia.org/wiki/Toutiao)|Mobile News Platform|[Chinese](https://www.pingcap.com/cases-cn/user-case-toutiao/)| +|[Yiguo.com](https://www.crunchbase.com/organization/shanghai-yiguo-electron-business)|E-commerce|[English](https://www.datanami.com/2018/02/22/hybrid-database-capturing-perishable-insights-yiguo/); [Chinese](https://www.pingcap.com/cases-cn/user-case-yiguo)| +|[Yuanfudao.com](https://www.crunchbase.com/organization/yuanfudao)|EdTech|[English](https://www.pingcap.com/blog/2017-08-08-tidbforyuanfudao/); [Chinese](https://www.pingcap.com/cases-cn/user-case-yuanfudao/)| +|[Ele.me](https://en.wikipedia.org/wiki/Ele.me)|Food Delivery|[English](https://www.pingcap.com/blog/use-case-tidb-in-eleme/); [Chinese](https://www.pingcap.com/cases-cn/user-case-eleme-1/)| +|[LY.com](https://www.crunchbase.com/organization/ly-com)|Travel|[Chinese](https://www.pingcap.com/cases-cn/user-case-tongcheng/)| +|[Qunar.com](https://www.crunchbase.com/organization/qunar-com)|Travel|[Chinese](https://www.pingcap.com/cases-cn/user-case-qunar/)| +|[Hulu](https://www.hulu.com)|Entertainment|| +|[VIPKID](https://en.wikipedia.org/wiki/VIPKID)|EdTech|| +|[Lenovo](https://en.wikipedia.org/wiki/Lenovo)|Enterprise Technology|| +|[Bank of Beijing](https://en.wikipedia.org/wiki/Bank_of_Beijing)|Banking|| +|[Industrial and Commercial Bank of China](https://en.wikipedia.org/wiki/Industrial_and_Commercial_Bank_of_China)|Banking|| +|[iQiyi](https://en.wikipedia.org/wiki/IQiyi)|Media and Entertainment|| +|[Yimian Data](https://www.crunchbase.com/organization/yimian-data)|Big Data|[Chinese](https://www.pingcap.com/cases-cn/user-case-yimian)| +|[Phoenix New 
Media](https://www.crunchbase.com/organization/phoenix-new-media)|Media|[Chinese](https://www.pingcap.com/cases-cn/user-case-ifeng/)| +|[Mobikok](http://www.mobikok.com/en/)|AdTech|[Chinese](https://pingcap.com/cases-cn/user-case-mobikok/)| +|[LinkDoc Technology](https://www.crunchbase.com/organization/linkdoc-technology)|HealthTech|[Chinese](https://www.pingcap.com/cases-cn/user-case-linkdoc/)| +|[G7 Networks](https://www.english.g7.com.cn/)| Logistics|[Chinese](https://www.pingcap.com/cases-cn/user-case-g7/)| +|[360 Finance](https://www.crunchbase.com/organization/360-finance)|FinTech|[Chinese](https://www.pingcap.com/cases-cn/user-case-360/)| +|[GAEA](http://www.gaea.com/en/)|Gaming|[English](https://www.pingcap.com/blog/2017-05-22-Comparison-between-MySQL-and-TiDB-with-tens-of-millions-of-data-per-day/); [Chinese](https://www.pingcap.com/cases-cn/user-case-gaea-ad/)| +|[YOOZOO Games](https://www.crunchbase.com/organization/yoozoo-games)|Gaming|[Chinese](https://pingcap.com/cases-cn/user-case-youzu/)| +|[Seasun Games](https://www.crunchbase.com/organization/seasun)|Gaming|[Chinese](https://pingcap.com/cases-cn/user-case-xishanju/)| +|[NetEase Games](https://game.163.com/en/)|Gaming|| +|[FUNYOURS JAPAN](http://company.funyours.co.jp/)|Gaming|[Chinese](https://pingcap.com/cases-cn/user-case-funyours-japan/)| +|[Zhaopin.com](https://www.crunchbase.com/organization/zhaopin)|Recruiting|| +|[Panda.tv](https://www.crunchbase.com/organization/panda-tv)|Live Streaming|| +|[Hoodinn](https://www.crunchbase.com/organization/hoodinn)|Gaming|| +|[Ping++](https://www.crunchbase.com/organization/ping-5)|Mobile Payment|[Chinese](https://pingcap.com/cases-cn/user-case-ping++/)| +|[Hainan eKing Technology](https://www.crunchbase.com/organization/hainan-eking-technology)|Enterprise Technology|[Chinese](https://pingcap.com/cases-cn/user-case-ekingtech/)| +|[LianLian Tech](http://www.10030.com.cn/web/)|Mobile Payment|| +|[Tongdun Technology](https://www.crunchbase.com/organization/tongdun-technology)|FinTech|| +|[Wacai](https://www.crunchbase.com/organization/wacai)|FinTech|| +|[Tree Finance](https://www.treefinance.com.cn/)|FinTech|| +|[2Dfire.com](http://www.2dfire.com/)|FoodTech|[Chinese](https://www.pingcap.com/cases-cn/user-case-erweihuo/)| +|[Happigo.com](https://www.crunchbase.com/organization/happigo-com)|E-commerce|| +|[Mashang Consumer Finance](https://www.crunchbase.com/organization/ms-finance)|FinTech|| +|[Tencent OMG](https://en.wikipedia.org/wiki/Tencent)|Media|| +|[Terren](http://webterren.com.zigstat.com/)|Media|| +|[LeCloud](https://www.crunchbase.com/organization/letv-2)|Media|| +|[Miaopai](https://en.wikipedia.org/wiki/Miaopai)|Media|| +|[Snowball Finance](https://www.crunchbase.com/organization/snowball-finance)|FinTech|| +|[Yimutian](http://www.ymt.com/)|E-commerce|| +|[Gengmei](https://www.crunchbase.com/organization/gengmei)|Plastic Surgery|| +|[Acewill](https://www.crunchbase.com/organization/acewill)|FoodTech|| +|[Keruyun](https://www.crunchbase.com/organization/keruyun-technology-beijing-co-ltd)|SaaS|[Chinese](https://pingcap.com/cases-cn/user-case-keruyun/)| +|[Youju Tech](https://www.ujuz.cn/)|E-Commerce|| +|[Maizuo](https://www.crunchbase.com/organization/maizhuo)|E-Commerce|| +|[Mogujie](https://www.crunchbase.com/organization/mogujie)|E-Commerce|| +|[Zhuan Zhuan](https://www.crunchbase.com/organization/zhuan-zhuan)|Online Marketplace|[Chinese](https://pingcap.com/cases-cn/user-case-zhuanzhuan/)| +|[Shuangchuang Huipu](http://scphjt.com/)|FinTech|| 
+|[Meizu](https://en.wikipedia.org/wiki/Meizu)|Media|| +|[SEA group](https://sea-group.org/?lang=en)|Gaming|| +|[Sogou](https://en.wikipedia.org/wiki/Sogou)|MediaTech|| +|[Chunyu Yisheng](https://www.crunchbase.com/organization/chunyu)|HealthTech|| +|[Meituan](https://en.wikipedia.org/wiki/Meituan-Dianping)|Food Delivery|| +|[Qutoutiao](https://www.crunchbase.com/organization/qutoutiao)|Social Network|| +|[QuantGroup](https://www.crunchbase.com/organization/quantgroup)|FinTech|| +|[FINUP](https://www.crunchbase.com/organization/finup)|FinTech|| +[Meili Finance](https://www.crunchbase.com/organization/meili-jinrong)|FinTech|| +|[Guolian Securities](https://www.crunchbase.com/organization/guolian-securities)|Financial Services|| +|[Founder Securities](https://www.linkedin.com/company/founder-securities-co-ltd-/)|Financial Services|| +|[China Telecom Shanghai](http://sh.189.cn/en/index.html)|Telecom|| +|[State Administration of Taxation](https://en.wikipedia.org/wiki/State_Administration_of_Taxation)|Finance|| +|[Wuhan Antian Information Technology](https://www.avlsec.com/)|Enterprise Technology|| +|[Ausnutria Dairy](https://www.crunchbase.com/organization/ausnutria-dairy)|FoodTech|| +|[Qingdao Telaidian](https://www.teld.cn/)|Electric Car Charger|| \ No newline at end of file diff --git a/architecture.md b/architecture.md new file mode 100644 index 0000000000000..a7af16cf94a79 --- /dev/null +++ b/architecture.md @@ -0,0 +1,47 @@ +--- +title: TiDB Architecture +summary: The key architecture components of the TiDB platform +category: introduction +--- + +# TiDB Architecture + +The TiDB platform is comprised of three key components: the TiDB server, the PD server, and the TiKV server. In addition, TiDB also provides the [TiSpark](https://github.com/pingcap/tispark/) component for the complex OLAP requirements. + +![image alt text](media/tidb-architecture.png) + +## TiDB server + +The TiDB server is in charge of the following operations: + +1. Receiving the SQL requests + +2. Processing the SQL related logics + +3. Locating the TiKV address for storing and computing data through Placement Driver (PD) + +4. Exchanging data with TiKV + +5. Returning the result + +The TiDB server is stateless. It does not store data and it is for computing only. TiDB is horizontally scalable and provides the unified interface to the outside through the load balancing components such as Linux Virtual Server (LVS), HAProxy, or F5. + +## Placement Driver server + +The Placement Driver (PD) server is the managing component of the entire cluster and is in charge of the following three operations: + +1. Storing the metadata of the cluster such as the region location of a specific key. + +2. Scheduling and load balancing regions in the TiKV cluster, including but not limited to data migration and Raft group leader transfer. + +3. Allocating the transaction ID that is globally unique and monotonic increasing. + +As a cluster, PD needs to be deployed to an odd number of nodes. Usually it is recommended to deploy to 3 online nodes at least. + +## TiKV server + +The TiKV server is responsible for storing data. From an external view, TiKV is a distributed transactional Key-Value storage engine. Region is the basic unit to store data. Each Region stores the data for a particular Key Range which is a left-closed and right-open interval from StartKey to EndKey. There are multiple Regions in each TiKV node. TiKV uses the Raft protocol for replication to ensure the data consistency and disaster recovery. 
The replicas of the same Region on different nodes compose a Raft Group. The load balancing of the data among different TiKV nodes are scheduled by PD. Region is also the basic unit for scheduling the load balance. + +## TiSpark + +TiSpark deals with the complex OLAP requirements. TiSpark makes Spark SQL directly run on the storage layer of the TiDB cluster, combines the advantages of the distributed TiKV cluster, and integrates into the big data ecosystem. With TiSpark, TiDB can support both OLTP and OLAP scenarios in one cluster, so the users never need to worry about data synchronization. diff --git a/benchmark/sysbench-v2.md b/benchmark/sysbench-v2.md new file mode 100644 index 0000000000000..8bb15fd03c3c9 --- /dev/null +++ b/benchmark/sysbench-v2.md @@ -0,0 +1,133 @@ +--- +title: TiDB Sysbench Performance Test Report -- v2.0.0 vs. v1.0.0 +category: benchmark +--- + +# TiDB Sysbench Performance Test Report -- v2.0.0 vs. v1.0.0 + +## Test purpose + +This test aims to compare the performances of TiDB 1.0 and TiDB 2.0. + +## Test version, time, and place + +TiDB version: v1.0.8 vs. v2.0.0-rc6 + +Time: April 2018 + +Place: Beijing, China + +## Test environment + +IDC machine + +| Type | Name | +| -------- | --------- | +| OS | linux (CentOS 7.3.1611) | +| CPU | 40 vCPUs, Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz | +| RAM | 128GB | +| DISK | Optane 500GB SSD * 1 | + +Sysbench test script: +https://github.com/pingcap/tidb-bench/tree/master/sysbench + + +## Test plan + +### TiDB version information + +### v1.0.8 + +| Component | GitHash | +| -------- | --------- | +| TiDB | 571f0bbd28a0b8155a5ee831992c986b90d21ab7 | +| TiKV | 4ef5889947019e3cb55cc744f487aa63b42540e7 | +| PD | 776bcd940b71d295a2c7ed762582bc3aff7d3c0e | + +### v2.0.0-rc6 + +| Component | GitHash | +| :--------: | :---------: | +| TiDB | 82d35f1b7f9047c478f4e1e82aa0002abc8107e7 | +| TiKV | 7ed4f6a91f92cad5cd5323aaebe7d9f04b77cc79 | +| PD | 2c8e7d7e33b38e457169ce5dfb2f461fced82d65 | + +### TiKV parameter configuration + +- v1.0.8 + + ``` + sync-log = false + grpc-concurrency = 8 + grpc-raft-conn-num = 24 + ``` + +- v2.0.0-rc6 + + ``` + sync-log = false + grpc-concurrency = 8 + grpc-raft-conn-num = 24 + use-delete-range: false + ``` + +### Cluster topology + +| Machine IP | Deployment instance | +|--------------|------------| +| 172.16.21.1 | 1*tidb 1*pd 1*sysbench | +| 172.16.21.2 | 1*tidb 1*pd 1*sysbench | +| 172.16.21.3 | 1*tidb 1*pd 1*sysbench | +| 172.16.11.4 | 1*tikv | +| 172.16.11.5 | 1*tikv | +| 172.16.11.6 | 1*tikv | +| 172.16.11.7 | 1*tikv | +| 172.16.11.8 | 1*tikv | +| 172.16.11.9 | 1*tikv | + +## Test result + +### Standard `Select` test + +| Version | Table count | Table size | Sysbench threads |QPS | Latency (avg/.95) | +| :---: | :---: | :---: | :---: | :---: | :---: | +| v2.0.0-rc6 | 32 | 10 million | 128 * 3 | 201936 | 1.9033 ms/5.67667 ms | +| v2.0.0-rc6 | 32 | 10 million | 256 * 3 | 208130 | 3.69333 ms/8.90333 ms | +| v2.0.0-rc6 | 32 | 10 million | 512 * 3 | 211788 | 7.23333 ms/15.59 ms | +| v2.0.0-rc6 | 32 | 10 million | 1024 * 3 | 212868 | 14.5933 ms/43.2133 ms | +| v1.0.8 | 32 | 10 million | 128 * 3 | 188686 | 2.03667 ms/5.99 ms | +| v1.0.8 | 32 | 10 million | 256 * 3 | 195090 |3.94 ms/9.12 ms | +| v1.0.8 | 32 | 10 million | 512 * 3 | 203012 | 7.57333 ms/15.3733 ms | +| v1.0.8 | 32 | 10 million | 1024 * 3 | 205932 | 14.9267 ms/40.7633 ms | + +According to the statistics above, the `Select` query performance of TiDB 2.0 GA has increased by about 10% at most than that of TiDB 1.0 GA. 
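For readers who want to drive a comparable point-select workload themselves, the sketch below runs sysbench against TiDB's MySQL-compatible port (4000 by default). It is illustrative only: it assumes sysbench 1.0's bundled `oltp_point_select` script and placeholder connection parameters (host `172.16.21.1`, user `root`, database `sbtest`), whereas the numbers above were produced with the tidb-bench sysbench scripts linked earlier.

```bash
# Prepare 32 tables with 10 million rows each, matching the data set used in this report.
sysbench oltp_point_select \
  --mysql-host=172.16.21.1 --mysql-port=4000 --mysql-user=root --mysql-db=sbtest \
  --tables=32 --table-size=10000000 --threads=128 prepare

# Run the point-select workload; adjust --threads to reproduce each concurrency level.
sysbench oltp_point_select \
  --mysql-host=172.16.21.1 --mysql-port=4000 --mysql-user=root --mysql-db=sbtest \
  --tables=32 --table-size=10000000 --threads=128 --time=600 --report-interval=10 run
```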
+ +### Standard OLTP test + +| Version | Table count | Table size | Sysbench threads | TPS | QPS | Latency (avg/.95) | +| :---: | :---: | :---: | :---: | :---: | :---: | :---:| +| v2.0.0-rc6 | 32 | 10 million | 128 * 3 | 5404.22 | 108084.4 | 87.2033 ms/110 ms | +| v2.0.0-rc6 | 32 | 10 million | 256 * 3 | 5578.165 | 111563.3 | 167.673 ms/275.623 ms | +| v2.0.0-rc6 | 32 | 10 million | 512 * 3 | 5874.045 | 117480.9 | 315.083 ms/674.017 ms | +| v2.0.0-rc6 | 32 | 10 million | 1024 * 3 | 6290.7 | 125814 | 529.183 ms/857.007 ms | +| v1.0.8 | 32 | 10 million | 128 * 3 | 5523.91 | 110478 | 69.53 ms/88.6333 ms | +| v1.0.8 | 32 | 10 million | 256 * 3 | 5969.43 | 119389 |128.63 ms/162.58 ms | +| v1.0.8 | 32 | 10 million | 512 * 3 | 6308.93 | 126179 | 243.543 ms/310.913 ms | +| v1.0.8 | 32 | 10 million | 1024 * 3 | 6444.25 | 128885 | 476.787ms/635.143 ms | + +According to the statistics above, the OLTP performance of TiDB 2.0 GA and TiDB 1.0 GA is almost the same. + +### Standard `Insert` test + +| Version | Table count | Table size | Sysbench threads | QPS | Latency (avg/.95) | +| :---: | :---: | :---: | :---: | :---: | :---: | +| v2.0.0-rc6 | 32 | 10 million | 128 * 3 | 31707.5 | 12.11 ms/21.1167 ms | +| v2.0.0-rc6 | 32 | 10 million | 256 * 3 | 38741.2 | 19.8233 ms/39.65 ms | +| v2.0.0-rc6 | 32 | 10 million | 512 * 3 | 45136.8 | 34.0267 ms/66.84 ms | +| v2.0.0-rc6 | 32 | 10 million | 1024 * 3 | 48667 | 63.1167 ms/121.08 ms | +| v1.0.8 | 32 | 10 million | 128 * 3 | 31125.7 | 12.3367 ms/19.89 ms | +| v1.0.8 | 32 | 10 million | 256 * 3 | 36800 | 20.8667 ms/35.3767 ms | +| v1.0.8 | 32 | 10 million | 512 * 3 | 44123 | 34.8067 ms/63.32 ms | +| v1.0.8 | 32 | 10 million | 1024 * 3 | 48496 | 63.3333 ms/118.92 ms | + +According to the statistics above, the `Insert` query performance of TiDB 2.0 GA has increased slightly than that of TiDB 1.0 GA. diff --git a/benchmark/sysbench-v3.md b/benchmark/sysbench-v3.md new file mode 100644 index 0000000000000..da473bc1d3938 --- /dev/null +++ b/benchmark/sysbench-v3.md @@ -0,0 +1,142 @@ +--- +title: TiDB Sysbench Performance Test Report -- v2.1 vs. v2.0 +category: benchmark +--- + +# TiDB Sysbench Performance Test Report -- v2.1 vs. v2.0 + +## Test purpose + +This test aims to compare the performances of TiDB 2.1 and TiDB 2.0 in the OLTP scenario. + +## Test version, time, and place + +TiDB version: v2.1.0-rc.2 vs. v2.0.6 + +Time: September, 2018 + +Place: Beijing, China + +## Test environment + +IDC machine: + +| Type | Name | +| :-: | :-: | +| OS | Linux (CentOS 7.3.1611) | +| CPU | 40 vCPUs, Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz | +| RAM | 128GB | +| DISK | Optane 500GB SSD \* 1 | + +Sysbench version: 1.1.0 + +## Test plan + +Use Sysbench to import **16 tables, with 10,000,000 pieces of data in each table**. With the HAProxy, requests are sent to the cluster at an incremental concurrent number. A single concurrent test lasts 5 minutes. + +### TiDB version information + +### v2.1.0-rc.2 + +| Component | GitHash | +| :-: | :-: | +| TiDB | 08e56cd3bae166b2af3c2f52354fbc9818717f62 | +| TiKV | 57e684016dafb17dc8a6837d30224be66cbc7246 | +| PD | 6a7832d2d6e5b2923c79683183e63d030f954563 | + +### v2.0.6 + +| Component | GitHash | +| :-: | :-: | +| TiDB | b13bc08462a584a085f377625a7bab0cc0351570 | +| TiKV | 57c83dc4ebc93d38d77dc8f7d66db224760766cc | +| PD | b64716707b7279a4ae822be767085ff17b5f3fea | + +### TiDB parameter configuration + +The default TiDB configuration is used in both v2.1 and v2.0. 
+ +### TiKV parameter configuration + +The following TiKV configuration is used in both v2.1 and v2.0: + +```txt +[readpool.storage] +normal-concurrency = 8 +[server] +grpc-concurrency = 8 +[raftstore] +sync-log = false +[rocksdb.defaultcf] +block-cache-size = "60GB" +[rocksdb.writecf] +block-cache-size = "20GB" +``` + +### Cluster topology + +| Machine IP | Deployment instance | +| :-: | :-: | +| 172.16.30.31 | 1\*Sysbench 1\*HAProxy | +| 172.16.30.32 | 1\*TiDB 1\*pd 1\*TiKV | +| 172.16.30.33 | 1\*TiDB 1\*TiKV | +| 172.16.30.34 | 1\*TiDB 1\*TiKV | + +## Test result + +### `Point Select` test + +| Version | Threads | QPS | 95% Latency (ms) | +| :-: | :-: | :-: | :-: | +| v2.1 | 64 | 111481.09 | 1.16 | +| v2.1 | 128 | 145102.62 | 2.52 | +| v2.1 | 256 | 161311.9 | 4.57 | +| v2.1 | 512 | 184991.19 | 7.56 | +| v2.1 | 1024 | 230282.74 | 10.84 | +| v2.0 | 64 | 75285.87 | 1.93 | +| v2.0 | 128 | 92141.79 | 3.68 | +| v2.0 | 256 | 107464.93 | 6.67 | +| v2.0 | 512 | 121350.61 | 11.65 | +| v2.0 | 1024 | 150036.31 | 17.32 | + +![point select](../media/sysbench_v3_point_select.png) + +According to the statistics above, the `Point Select` query performance of TiDB 2.1 has increased by **50%** than that of TiDB 2.0. + +### `Update Non-Index` test + +| Version | Threads | QPS | 95% Latency (ms) | +| :-: | :-: | :-: | :-: | +| v2.1 | 64 | 18946.09 | 5.77 | +| v2.1 | 128 | 22022.82 | 12.08 | +| v2.1 | 256 | 24679.68 | 25.74 | +| v2.1 | 512 | 25107.1 | 51.94 | +| v2.1 | 1024 | 27144.92 | 106.75 | +| v2.0 | 64 | 16316.85 | 6.91 | +| v2.0 | 128 | 20944.6 | 11.45 | +| v2.0 | 256 | 24017.42 | 23.1 | +| v2.0 | 512 | 25994.33 | 46.63 | +| v2.0 | 1024 | 27917.52 | 92.42 | + +![update non-index](../media/sysbench_v3_update_non_index.png) + +According to the statistics above, the `Update Non-Index` write performance of TiDB 2.1 and TiDB 2.0 is almost the same. + +### `Update Index` test + +| Version | Threads | QPS | 95% Latency (ms) | +| :-: | :-: | :-: | :-: | +| v2.1 | 64 | 9934.49 | 12.08 | +| v2.1 | 128 | 10505.95 | 25.28 | +| v2.1 | 256 | 11007.7 | 55.82 | +| v2.1 | 512 | 11198.81 | 106.75 | +| v2.1 | 1024 | 11591.89 | 200.47 | +| v2.0 | 64 | 9754.68 | 11.65 | +| v2.0 | 128 | 10603.31 | 24.38 | +| v2.0 | 256 | 11011.71 | 50.11 | +| v2.0 | 512 | 11162.63 | 104.84 | +| v2.0 | 1024 | 12067.63 | 179.94 | + +![update index](../media/sysbench_v3_update_index.png) + +According to the statistics above, the `Update Index` write performance of TiDB 2.1 and TiDB 2.0 is almost the same. diff --git a/benchmark/tpch-v2.md b/benchmark/tpch-v2.md new file mode 100644 index 0000000000000..45fe34964c265 --- /dev/null +++ b/benchmark/tpch-v2.md @@ -0,0 +1,103 @@ +--- +title: TiDB TPC-H 50G Performance Test Report V2.1 +category: benchmark +--- + +# TiDB TPC-H 50G Performance Test Report V2.1 + +## Test purpose + +This test aims to compare the performances of TiDB 2.0 and TiDB 2.1 in the OLAP scenario. + +> **Note**: Different test environments might lead to different test results. 
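The per-query timings reported below are wall-clock execution times. As a rough sketch of how such timings can be collected once the 50G TPC-H data set is loaded, each query can be fed to TiDB through the MySQL client and timed from the shell; the host, database name, and `queries/` directory layout here are placeholders rather than the exact tidb-bench invocation.

```bash
# Time each TPC-H query file against the tpch database on the TiDB node.
# Queries 5 and 15 are skipped, matching the result table below.
for Q in 1 2 3 4 6 7 8 9 10 11 12 13 14 16 17 18 19 20 21 22; do
  echo "== Query $Q =="
  /usr/bin/time -f "%es" mysql -h 10.0.1.4 -P 4000 -u root -D tpch < queries/$Q.sql > /dev/null
done
```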
+ +## Test environment + +### Machine information + +System information: + +| Machine IP | Operation system | Kernel version | File system type | +|--------------|------------------------|------------------------------|--------------| +| 10.0.1.4 | CentOS 7.5.1804 64bit | 3.10.0-862.3.3.el7.x86\_64 | ext4 | +| 10.0.1.5 | CentOS 7.5.1804 64bit | 3.10.0-862.3.3.el7.x86\_64 | ext4 | +| 10.0.1.6 | CentOS 7.5.1804 64bit | 3.10.0-862.3.3.el7.x86\_64 | ext4 | +| 10.0.1.7 | CentOS 7.5.1804 64bit | 3.10.0-862.3.3.el7.x86\_64 | ext4 | +| 10.0.1.8 | CentOS 7.5.1804 64bit | 3.10.0-862.3.3.el7.x86\_64 | ext4 | +| 10.0.1.9 | CentOS 7.5.1804 64bit | 3.10.0-862.3.3.el7.x86\_64 | ext4 | + +Hardware information: + +| Type | 10.0.1.4 | 10.0.1.5, 10.0.1.6, 10.0.1.7, 10.0.1.8, 10.0.1.9 | +|------------|------------------------------------------------------|------------------------------------------------------| +| CPU | 16 vCPUs, Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz | 8 vCPUs, Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz | +| Memory | 110G | 55G | +| Disk | 221G SSD | 111G SSD | +| Network card | 10 Gigabit Ethernet, 10000Mb/s | 10 Gigabit Ethernet, 10000Mb/s | + +### TPC-H + +[tidb-bench/tpch](https://github.com/pingcap/tidb-bench/tree/master/tpch) + +### Cluster topology + +| Machine IP | Deployment Instance | +|----------|------------| +| 10.0.1.5 | TiKV \* 1 | +| 10.0.1.6 | TiKV \* 1 | +| 10.0.1.7 | TiKV \* 1 | +| 10.0.1.8 | TiKV \* 1 | +| 10.0.1.9 | TiKV \* 1 | +| 10.0.1.4 | PD \* 1 | +| 10.0.1.4 | TiDB \* 1 | + +### TiDB version information + +TiDB 2.0: + +| Component | Version | Commit Hash | +|--------|-------------|--------------------------------------------| +| TiDB | v2.0.7 | 29ec059cb3b7d14b6f52c2f219f94a89570162bc | +| TiKV | v2.0.7 | d0b8cd7c7f62f06e7ef456837bd32a47da1ca4cd | +| PD | v2.0.5 | b64716707b7279a4ae822be767085ff17b5f3fea | + +TiDB 2.1: + +| Component | Version | Commit Hash | +|--------|-------------|--------------------------------------------| +| TiDB | v2.1.0-rc.2 | 16864f95b47f859ed6104555ccff0387abdc2429 | +| TiKV | v2.1.0-rc.2 | 8458ce53ebbd434c48baac6373fe0f0a43a54005 | +| PD | v2.1.0-rc.2 | 55db505e8f35e8ab4e00efd202beb27a8ecc40fb | + +## Test result + +| Query ID | TiDB 2.0 | TiDB 2.1 | +|-----------|----------------|----------------| +| 1 | 121.550595999s | 91.4755480289s | +| 2 | 53.0638680458s | 23.1186130047s | +| 3 | 75.7236940861s | 61.790802002s | +| 4 | 30.2647120953s | 26.3483440876s | +| 6 | 51.4850790501s | 34.6432199478s | +| 7 | 216.787364006s | 94.9856910706s | +| 8 | 188.717588902s | 181.852752209s | +| 9 | 546.438174009s | 414.462754965s | +| 10 | 109.978317022s | 37.0369961262s | +| 11 | 42.9398438931s | 37.6951580048s | +| 12 | 60.455039978s | 40.2236878872s | +| 13 | 230.278712988s | 70.2887151241s | +| 14 | 61.2673521042s | 35.8372960091s | +| 16 | 30.2539310455s | 18.5897550583s | +| 17 | 3200.70173788s | 263.095014811s | +| 18 | 1035.59847498s | 296.360667944s | +| 19 | 54.3732938766s | 40.4523630142s | +| 20 | 105.094577074s | 53.2429068089s | +| 21 | 389.883709908s | 361.034544945s | +| 22 | 64.0494630337s | 65.7153418064s | + +![TPC-H Query Result](../media/tpch-query-result-v2.png) + +It should be noted that: + +- In the diagram above, the red bars represent the query results of Release 2.1 and the blue bars represent the query results of Release 2.0. The y-axis represents the processing time of queries in seconds, the shorter the faster. +- The result of Query 15 is not displayed because VIEW is currently not supported in either TiDB 2.1 or 2.0. 
+- The result of Query 5 is not displayed because no result is returned during a long period of time caused by the Join Order issue. \ No newline at end of file diff --git a/benchmark/tpch.md b/benchmark/tpch.md new file mode 100644 index 0000000000000..322fc387a0a07 --- /dev/null +++ b/benchmark/tpch.md @@ -0,0 +1,106 @@ +--- +title: TiDB TPC-H 50G Performance Test Report V2.0 +category: benchmark +--- + +# TiDB TPC-H 50G Performance Test Report + +## Test purpose + +This test aims to compare the performances of TiDB 1.0 and TiDB 2.0 in the OLAP scenario. + +> **Note**: Different test environments might lead to different test results. + +## Test environment + +### Machine information + +System information: + +| Machine IP | Operation system | Kernel version | File system type | +|--------------|------------------------|------------------------------|--------------| +| 172.16.31.2 | Ubuntu 17.10 64bit | 4.13.0-16-generic | ext4 | +| 172.16.31.3 | Ubuntu 17.10 64bit | 4.13.0-16-generic | ext4 | +| 172.16.31.4 | Ubuntu 17.10 64bit | 4.13.0-16-generic | ext4 | +| 172.16.31.6 | CentOS 7.4.1708 64bit | 3.10.0-693.11.6.el7.x86\_64 | ext4 | +| 172.16.31.8 | CentOS 7.4.1708 64bit | 3.10.0-693.11.6.el7.x86\_64 | ext4 | +| 172.16.31.10 | CentOS 7.4.1708 64bit | 3.10.0-693.11.6.el7.x86\_64 | ext4 | + +Hardware information: + +| Type | Name | +|------------|------------------------------------------------------| +| CPU | 40 vCPUs, Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz | +| RAM | 128GB, 16GB RDIMM * 8, 2400MT/s, dual channel, x8 bitwidth | +| DISK | Intel P4500 4T SSD * 2 | +| Network Card | 10 Gigabit Ethernet | + +### TPC-H + +[tidb-bench/tpch](https://github.com/pingcap/tidb-bench/tree/master/tpch) + +### Cluster topology + +| Machine IP | Deployment Instance | +|--------------|---------------------| +| 172.16.31.2 | TiKV \* 2 | +| 172.16.31.3 | TiKV \* 2 | +| 172.16.31.6 | TiKV \* 2 | +| 172.16.31.8 | TiKV \* 2 | +| 172.16.31.10 | TiKV \* 2 | +| 172.16.31.10 | PD \* 1 | +| 172.16.31.4 | TiDB \* 1 | + +### Corresponding TiDB version information + +TiDB 1.0: + +| Component | Version | Commit Hash | +|--------|-------------|--------------------------------------------| +| TiDB | v1.0.9 | 4c7ee3580cd0a69319b2c0c08abdc59900df7344 | +| TiKV | v1.0.8 | 2bb923a4cd23dbf68f0d16169fd526dc5c1a9f4a | +| PD | v1.0.8 | 137fa734472a76c509fbfd9cb9bc6d0dc804a3b7 | + +TiDB 2.0: + +| Component | Version | Commit Hash | +|--------|-------------|--------------------------------------------| +| TiDB | v2.0.0-rc.6 | 82d35f1b7f9047c478f4e1e82aa0002abc8107e7 | +| TiKV | v2.0.0-rc.6 | 8bd5c54966c6ef42578a27519bce4915c5b0c81f | +| PD | v2.0.0-rc.6 | 9b824d288126173a61ce7d51a71fc4cb12360201 | + +## Test result + +| Query ID | TiDB 2.0 | TiDB 1.0 | +|-----------|--------------------|------------------| +| 1 | 33.915s | 215.305s | +| 2 | 25.575s | Nan | +| 3 | 59.631s | 196.003s | +| 4 | 30.234s | 249.919s | +| 5 | 31.666s | OOM | +| 6 | 13.111s | 118.709s | +| 7 | 31.710s | OOM | +| 8 | 31.734s | 800.546s | +| 9 | 34.211s | 630.639s | +| 10 | 30.774s | 133.547s | +| 11 | 27.692s | 78.026s | +| 12 | 27.962s | 124.641s | +| 13 | 27.676s | 174.695s | +| 14 | 19.676s | 110.602s | +| 15 | NaN | Nan | +| 16 | 24.890s | 40.529s | +| 17 | 245.796s | NaN | +| 18 | 91.256s | OOM | +| 19 | 37.615s | NaN | +| 20 | 44.167s | 212.201s | +| 21 | 31.466s | OOM | +| 22 | 31.539s | 125.471s | + +![TPC-H Query Result](../media/tpch-query-result.png) + +It should be noted that: + +- In the diagram above, the orange bars represent the query 
results of Release 1.0 and the blue bars represent the query results of Release 2.0. The y-axis represents the processing time of queries in seconds, the shorter the faster. +- Query 15 is tagged with "NaN" because VIEW is currently not supported in either TiDB 1.0 or 2.0. We have plans to provide VIEW support in a future release. +- Queries 2, 17, and 19 in the TiDB 1.0 column are tagged with "NaN" because TiDB 1.0 did not return results for these queries. +- Queries 5, 7, 18, and 21 in the TiDB 1.0 column are tagged with "OOM" because the memory consumption was too high. diff --git a/bikeshare-example-database.md b/bikeshare-example-database.md new file mode 100644 index 0000000000000..56786762130f2 --- /dev/null +++ b/bikeshare-example-database.md @@ -0,0 +1,66 @@ +--- +title: Bikeshare Example Database +summary: Install the Bikeshare example database. +category: user guide +--- + +# Bikeshare Example Database + +Examples used in the TiDB manual use [System Data](https://www.capitalbikeshare.com/system-data) from Capital Bikeshare, released under the [Capital Bikeshare Data License Agreement](https://www.capitalbikeshare.com/data-license-agreement). + +## Download all data files + +The system data is available [for download in .zip files](https://s3.amazonaws.com/capitalbikeshare-data/index.html) organized per year. Downloading and extracting all files requires approximately 3GB of disk space. To download all files for years 2010-2017 using a bash script: + +```bash +mkdir -p bikeshare-data && cd bikeshare-data + +for YEAR in 2010 2011 2012 2013 2014 2015 2016 2017; do + wget https://s3.amazonaws.com/capitalbikeshare-data/${YEAR}-capitalbikeshare-tripdata.zip + unzip ${YEAR}-capitalbikeshare-tripdata.zip +done; +``` + +## Load data into TiDB + +The system data can be imported into TiDB using the following schema: + +```sql +CREATE DATABASE bikeshare; +USE bikeshare; + +CREATE TABLE trips ( + trip_id bigint NOT NULL PRIMARY KEY auto_increment, + duration integer not null, + start_date datetime, + end_date datetime, + start_station_number integer, + start_station varchar(255), + end_station_number integer, + end_station varchar(255), + bike_number varchar(255), + member_type varchar(255) +); +``` + +You can import files individually using the example `LOAD DATA` command here, or import all files using the bash loop below: + +```sql +LOAD DATA LOCAL INFILE '2017Q1-capitalbikeshare-tripdata.csv' INTO TABLE trips + FIELDS TERMINATED BY ',' ENCLOSED BY '"' + LINES TERMINATED BY '\r\n' + IGNORE 1 LINES +(duration, start_date, end_date, start_station_number, start_station, +end_station_number, end_station, bike_number, member_type); +``` + +### Import all files + +To import all `*.csv` files into TiDB in a bash loop: + +```bash +for FILE in `ls *.csv`; do + echo "== $FILE ==" + mysql bikeshare -e "LOAD DATA LOCAL INFILE '${FILE}' INTO TABLE trips FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES (duration, start_date, end_date, start_station_number, start_station, end_station_number, end_station, bike_number, member_type);" +done; +``` diff --git a/community.md b/community.md index 6b023b0007900..6d1192048eaaf 100644 --- a/community.md +++ b/community.md @@ -1,5 +1,6 @@ --- title: Connect with us +summary: Learn about how to connect with us. 
category: community --- diff --git a/dev-guide/deployment.md b/dev-guide/deployment.md index c3b5c5f791221..231a70c714455 100644 --- a/dev-guide/deployment.md +++ b/dev-guide/deployment.md @@ -2,11 +2,9 @@ ## Overview -Note: **The easiest way to deploy TiDB is to use the official binary package directly, see [Binary Deployment](../op-guide/binary-deployment.md).** +Note: **The easiest way to deploy TiDB is to use TiDB Ansible, see [Ansible Deployment](../op-guide/ansible-deployment.md).** -If you want to build the TiDB project, deploy the binaries to other machines and run them, you can follow this guide. - -Check the [supported platforms](./requirements.md#supported-platforms) and [prerequisites](./requirements.md#prerequisites) first. +Before you start, check the [supported platforms](../dev-guide/requirements.md#supported-platforms) and [prerequisites](../dev-guide/requirements.md#prerequisites) first. ## Building and installing TiDB components diff --git a/dev-guide/development.md b/dev-guide/development.md index bfcedbc31648e..e3da3dcd3ba89 100644 --- a/dev-guide/development.md +++ b/dev-guide/development.md @@ -4,7 +4,7 @@ If you want to develop the TiDB project, you can follow this guide. -Before you begin, check the [supported platforms](./requirements.md#supported-platforms) and [prerequisites](./requirements.md#prerequisites) first. +Before you begin, check the [supported platforms](../dev-guide/requirements.md#supported-platforms) and [prerequisites](../dev-guide/requirements.md#prerequisites) first. ## Build TiKV diff --git a/media/overview.png b/media/overview.png new file mode 100644 index 0000000000000..8a665fb4d82e8 Binary files /dev/null and b/media/overview.png differ diff --git a/media/sysbench_v3_point_select.png b/media/sysbench_v3_point_select.png new file mode 100644 index 0000000000000..42232abc105de Binary files /dev/null and b/media/sysbench_v3_point_select.png differ diff --git a/media/sysbench_v3_update_index.png b/media/sysbench_v3_update_index.png new file mode 100644 index 0000000000000..bbc65d10814ae Binary files /dev/null and b/media/sysbench_v3_update_index.png differ diff --git a/media/sysbench_v3_update_non_index.png b/media/sysbench_v3_update_non_index.png new file mode 100644 index 0000000000000..0c003ace60f32 Binary files /dev/null and b/media/sysbench_v3_update_non_index.png differ diff --git a/media/tidb-architecture.png b/media/tidb-architecture.png index b0fa6767259b3..51d4f57aa7dd2 100644 Binary files a/media/tidb-architecture.png and b/media/tidb-architecture.png differ diff --git a/media/tikv_stack.png b/media/tikv_stack.png new file mode 100644 index 0000000000000..4f8b1b6d4d45e Binary files /dev/null and b/media/tikv_stack.png differ diff --git a/media/tpch-query-result-v2.png b/media/tpch-query-result-v2.png new file mode 100644 index 0000000000000..035a4d33e224c Binary files /dev/null and b/media/tpch-query-result-v2.png differ diff --git a/media/tpch-query-result.png b/media/tpch-query-result.png new file mode 100644 index 0000000000000..c9e6a51b6415c Binary files /dev/null and b/media/tpch-query-result.png differ diff --git a/op-guide/ansible-deployment-rolling-update.md b/op-guide/ansible-deployment-rolling-update.md new file mode 100644 index 0000000000000..b3758551c7082 --- /dev/null +++ b/op-guide/ansible-deployment-rolling-update.md @@ -0,0 +1,145 @@ +--- +title: Upgrade TiDB Using TiDB-Ansible +summary: Use TiDB-Ansible to perform a rolling update for a TiDB cluster. 
+category: operations +--- + +# Upgrade TiDB Using TiDB-Ansible + +When you perform a rolling update for a TiDB cluster, the service is shut down serially and is started after you update the service binary and the configuration file. If the load balancing is configured in the front-end, the rolling update of TiDB does not impact the running applications. Minimum requirements: `pd*3, tidb*2, tikv*3`. + +> **Note:** If the binlog is enabled, and Pump and Drainer services are deployed in the TiDB cluster, stop the Drainer service before the rolling update. The Pump service is automatically updated in the rolling update of TiDB. + +## Upgrade the component version + +- To upgrade between large versions, you need to upgrade [`tidb-ansible`](https://github.com/pingcap/tidb-ansible). If you want to upgrade the version of TiDB from 1.0 to 2.0, see [TiDB 2.0 Upgrade Guide](../op-guide/tidb-v2-upgrade-guide.md). + +- For a minor upgrade, it is also recommended to update `tidb-ansible` for the latest configuration file templates, features, and bug fixes. + +### Download the binary automatically + +1. Edit the value of the `tidb_version` parameter in the `/home/tidb/tidb-ansible/inventory.ini` file, and specify the version number you need to upgrade to. + + For example, to upgrade from `v2.0.6` to `v2.0.7`: + + ``` + tidb_version = v2.0.7 + ``` + + > **Note:** If you use `tidb-ansible` of the master branch, you can keep `tidb_version = latest`. The installation package of the latest TiDB version is updated each day. + +2. Delete the existing `downloads` directory `/home/tidb/tidb-ansible/downloads/`. + + ``` + $ cd /home/tidb/tidb-ansible + $ rm -rf downloads + ``` + +3. Use `playbook` to download the TiDB binary and replace the existing binary in `/home/tidb/tidb-ansible/resource/bin/` with it automatically. + + ``` + $ ansible-playbook local_prepare.yml + ``` + +### Download the binary manually + +You can also download the binary manually. Use `wget` to download the binary and replace the existing binary in `/home/tidb/tidb-ansible/resource/bin/` with it manually. + +``` +wget http://download.pingcap.org/tidb-v2.0.7-linux-amd64.tar.gz +``` + +> **Note:** Remember to replace the version number in the download link with the one you need. + +If you use `tidb-ansible` of the master branch, download the binary using the following command: + +``` +$ wget http://download.pingcap.org/tidb-latest-linux-amd64.tar.gz +``` + +### Perform a rolling update using Ansible + +- Apply a rolling update to the PD node (only upgrade the PD service) + + ``` + $ ansible-playbook rolling_update.yml --tags=pd + ``` + + When you apply a rolling update to the PD leader instance, if the number of PD instances is not less than 3, Ansible migrates the PD leader to another node before stopping this instance. + +- Apply a rolling update to the TiKV node (only upgrade the TiKV service) + + ``` + $ ansible-playbook rolling_update.yml --tags=tikv + ``` + + When you apply a rolling update to the TiKV instance, Ansible migrates the Region leader to other nodes. The concrete logic is as follows: Call the PD API to add the `evict leader scheduler` -> Inspect the `leader_count` of this TiKV instance every 10 seconds -> Wait the `leader_count` to reduce to below 1, or until the times of inspecting the `leader_count` is more than 18 -> Start closing the rolling update of TiKV after three minutes of timeout -> Delete the `evict leader scheduler` after successful start. The operations are executed serially. 
+ + If the rolling update fails in the process, log in to `pd-ctl` to execute `scheduler show` and check whether `evict-leader-scheduler` exists. If it does exist, delete it manually. Replace `{PD_IP}` and `{STORE_ID}` with your PD IP and the `store_id` of the TiKV instance: + + ``` + $ /home/tidb/tidb-ansible/resources/bin/pd-ctl -u "http://{PD_IP}:2379"$ /home/tidb/tidb-ansible/resources/bin/pd-ctl -u "http://{PD_IP}:2379" + » scheduler show + [ + "label-scheduler", + "evict-leader-scheduler-{STORE_ID}", + "balance-region-scheduler", + "balance-leader-scheduler", + "balance-hot-region-scheduler" + ] + » scheduler remove evict-leader-scheduler-{STORE_ID} + ``` + +- Apply a rolling update to the TiDB node (only upgrade the TiDB service) + + If the binlog is enabled in the TiDB cluster, the Pump service is automatically upgraded in the rolling update of the TiDB service. + + ``` + $ ansible-playbook rolling_update.yml --tags=tidb + ``` + +- Apply a rolling update to all services (upgrade PD, TiKV, and TiDB in sequence) + + If the binlog is enabled in the TiDB cluster, the Pump service is automatically upgraded in the rolling update of the TiDB service. + + ``` + $ ansible-playbook rolling_update.yml + ``` + +- Apply a rolling update to the monitoring component + + ``` + $ ansible-playbook rolling_update_monitor.yml + ``` + +## Modify component configuration + +This section describes how to modify component configuration using Ansible. + +1. Update the component configuration template. + + The component configuration template of the TiDB cluster is in the `/home/tidb/tidb-ansible/conf` folder. + + | Component | Template Name of the Configuration File | + | :-------- | :----------: | + | TiDB | tidb.yml | + | TiKV | tikv.yml | + | PD | pd.yml | + + The comment status if the default configuration item, which uses the default value. To modify it, you need to cancel the comment by removing `#` and then modify the corresponding parameter value. + + The configuration template uses the yaml format, so separate the parameter name and the parameter value using `:`, and indent two spaces. + + For example, modify the value of the `high-concurrency`, `normal-concurrency` and `low-concurrency` parameters to 16 for the TiKV component: + + ```bash + readpool: + coprocessor: + # Notice: if CPU_NUM > 8, the default thread pool size for coprocessors + # will be set to CPU_NUM * 0.8. + high-concurrency: 16 + normal-concurrency: 16 + low-concurrency: 16 + ``` + +2. After modifying the component configuration, you need to perform a rolling update using Ansible. See [Perform a rolling update using Ansible](#perform-a-rolling-update-using-ansible). \ No newline at end of file diff --git a/op-guide/ansible-deployment-scale.md b/op-guide/ansible-deployment-scale.md new file mode 100644 index 0000000000000..100a8ca4217aa --- /dev/null +++ b/op-guide/ansible-deployment-scale.md @@ -0,0 +1,472 @@ +--- +title: Scale the TiDB Cluster Using TiDB-Ansible +summary: Use TiDB-Ansible to increase/decrease the capacity of a TiDB/TiKV/PD node. +category: operations +--- + +# Scale the TiDB Cluster Using TiDB-Ansible + +The capacity of a TiDB cluster can be increased or decreased without affecting the online services. + +> **Warning:** In decreasing the capacity, if your cluster has a mixed deployment of other services, do not perform the following procedures. The following examples assume that the removed nodes have no mixed deployment of other services. 
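+
+Before you change the topology, you can confirm the current PD members and TiKV stores using the same `pd-ctl` commands as in the steps below. This is only a quick sanity check, assuming the PD endpoint `http://172.16.10.1:2379` from the example topology:
+
+```
+./pd-ctl -u "http://172.16.10.1:2379" -d member
+./pd-ctl -u "http://172.16.10.1:2379" -d store
+```
+
+The members and stores listed in the output should match the topology you are about to modify.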
+ +Assume that the topology is as follows: + +| Name | Host IP | Services | +| ---- | ------- | -------- | +| node1 | 172.16.10.1 | PD1 | +| node2 | 172.16.10.2 | PD2 | +| node3 | 172.16.10.3 | PD3, Monitor | +| node4 | 172.16.10.4 | TiDB1 | +| node5 | 172.16.10.5 | TiDB2 | +| node6 | 172.16.10.6 | TiKV1 | +| node7 | 172.16.10.7 | TiKV2 | +| node8 | 172.16.10.8 | TiKV3 | +| node9 | 172.16.10.9 | TiKV4 | + +## Increase the capacity of a TiDB/TiKV node + +For example, if you want to add two TiDB nodes (node101, node102) with the IP addresses `172.16.10.101` and `172.16.10.102`, take the following steps: + +1. Edit the `inventory.ini` file and append the node information: + + ```ini + [tidb_servers] + 172.16.10.4 + 172.16.10.5 + 172.16.10.101 + 172.16.10.102 + + [pd_servers] + 172.16.10.1 + 172.16.10.2 + 172.16.10.3 + + [tikv_servers] + 172.16.10.6 + 172.16.10.7 + 172.16.10.8 + 172.16.10.9 + + [monitored_servers] + 172.16.10.1 + 172.16.10.2 + 172.16.10.3 + 172.16.10.4 + 172.16.10.5 + 172.16.10.6 + 172.16.10.7 + 172.16.10.8 + 172.16.10.9 + 172.16.10.101 + 172.16.10.102 + + [monitoring_servers] + 172.16.10.3 + + [grafana_servers] + 172.16.10.3 + ``` + + Now the topology is as follows: + + | Name | Host IP | Services | + | ---- | ------- | -------- | + | node1 | 172.16.10.1 | PD1 | + | node2 | 172.16.10.2 | PD2 | + | node3 | 172.16.10.3 | PD3, Monitor | + | node4 | 172.16.10.4 | TiDB1 | + | node5 | 172.16.10.5 | TiDB2 | + | **node101** | **172.16.10.101**|**TiDB3** | + | **node102** | **172.16.10.102**|**TiDB4** | + | node6 | 172.16.10.6 | TiKV1 | + | node7 | 172.16.10.7 | TiKV2 | + | node8 | 172.16.10.8 | TiKV3 | + | node9 | 172.16.10.9 | TiKV4 | + +2. Initialize the newly added node: + + ``` + ansible-playbook bootstrap.yml -l 172.16.10.101,172.16.10.102 + ``` + + > **Note:** If an alias is configured in the `inventory.ini` file, for example, `node101 ansible_host=172.16.10.101`, use `-l` to specify the alias when executing `ansible-playbook`. For example, `ansible-playbook bootstrap.yml -l node101,node102`. This also applies to the following steps. + +3. Deploy the newly added node: + + ``` + ansible-playbook deploy.yml -l 172.16.10.101,172.16.10.102 + ``` + +4. Start the newly added node: + + ``` + ansible-playbook start.yml -l 172.16.10.101,172.16.10.102 + ``` + +5. Update the Prometheus configuration and restart the cluster: + + ``` + ansible-playbook rolling_update_monitor.yml --tags=prometheus + ``` + +6. Monitor the status of the entire cluster and the newly added node by opening a browser to access the monitoring platform: `http://172.16.10.3:3000`. + +You can use the same procedure to add a TiKV node. But to add a PD node, some configuration files need to be manually updated. + +## Increase the capacity of a PD node + +For example, if you want to add a PD node (node103) with the IP address `172.16.10.103`, take the following steps: + +1. 
Edit the `inventory.ini` file and append the node information: + + ```ini + [tidb_servers] + 172.16.10.4 + 172.16.10.5 + + [pd_servers] + 172.16.10.1 + 172.16.10.2 + 172.16.10.3 + 172.16.10.103 + + [tikv_servers] + 172.16.10.6 + 172.16.10.7 + 172.16.10.8 + 172.16.10.9 + + [monitored_servers] + 172.16.10.4 + 172.16.10.5 + 172.16.10.1 + 172.16.10.2 + 172.16.10.3 + 172.16.10.103 + 172.16.10.6 + 172.16.10.7 + 172.16.10.8 + 172.16.10.9 + + [monitoring_servers] + 172.16.10.3 + + [grafana_servers] + 172.16.10.3 + ``` + + Now the topology is as follows: + + | Name | Host IP | Services | + | ---- | ------- | -------- | + | node1 | 172.16.10.1 | PD1 | + | node2 | 172.16.10.2 | PD2 | + | node3 | 172.16.10.3 | PD3, Monitor | + | **node103** | **172.16.10.103** | **PD4** | + | node4 | 172.16.10.4 | TiDB1 | + | node5 | 172.16.10.5 | TiDB2 | + | node6 | 172.16.10.6 | TiKV1 | + | node7 | 172.16.10.7 | TiKV2 | + | node8 | 172.16.10.8 | TiKV3 | + | node9 | 172.16.10.9 | TiKV4 | + +2. Initialize the newly added node: + + ``` + ansible-playbook bootstrap.yml -l 172.16.10.103 + ``` + +3. Deploy the newly added node: + + ``` + ansible-playbook deploy.yml -l 172.16.10.103 + ``` + +4. Login the newly added PD node and edit the starting script: + + ``` + {deploy_dir}/scripts/run_pd.sh + ``` + + 1. Remove the `--initial-cluster="xxxx" \` configuration. + 2. Add `--join="http://172.16.10.1:2379" \`. The IP address (`172.16.10.1`) can be any of the existing PD IP address in the cluster. + 3. Manually start the PD service in the newly added PD node: + + ``` + {deploy_dir}/scripts/start_pd.sh + ``` + + 4. Use `pd-ctl` to check whether the new node is added successfully: + + ``` + ./pd-ctl -u "http://172.16.10.1:2379" + ``` + + > **Note:** `pd-ctl` is a command used to check the number of PD nodes. + +5. Apply a rolling update to the entire cluster: + + ``` + ansible-playbook rolling_update.yml + ``` + +6. Update the Prometheus configuration and restart the cluster: + + ``` + ansible-playbook rolling_update_monitor.yml --tags=prometheus + ``` + +7. Monitor the status of the entire cluster and the newly added node by opening a browser to access the monitoring platform: `http://172.16.10.3:3000`. + +## Decrease the capacity of a TiDB node + +For example, if you want to remove a TiDB node (node5) with the IP address `172.16.10.5`, take the following steps: + +1. Stop all services on node5: + + ``` + ansible-playbook stop.yml -l 172.16.10.5 + ``` + +2. Edit the `inventory.ini` file and remove the node information: + + ```ini + [tidb_servers] + 172.16.10.4 + #172.16.10.5 # the removed node + + [pd_servers] + 172.16.10.1 + 172.16.10.2 + 172.16.10.3 + + [tikv_servers] + 172.16.10.6 + 172.16.10.7 + 172.16.10.8 + 172.16.10.9 + + [monitored_servers] + 172.16.10.4 + #172.16.10.5 # the removed node + 172.16.10.1 + 172.16.10.2 + 172.16.10.3 + 172.16.10.6 + 172.16.10.7 + 172.16.10.8 + 172.16.10.9 + + [monitoring_servers] + 172.16.10.3 + + [grafana_servers] + 172.16.10.3 + ``` + + Now the topology is as follows: + + | Name | Host IP | Services | + | ---- | ------- | -------- | + | node1 | 172.16.10.1 | PD1 | + | node2 | 172.16.10.2 | PD2 | + | node3 | 172.16.10.3 | PD3, Monitor | + | node4 | 172.16.10.4 | TiDB1 | + | **node5** | **172.16.10.5** | **TiDB2 removed** | + | node6 | 172.16.10.6 | TiKV1 | + | node7 | 172.16.10.7 | TiKV2 | + | node8 | 172.16.10.8 | TiKV3 | + | node9 | 172.16.10.9 | TiKV4 | + +3. 
Update the Prometheus configuration and restart the cluster: + + ``` + ansible-playbook rolling_update_monitor.yml --tags=prometheus + ``` + +4. Monitor the status of the entire cluster by opening a browser to access the monitoring platform: `http://172.16.10.3:3000`. + +## Decrease the capacity of a TiKV node + +For example, if you want to remove a TiKV node (node9) with the IP address `172.16.10.9`, take the following steps: + +1. Remove the node from the cluster using `pd-ctl`: + + 1. View the store ID of node9: + + ``` + ./pd-ctl -u "http://172.16.10.1:2379" -d store + ``` + + 2. Remove node9 from the cluster, assuming that the store ID is 10: + + ``` + ./pd-ctl -u "http://172.16.10.1:2379" -d store delete 10 + ``` + +2. Use Grafana or `pd-ctl` to check whether the node is successfully removed: + + ``` + ./pd-ctl -u "http://172.16.10.1:2379" -d store 10 + ``` + + > **Note:** It takes some time to remove the node. If the status of the node you remove becomes Tombstone, then this node is successfully removed. + +3. After the node is successfully removed, stop the services on node9: + + ``` + ansible-playbook stop.yml -l 172.16.10.9 + ``` + +4. Edit the `inventory.ini` file and remove the node information: + + ```ini + [tidb_servers] + 172.16.10.4 + 172.16.10.5 + + [pd_servers] + 172.16.10.1 + 172.16.10.2 + 172.16.10.3 + + [tikv_servers] + 172.16.10.6 + 172.16.10.7 + 172.16.10.8 + #172.16.10.9 # the removed node + + [monitored_servers] + 172.16.10.4 + 172.16.10.5 + 172.16.10.1 + 172.16.10.2 + 172.16.10.3 + 172.16.10.6 + 172.16.10.7 + 172.16.10.8 + #172.16.10.9 # the removed node + + [monitoring_servers] + 172.16.10.3 + + [grafana_servers] + 172.16.10.3 + ``` + + Now the topology is as follows: + + | Name | Host IP | Services | + | ---- | ------- | -------- | + | node1 | 172.16.10.1 | PD1 | + | node2 | 172.16.10.2 | PD2 | + | node3 | 172.16.10.3 | PD3, Monitor | + | node4 | 172.16.10.4 | TiDB1 | + | node5 | 172.16.10.5 | TiDB2 | + | node6 | 172.16.10.6 | TiKV1 | + | node7 | 172.16.10.7 | TiKV2 | + | node8 | 172.16.10.8 | TiKV3 | + | **node9** | **172.16.10.9** | **TiKV4 removed** | + +5. Update the Prometheus configuration and restart the cluster: + + ``` + ansible-playbook rolling_update_monitor.yml --tags=prometheus + ``` + +6. Monitor the status of the entire cluster by opening a browser to access the monitoring platform: `http://172.16.10.3:3000`. + +## Decrease the capacity of a PD node + +For example, if you want to remove a PD node (node2) with the IP address `172.16.10.2`, take the following steps: + +1. Remove the node from the cluster using `pd-ctl`: + + 1. View the name of node2: + + ``` + ./pd-ctl -u "http://172.16.10.1:2379" -d member + ``` + + 2. Remove node2 from the cluster, assuming that the name is pd2: + + ``` + ./pd-ctl -u "http://172.16.10.1:2379" -d member delete name pd2 + ``` + +2. Use Grafana or `pd-ctl` to check whether the node is successfully removed: + + ``` + ./pd-ctl -u "http://172.16.10.1:2379" -d member + ``` + +3. After the node is successfully removed, stop the services on node2: + + ``` + ansible-playbook stop.yml -l 172.16.10.2 + ``` + +4. 
Edit the `inventory.ini` file and remove the node information: + + ```ini + [tidb_servers] + 172.16.10.4 + 172.16.10.5 + + [pd_servers] + 172.16.10.1 + #172.16.10.2 # the removed node + 172.16.10.3 + + [tikv_servers] + 172.16.10.6 + 172.16.10.7 + 172.16.10.8 + 172.16.10.9 + + [monitored_servers] + 172.16.10.4 + 172.16.10.5 + 172.16.10.1 + #172.16.10.2 # the removed node + 172.16.10.3 + 172.16.10.6 + 172.16.10.7 + 172.16.10.8 + 172.16.10.9 + + [monitoring_servers] + 172.16.10.3 + + [grafana_servers] + 172.16.10.3 + ``` + + Now the topology is as follows: + + | Name | Host IP | Services | + | ---- | ------- | -------- | + | node1 | 172.16.10.1 | PD1 | + | **node2** | **172.16.10.2** | **PD2 removed** | + | node3 | 172.16.10.3 | PD3, Monitor | + | node4 | 172.16.10.4 | TiDB1 | + | node5 | 172.16.10.5 | TiDB2 | + | node6 | 172.16.10.6 | TiKV1 | + | node7 | 172.16.10.7 | TiKV2 | + | node8 | 172.16.10.8 | TiKV3 | + | node9 | 172.16.10.9 | TiKV4 | + +5. Perform a rolling update to the entire TiDB cluster: + + ``` + ansible-playbook rolling_update.yml + ``` + +6. Update the Prometheus configuration and restart the cluster: + + ``` + ansible-playbook rolling_update_monitor.yml --tags=prometheus + ``` + +7. To monitor the status of the entire cluster, open a browser to access the monitoring platform: `http://172.16.10.3:3000`. \ No newline at end of file diff --git a/op-guide/ansible-deployment.md b/op-guide/ansible-deployment.md index 49f1e7434dd42..5a26b167660dd 100644 --- a/op-guide/ansible-deployment.md +++ b/op-guide/ansible-deployment.md @@ -1,111 +1,363 @@ --- -title: Ansible Deployment +title: Deploy TiDB Using Ansible +summary: Use Ansible to deploy a TiDB cluster. category: operations --- -# Ansible Deployment +# Deploy TiDB Using Ansible + +This guide describes how to deploy a TiDB cluster using Ansible. For the production environment, it is recommended to deploy TiDB using Ansible. ## Overview -Ansible is an IT automation tool. It can configure systems, deploy software, and orchestrate more advanced IT tasks such as continuous deployments or zero downtime rolling updates. +Ansible is an IT automation tool that can configure systems, deploy software, and orchestrate more advanced IT tasks such as continuous deployments or zero downtime rolling updates. [TiDB-Ansible](https://github.com/pingcap/tidb-ansible) is a TiDB cluster deployment tool developed by PingCAP, based on Ansible playbook. TiDB-Ansible enables you to quickly deploy a new TiDB cluster which includes PD, TiDB, TiKV, and the cluster monitoring modules. 
-You can use the TiDB-Ansible configuration file to set up the cluster topology, completing all operation tasks with one click, including: - -- Initializing operating system parameters -- Deploying the components -- Rolling upgrade, including module survival detection -- Cleaning data -- Cleaning environment -- Configuring monitoring modules +You can use the TiDB-Ansible configuration file to set up the cluster topology and complete all the following operation tasks: +- Initialize operating system parameters +- Deploy the whole TiDB cluster +- [Start the TiDB cluster](../op-guide/ansible-operation.md#start-a-cluster) +- [Stop the TiDB cluster](../op-guide/ansible-operation.md#stop-a-cluster) +- [Modify component configuration](../op-guide/ansible-deployment-rolling-update.md#modify-component-configuration) +- [Scale the TiDB cluster](../op-guide/ansible-deployment-scale.md) +- [Upgrade the component version](../op-guide/ansible-deployment-rolling-update.md#upgrade-the-component-version) +- [Clean up data of the TiDB cluster](../op-guide/ansible-operation.md#clean-up-cluster-data) +- [Destroy the TiDB cluster](../op-guide/ansible-operation.md#destroy-a-cluster) ## Prepare -Before you start, make sure that you have: +Before you start, make sure you have: + +1. Several target machines that meet the following requirements: + + - 4 or more machines + + A standard TiDB cluster contains 6 machines. You can use 4 machines for testing. For more details, see [Software and Hardware Requirements](../op-guide/recommendation.md). + + - CentOS 7.3 (64 bit) or later, x86_64 architecture (AMD64) + - Network between machines + + > **Note:** When you deploy TiDB using Ansible, **use SSD disks for the data directory of TiKV and PD nodes**. Otherwise, it cannot pass the check. If you only want to try TiDB out and explore the features, it is recommended to [deploy TiDB using Docker Compose](../op-guide/docker-compose.md) on a single machine. + +2. A Control Machine that meets the following requirements: + + > **Note:** The Control Machine can be one of the target machines. + + - CentOS 7.3 (64 bit) or later with Python 2.7 installed + - Access to the Internet + +## Step 1: Install system dependencies on the Control Machine + +Log in to the Control Machine using the `root` user account, and run the corresponding command according to your operating system. + +- If you use a Control Machine installed with CentOS 7, run the following command: + + ``` + # yum -y install epel-release git curl sshpass + # yum -y install python-pip + ``` + +- If you use a Control Machine installed with Ubuntu, run the following command: + + ``` + # apt-get -y install git curl sshpass python-pip + ``` + +## Step 2: Create the `tidb` user on the Control Machine and generate the SSH key + +Make sure you have logged in to the Control Machine using the `root` user account, and then run the following command. + +1. Create the `tidb` user. + + ``` + # useradd -m -d /home/tidb tidb + ``` + +2. Set a password for the `tidb` user account. + + ``` + # passwd tidb + ``` + +3. Configure sudo without password for the `tidb` user account by adding `tidb ALL=(ALL) NOPASSWD: ALL` to the end of the sudo file: + + ``` + # visudo + tidb ALL=(ALL) NOPASSWD: ALL + ``` +4. Generate the SSH key. + + Execute the `su` command to switch the user from `root` to `tidb`. + + ``` + # su - tidb + ``` + + Create the SSH key for the `tidb` user account and hit the Enter key when `Enter passphrase` is prompted. 
After successful execution, the SSH private key file is `/home/tidb/.ssh/id_rsa`, and the SSH public key file is `/home/tidb/.ssh/id_rsa.pub`. + + ``` + $ ssh-keygen -t rsa + Generating public/private rsa key pair. + Enter file in which to save the key (/home/tidb/.ssh/id_rsa): + Created directory '/home/tidb/.ssh'. + Enter passphrase (empty for no passphrase): + Enter same passphrase again: + Your identification has been saved in /home/tidb/.ssh/id_rsa. + Your public key has been saved in /home/tidb/.ssh/id_rsa.pub. + The key fingerprint is: + SHA256:eIBykszR1KyECA/h0d7PRKz4fhAeli7IrVphhte7/So tidb@172.16.10.49 + The key's randomart image is: + +---[RSA 2048]----+ + |=+o+.o. | + |o=o+o.oo | + | .O.=.= | + | . B.B + | + |o B * B S | + | * + * + | + | o + . | + | o E+ . | + |o ..+o. | + +----[SHA256]-----+ + ``` + +## Step 3: Download TiDB-Ansible to the Control Machine + +1. Log in to the Control Machine using the `tidb` user account and enter the `/home/tidb` directory. The corresponding relationship between the `tidb-ansible` branch and TiDB versions is as follows: + + | tidb-ansible branch | TiDB version | Note | + | ------------------- | ------------ | ---- | + | release-2.0 | 2.0 version | This is the latest stable version. You can use it in production. | + | master | master version | This version includes the latest features with a daily update. | + +2. Download the corresponding TiDB-Ansible branch from the [TiDB-Ansible project](https://github.com/pingcap/tidb-ansible). The default folder name is `tidb-ansible`. -1. Several target machines with the following requirements: + - Download the 2.0 version: - - 4 or more machines. At least 3 instances for TiKV. Do not deploy TiKV together with TiDB or PD on the same machine. See [Software and Hardware Requirements](recommendation.md). + ```bash + $ git clone -b release-2.0 https://github.com/pingcap/tidb-ansible.git + ``` + + - Download the master version: - - Recommended Operating system: + ```bash + $ git clone https://github.com/pingcap/tidb-ansible.git + ``` - - CentOS 7.3 or later Linux - - x86_64 architecture (AMD64) - - ext4 filesystem + > **Note:** It is required to download `tidb-ansible` to the `/home/tidb` directory using the `tidb` user account. If you download it to the `/root` directory, a privilege issue occurs. - Use ext4 filesystem for your data disks. Mount ext4 filesystem with the `nodelalloc` mount option. See [Mount the data disk ext4 filesystem with options](#mount-the-data-disk-ext4-filesystem-with-options). + If you have questions regarding which version to use, email to info@pingcap.com for more information or [file an issue](https://github.com/pingcap/tidb-ansible/issues/new). - - The network between machines. Turn off the firewalls and iptables when deploying and turn them on after the deployment. +## Step 4: Install Ansible and its dependencies on the Control Machine - - The same time and time zone for all machines with the NTP service on to synchronize the correct time. See [How to check whether the NTP service is normal](#how-to-check-whether-the-ntp-service-is-normal). +Make sure you have logged in to the Control Machine using the `tidb` user account. - - Create a normal `tidb` user account as the user who runs the service. The `tidb` user can sudo to the root user without a password. See [How to configure SSH mutual trust and sudo without password](#how-to-configure-ssh-mutual-trust-and-sudo-without-password). 
+It is required to use `pip` to install Ansible and its dependencies, otherwise a compatibility issue occurs. Currently, the TiDB 2.0 GA version and the master version are compatible with Ansible 2.4 and Ansible 2.5. - > **Note:** When you deploy TiDB using Ansible, use SSD disks for the data directory of TiKV and PD nodes. +1. Install Ansible and the dependencies on the Control Machine: -2. A Control Machine with the following requirements: + ```bash + $ cd /home/tidb/tidb-ansible + $ sudo pip install -r ./requirements.txt + ``` - - The Control Machine can be one of the managed nodes. - - It is recommended to install CentOS 7.3 or later version of Linux operating system (Python 2.7 involved by default). - - The Control Machine must have access to the Internet in order to download TiDB and related packages. - - Configure mutual trust of `ssh authorized_key`. In the Control Machine, you can login to the deployment target machine using `tidb` user account without a password. See [How to configure SSH mutual trust and sudo without password](#how-to-configure-ssh-mutual-trust-and-sudo-without-password). + Ansible and the related dependencies are in the `tidb-ansible/requirements.txt` file. -## Download TiDB-Ansible to the Control Machine +2. View the version of Ansible: -Login to the Control Machine using the `tidb` user account and enter the `/home/tidb` directory. Use the following command to download the corresponding version of TiDB-Ansible from GitHub [TiDB-Ansible project](https://github.com/pingcap/tidb-ansible). The default folder name is `tidb-ansible`. The following are examples of downloading various versions, and you can turn to the official team for advice on which version to choose. + ```bash + $ ansible --version + ansible 2.5.0 + ``` -Download the 1.0 GA version: +## Step 5: Configure the SSH mutual trust and sudo rules on the Control Machine + +Make sure you have logged in to the Control Machine using the `tidb` user account. + +1. Add the IPs of your target machines to the `[servers]` section of the `hosts.ini` file. + + ```bash + $ cd /home/tidb/tidb-ansible + $ vi hosts.ini + [servers] + 172.16.10.1 + 172.16.10.2 + 172.16.10.3 + 172.16.10.4 + 172.16.10.5 + 172.16.10.6 + [all:vars] + username = tidb + ntp_server = pool.ntp.org + ``` + +2. Run the following command and input the `root` user account password of your target machines. + + ```bash + $ ansible-playbook -i hosts.ini create_users.yml -u root -k + ``` + + This step creates the `tidb` user account on the target machines, configures the sudo rules and the SSH mutual trust between the Control Machine and the target machines. + +> To configure the SSH mutual trust and sudo without password manually, see [How to manually configure the SSH mutual trust and sudo without password](#how-to-manually-configure-the-ssh-mutual-trust-and-sudo-without-password) + +## Step 6: Install the NTP service on the target machines + +> **Note:** If the time and time zone of all your target machines are same, the NTP service is on and is normally synchronizing time, you can ignore this step. See [How to check whether the NTP service is normal](#how-to-check-whether-the-ntp-service-is-normal). 
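+
+For example, assuming the `hosts.ini` inventory from Step 5, one way to check all target machines at once is an Ansible ad-hoc command like the following sketch. It only reports the current state; a host whose NTP service is not yet synchronized returns a non-zero exit code and shows up as failed, which is expected until synchronization completes:
+
+```bash
+$ cd /home/tidb/tidb-ansible
+$ ansible -i hosts.ini all -m shell -a 'ntpstat' -u tidb -b
+```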
+ +Make sure you have logged in to the Control Machine using the `tidb` user account, run the following command: + +```bash +$ cd /home/tidb/tidb-ansible +$ ansible-playbook -i hosts.ini deploy_ntp.yml -u tidb -b ``` -git clone -b release-1.0 https://github.com/pingcap/tidb-ansible.git + +The NTP service is installed and started using the software repository that comes with the system on the target machines. The default NTP server list in the installation package is used. The related `server` parameter is in the `/etc/ntp.conf` configuration file. + +To make the NTP service start synchronizing as soon as possible, the system executes the `ntpdate` command to set the local date and time by polling `ntp_server` in the `hosts.ini` file. The default server is `pool.ntp.org`, and you can also replace it with your NTP server. + +## Step 7: Configure the CPUfreq governor mode on the target machine + +For details about CPUfreq, see [the CPUfreq Governor documentation](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/power_management_guide/cpufreq_governors). + +Set the CPUfreq governor mode to `performance` to make full use of CPU performance. + +### Check the governor modes supported by the system + +You can run the `cpupower frequency-info --governors` command to check the governor modes which the system supports: + ``` +# cpupower frequency-info --governors +analyzing CPU 0: + available cpufreq governors: performance powersave +``` + +Taking the above code for example, the system supports the `performance` and `powersave` modes. -Download the 2.0 version: +> **Note:** As the following shows, if it returns "Not Available", it means that the current system does not support CPUfreq configuration and you can skip this step. ``` -git clone -b release-2.0 https://github.com/pingcap/tidb-ansible.git +# cpupower frequency-info --governors +analyzing CPU 0: + available cpufreq governors: Not Available ``` -or +### Check the current governor mode -Download the master version: +You can run the `cpupower frequency-info --policy` command to check the current CPUfreq governor mode: ``` -git clone https://github.com/pingcap/tidb-ansible.git +# cpupower frequency-info --policy +analyzing CPU 0: + current policy: frequency should be within 1.20 GHz and 3.20 GHz. + The governor "powersave" may decide which speed to use + within this range. ``` -## Install Ansible and dependencies in the Control Machine +As the above code shows, the current mode is `powersave` in this example. -Use `pip` to install Ansible and dependencies on the Control Machine of CentOS 7 system. After installation, you can use `ansible --version` to view the Ansible version. Currently releases-1.0 and release-2.0 depend on Ansible 2.4, while the master version is compatible with Ansible 2.4 and Ansible 2.5. +### Change the governor mode -Ansible and related dependencies are recorded in the `tidb-ansible/requirements.txt` file. Install Ansible and dependencies as follows, otherwise compatibility issue occurs. +- You can run the following command to change the current mode to `performance`: -```bash -$ sudo yum -y install epel-release -$ sudo yum -y install python-pip curl -$ cd tidb-ansible -$ sudo pip install -r ./requirements.txt -$ ansible --version - ansible 2.5.0 -``` + ``` + # cpupower frequency-set --governor performance + ``` -For other systems, see [Install Ansible](ansible-deployment.md#install-ansible). 
+- You can also run the following command to set the mode on the target machine in batches: -## Orchestrate the TiDB cluster + ``` + $ ansible -i hosts.ini all -m shell -a "cpupower frequency-set --governor performance" -u tidb -b + ``` + +## Step 8: Mount the data disk ext4 filesystem with options on the target machines -The file path of `inventory.ini`: `tidb-ansible/inventory.ini`. +Log in to the Control Machine using the `root` user account. -> **Note:** Use the internal IP address to deploy the cluster. +Format your data disks to the ext4 filesystem and mount the filesystem with the `nodelalloc` and `noatime` options. It is required to mount the `nodelalloc` option, or else the Ansible deployment cannot pass the test. The `noatime` option is optional. -The standard cluster has 6 machines: +> **Note:** If your data disks have been formatted to ext4 and have mounted the options, you can uninstall it by running the `# umount /dev/nvme0n1` command, follow the steps starting from editing the `/etc/fstab` file, and remount the filesystem with options. + +Take the `/dev/nvme0n1` data disk as an example: + +1. View the data disk. + + ``` + # fdisk -l + Disk /dev/nvme0n1: 1000 GB + ``` -- 2 TiDB nodes, the first TiDB machine is used as a monitor -- 3 PD nodes -- 3 TiKV nodes +2. Create the partition table. + + ``` + # parted -s -a optimal /dev/nvme0n1 mklabel gpt -- mkpart primary ext4 1 -1 + ``` + +3. Format the data disk to the ext4 filesystem. + + ``` + # mkfs.ext4 /dev/nvme0n1 + ``` + +4. View the partition UUID of the data disk. + + In this example, the UUID of `nvme0n1` is `c51eb23b-195c-4061-92a9-3fad812cc12f`. + + ``` + # lsblk -f + NAME FSTYPE LABEL UUID MOUNTPOINT + sda + ├─sda1 ext4 237b634b-a565-477b-8371-6dff0c41f5ab /boot + ├─sda2 swap f414c5c0-f823-4bb1-8fdf-e531173a72ed + └─sda3 ext4 547909c1-398d-4696-94c6-03e43e317b60 / + sr0 + nvme0n1 ext4 c51eb23b-195c-4061-92a9-3fad812cc12f + ``` + +5. Edit the `/etc/fstab` file and add the mount options. + + ``` + # vi /etc/fstab + UUID=c51eb23b-195c-4061-92a9-3fad812cc12f /data1 ext4 defaults,nodelalloc,noatime 0 2 + ``` -### The cluster topology of single TiKV instance on a single machine +6. Mount the data disk. + + ``` + # mkdir /data1 + # mount -a + ``` + +7. Check using the following command. + + ``` + # mount -t ext4 + /dev/nvme0n1 on /data1 type ext4 (rw,noatime,nodelalloc,data=ordered) + ``` + + If the filesystem is ext4 and `nodelalloc` is included in the mount options, you have successfully mount the data disk ext4 filesystem with options on the target machines. + +## Step 9: Edit the `inventory.ini` file to orchestrate the TiDB cluster + +Log in to the Control Machine using the `tidb` user account, and edit the `tidb-ansible/inventory.ini` file to orchestrate the TiDB cluster. The standard TiDB cluster contains 6 machines: 2 TiDB nodes, 3 PD nodes and 3 TiKV nodes. + +- Deploy at least 3 instances for TiKV. +- Do not deploy TiKV together with TiDB or PD on the same machine. +- Use the first TiDB machine as the monitoring machine. + +> **Note:** It is required to use the internal IP address to deploy. If the SSH port of the target machines is not the default 22 port, you need to add the `ansible_port` variable. For example, `TiDB1 ansible_host=172.16.10.1 ansible_port=5555`. 
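+
+For instance, if several target machines listen on a non-default SSH port, the corresponding host lines might look like the following sketch (the aliases, IPs, and port are only an illustration):
+
+```ini
+[tidb_servers]
+TiDB1 ansible_host=172.16.10.1 ansible_port=5555
+TiDB2 ansible_host=172.16.10.2 ansible_port=5555
+```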
+ +You can choose one of the following two types of cluster topology according to your scenario: + +- [The cluster topology of a single TiKV instance on each TiKV node](#option-1-use-the-cluster-topology-of-a-single-tikv-instance-on-each-tikv-node) + + In most cases, it is recommended to deploy one TiKV instance on each TiKV node for better performance. However, if the CPU and memory of your TiKV machines are much better than the required in [Hardware and Software Requirements](../op-guide/recommendation.md), and you have more than two disks in one node or the capacity of one SSD is larger than 2 TB, you can deploy no more than 2 TiKV instances on a single TiKV node. + +- [The cluster topology of multiple TiKV instances on each TiKV node](#option-2-use-the-cluster-topology-of-multiple-tikv-instances-on-each-tikv-node) + +### Option 1: Use the cluster topology of a single TiKV instance on each TiKV node | Name | Host IP | Services | |:------|:------------|:-----------| @@ -146,10 +398,9 @@ The standard cluster has 6 machines: 172.16.10.6 ``` +### Option 2: Use the cluster topology of multiple TiKV instances on each TiKV node -### The cluster topology of multiple TiKV instances on a single machine - -Take two TiKV instances as an example: +Take two TiKV instances on each TiKV node as an example: | Name | Host IP | Services | |:------|:------------|:-----------| @@ -173,13 +424,10 @@ Take two TiKV instances as an example: [tikv_servers] TiKV1-1 ansible_host=172.16.10.4 deploy_dir=/data1/deploy tikv_port=20171 labels="host=tikv1" TiKV1-2 ansible_host=172.16.10.4 deploy_dir=/data2/deploy tikv_port=20172 labels="host=tikv1" -TiKV1-3 ansible_host=172.16.10.4 deploy_dir=/data3/deploy tikv_port=20173 labels="host=tikv1" TiKV2-1 ansible_host=172.16.10.5 deploy_dir=/data1/deploy tikv_port=20171 labels="host=tikv2" TiKV2-2 ansible_host=172.16.10.5 deploy_dir=/data2/deploy tikv_port=20172 labels="host=tikv2" -TiKV2-3 ansible_host=172.16.10.5 deploy_dir=/data3/deploy tikv_port=20173 labels="host=tikv2" TiKV3-1 ansible_host=172.16.10.6 deploy_dir=/data1/deploy tikv_port=20171 labels="host=tikv3" TiKV3-2 ansible_host=172.16.10.6 deploy_dir=/data2/deploy tikv_port=20172 labels="host=tikv3" -TiKV3-3 ansible_host=172.16.10.6 deploy_dir=/data3/deploy tikv_port=20173 labels="host=tikv3" [monitoring_servers] 172.16.10.1 @@ -203,81 +451,93 @@ location_labels = ["host"] **Edit the parameters in the service configuration file:** -1. For multiple TiKV instances, edit the `end-point-concurrency` and `block-cache-size` parameters in `tidb-ansible/conf/tikv.yml`: +1. For the cluster topology of multiple TiKV instances on each TiKV node, you need to edit the `block-cache-size` parameter in `tidb-ansible/conf/tikv.yml`: - - `end-point-concurrency`: keep the number lower than CPU Vcores - `rocksdb defaultcf block-cache-size(GB)`: MEM * 80% / TiKV instance number * 30% - `rocksdb writecf block-cache-size(GB)`: MEM * 80% / TiKV instance number * 45% - `rocksdb lockcf block-cache-size(GB)`: MEM * 80% / TiKV instance number * 2.5% (128 MB at a minimum) - `raftdb defaultcf block-cache-size(GB)`: MEM * 80% / TiKV instance number * 2.5% (128 MB at a minimum) -2. If multiple TiKV instances are deployed on a same physical disk, edit the `capacity` parameter in `conf/tikv.yml`: +2. 
For the cluster topology of multiple TiKV instances on each TiKV node, you need to edit the `high-concurrency`, `normal-concurrency` and `low-concurrency` parameters in the `tidb-ansible/conf/tikv.yml` file: - - `capacity`: (DISK - log space) / TiKV instance number (the unit is GB) + ``` + readpool: + coprocessor: + # Notice: if CPU_NUM > 8, default thread pool size for coprocessors + # will be set to CPU_NUM * 0.8. + # high-concurrency: 8 + # normal-concurrency: 8 + # low-concurrency: 8 + ``` -### Description of inventory.ini variables + Recommended configuration: `number of instances * parameter value = CPU_Vcores * 0.8`. -#### Description of the deployment directory +3. If multiple TiKV instances are deployed on a same physical disk, edit the `capacity` parameter in `conf/tikv.yml`: -You can configure the deployment directory using the `deploy_dir` variable. The global variable is set to `/home/tidb/deploy` by default, and it applies to all services. If the data disk is mounted on the `/data1` directory, you can set it to `/data1/deploy`. For example: + - `capacity`: total disk capacity / number of TiKV instances (the unit is GB) -``` +## Step 10: Edit variables in the `inventory.ini` file + +This step describes how to edit the variable of deployment directory and other variables in the `inventory.ini` file. + +### Configure the deployment directory + +Edit the `deploy_dir` variable to configure the deployment directory. + +The global variable is set to `/home/tidb/deploy` by default, and it applies to all services. If the data disk is mounted on the `/data1` directory, you can set it to `/data1/deploy`. For example: + +```bash ## Global variables [all:vars] deploy_dir = /data1/deploy ``` -To set a deployment directory separately for a service, you can configure host variables when configuring the service host list. Take the TiKV node as an example and it is similar for other services. You must add the first column alias to avoid confusion when the services are mixedly deployed. +**Note:** To separately set the deployment directory for a service, you can configure the host variable while configuring the service host list in the `inventory.ini` file. It is required to add the first column alias, to avoid confusion in scenarios of mixed services deployment. -``` +```bash TiKV1-1 ansible_host=172.16.10.4 deploy_dir=/data1/deploy ``` -#### Description of other variables +### Edit other variables (Optional) -> **Note:** To enable the following control variables, use the capitalized `True`. To disable the following control variables, use the capitalized `False`. +To enable the following control variables, use the capitalized `True`. To disable the following control variables, use the capitalized `False`. 
-| Variable | Description | +| Variable Name | Description | | ---- | ------- | | cluster_name | the name of a cluster, adjustable | | tidb_version | the version of TiDB, configured by default in TiDB-Ansible branches | -| deployment_method | the method of deployment, binary by default, Docker optional | | process_supervision | the supervision way of processes, systemd by default, supervise optional | -| timezone | the timezone of the managed node, adjustable, `Asia/Shanghai` by default, used with the `set_timezone` variable | -| set_timezone | to edit the timezone of the managed node, True by default; False means closing | -| enable_elk | currently not supported | -| enable_firewalld | to enable the firewall, closed by default | +| timezone | the global default time zone configured when a new TiDB cluster bootstrap is initialized; you can edit it later using the global `time_zone` system variable and the session `time_zone` system variable as described in [Time Zone Support](../sql/time-zone.md); the default value is `Asia/Shanghai` and see [the list of time zones](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones) for more optional values | +| enable_firewalld | to enable the firewall, closed by default; to enable it, add the ports in [network requirements](../op-guide/recommendation.md#network-requirements) to the white list | | enable_ntpd | to monitor the NTP service of the managed node, True by default; do not close it | -| set_hostname | to edit the hostname of the mananged node based on the IP, False by default | +| set_hostname | to edit the hostname of the managed node based on the IP, False by default | | enable_binlog | whether to deploy Pump and enable the binlog, False by default, dependent on the Kafka cluster; see the `zookeeper_addrs` variable | | zookeeper_addrs | the zookeeper address of the binlog Kafka cluster | | enable_slow_query_log | to record the slow query log of TiDB into a single file: ({{ deploy_dir }}/log/tidb_slow_query.log). 
False by default, to record it into the TiDB log | | deploy_without_tidb | the Key-Value mode, deploy only PD, TiKV and the monitoring service, not TiDB; set the IP of the tidb_servers host group to null in the `inventory.ini` file | | alertmanager_target | optional: If you have deployed `alertmanager` separately, you can configure this variable using the `alertmanager_host:alertmanager_port` format | | grafana_admin_user | the username of Grafana administrator; default `admin` | -| grafana_admin_password | the password of Grafana administrator account; default `admin`; used to import Dashboard and create the API key using Ansible; update this variable after you modify it through Grafana web | +| grafana_admin_password | the password of Grafana administrator account; default `admin`; used to import Dashboard and create the API key using Ansible; update this variable if you have modified it through Grafana web | +| collect_log_recent_hours | to collect the log of recent hours; default the recent 2 hours | +| enable_bandwidth_limit | to set a bandwidth limit when pulling the diagnostic data from the target machines to the Control Machine; used together with the `collect_bandwidth_limit` variable | +| collect_bandwidth_limit | the limited bandwidth when pulling the diagnostic data from the target machines to the Control Machine; unit: Kbit/s; default 10000, indicating 10Mb/s; for the cluster topology of multiple TiKV instances on each TiKV node, you need to divide the number of the TiKV instances on each TiKV node | -## Deploy the TiDB cluster +## Step 11: Deploy the TiDB cluster When `ansible-playbook` runs Playbook, the default concurrent number is 5. If many deployment target machines are deployed, you can add the `-f` parameter to specify the concurrency, such as `ansible-playbook deploy.yml -f 10`. -The following example uses the `tidb` user account as the user who runs the service. - -To deploy TiDB using a normal user account, take the following steps: +The following example uses `tidb` as the user who runs the service. 1. Edit the `tidb-ansible/inventory.ini` file to make sure `ansible_user = tidb`. ``` ## Connection - # ssh via root: - # ansible_user = root - # ansible_become = true - # ansible_become_user = tidb - # ssh via normal user ansible_user = tidb ``` + > **Note:** Do not configure `ansible_user` to `root`, because `tidb-ansible` limits the user that runs the service to the normal user. + Run the following command and if all servers return `tidb`, then the SSH mutual trust is successfully configured: ``` @@ -290,7 +550,7 @@ To deploy TiDB using a normal user account, take the following steps: ansible -i inventory.ini all -m shell -a 'whoami' -b ``` -2. Run the `local_prepare.yml` playbook, connect to the Internet and download TiDB binary to the Control Machine. +2. Run the `local_prepare.yml` playbook and download TiDB binary to the Control Machine. ``` ansible-playbook local_prepare.yml @@ -308,17 +568,21 @@ To deploy TiDB using a normal user account, take the following steps: ansible-playbook deploy.yml ``` + > **Note:** You can use the `Report` button on the Grafana Dashboard to generate the PDF file. This function depends on the `fontconfig` package and English fonts. To use this function, log in to the `grafana_servers` machine and install it using the following command: + > + > ``` + > $ sudo yum install fontconfig open-sans-fonts + > ``` + 5. Start the TiDB cluster. 
``` ansible-playbook start.yml ``` -> **Note:** If you want to deploy TiDB using the root user account, see [Ansible Deployment Using the Root User Account](root-ansible-deployment.md). - -## Test the cluster +## Test the TiDB cluster -It is recommended to configure load balancing to provide uniform SQL interface. +Because TiDB is compatible with MySQL, you must use the MySQL client to connect to TiDB directly. It is recommended to configure load balancing to provide uniform SQL interface. 1. Connect to the TiDB cluster using the MySQL client. @@ -334,93 +598,11 @@ It is recommended to configure load balancing to provide uniform SQL interface. http://172.16.10.1:3000 ``` - The default account and password: `admin`/`admin`. - -## Perform rolling update - -- The rolling update of the TiDB service does not impact the ongoing business. Minimum requirements: `pd*3, tidb*2, tikv*3`. -- **If the `pump`/`drainer` services are running in the cluster, stop the `drainer` service before rolling update. The rolling update of the TiDB service automatically updates the `pump` service.** - -### Download the binary automatically - -1. Edit the value of the `tidb_version` parameter in `inventory.ini`, and specify the version number you need to update to. The following example specifies the version number as `v1.0.2`: - - ``` - tidb_version = v1.0.2 - ``` - -2. Delete the existing downloads directory `tidb-ansible/downloads/`. - - ``` - rm -rf downloads - ``` - -3. Use `playbook` to download the TiDB 1.0 binary and replace the existing binary in `tidb-ansible/resource/bin/` automatically. - - ``` - ansible-playbook local_prepare.yml - ``` - -### Download the binary manually - -You can also download the binary manually. Use `wget` to download the binary and replace the existing binary in `tidb-ansible/resource/bin/` manually. - -``` -wget http://download.pingcap.org/tidb-v1.0.0-linux-amd64-unportable.tar.gz -``` - -> **Note:** Remember to replace the version number in the download link. - -### Use Ansible for rolling update - -- Apply rolling update to the TiKV node (only update the TiKV service). - - ``` - ansible-playbook rolling_update.yml --tags=tikv - ``` - -- Apply rolling update to the PD node (only update single PD service). - - ``` - ansible-playbook rolling_update.yml --tags=pd - ``` - -- Apply rolling update to the TiDB node (only update single TiDB service). + > **Note**: The default account and password: `admin`/`admin`. - ``` - ansible-playbook rolling_update.yml --tags=tidb - ``` - -- Apply rolling update to all services. - - ``` - ansible-playbook rolling_update.yml - ``` - -## Summary of common operations - -| Job | Playbook | -|:----------------------------------|:-----------------------------------------| -| Start the cluster | `ansible-playbook start.yml` | -| Stop the cluster | `ansible-playbook stop.yml` | -| Destroy the cluster | `ansible-playbook unsafe_cleanup.yml` (If the deployment directory is a mount point, an error will be reported, but implementation results will remain unaffected) | -| Clean data (for test) | `ansible-playbook unsafe_cleanup_data.yml` | -| Rolling Upgrade | `ansible-playbook rolling_update.yml` | -| Rolling upgrade TiKV | `ansible-playbook rolling_update.yml --tags=tikv` | -| Rolling upgrade modules except PD | `ansible-playbook rolling_update.yml --skip-tags=pd` | -| Rolling update the monitoring components | `ansible-playbook rolling_update_monitor.yml` | - -## FAQ +## Deployment FAQs -### How to download and install TiDB of a specified version? 
- -If you need to install the TiDB 1.0.4 version, download the `TiDB-Ansible release-1.0` branch and make sure `tidb_version = v1.0.4` in the `inventory.ini` file. For installation procedures, see the above description in this document. - -Download the `TiDB-Ansible release-1.0` branch from GitHub: - -``` -git clone -b release-1.0 https://github.com/pingcap/tidb-ansible.git -``` +This section lists the common questions about deploying TiDB using Ansible. ### How to customize the port? @@ -436,12 +618,15 @@ Edit the `inventory.ini` file and add the following host variable after the IP o | Pump | pump_port | 8250 | the pump communication port | | Prometheus | prometheus_port | 9090 | the communication port for the Prometheus service | | Pushgateway | pushgateway_port | 9091 | the aggregation and report port for TiDB, TiKV, and PD monitor | -| node_exporter | node_exporter_port | 9100 | the communication port to report the system information of every TiDB cluster node | +| Node_exporter | node_exporter_port | 9100 | the communication port to report the system information of every TiDB cluster node | | Grafana | grafana_port | 3000 | the port for the external Web monitoring service and client (Browser) access | | Grafana | grafana_collector_port | 8686 | the grafana_collector communication port, used to export Dashboard as the PDF format | +| Kafka_exporter | kafka_exporter_port | 9308 | the communication port for Kafka_exporter, used to monitor the binlog Kafka cluster | ### How to customize the deployment directory? +Edit the `inventory.ini` file and add the following host variable after the IP of the corresponding service: + | Component | Variable Directory | Default Directory | Description | |:--------------|:----------------------|:------------------------------|:-----| | Global | deploy_dir | /home/tidb/deploy | the deployment directory | @@ -456,219 +641,83 @@ Edit the `inventory.ini` file and add the following host variable after the IP o | Pump | pump_data_dir | {{ deploy_dir }}/data.pump | the Pump data directory | | Prometheus | prometheus_log_dir | {{ deploy_dir }}/log | the Prometheus log directory | | Prometheus | prometheus_data_dir | {{ deploy_dir }}/data.metrics | the Prometheus data directory | -| pushgateway | pushgateway_log_dir | {{ deploy_dir }}/log | the pushgateway log directory | -| node_exporter | node_exporter_log_dir | {{ deploy_dir }}/log | the node_exporter log directory | +| Pushgateway | pushgateway_log_dir | {{ deploy_dir }}/log | the pushgateway log directory | +| Node_exporter | node_exporter_log_dir | {{ deploy_dir }}/log | the node_exporter log directory | | Grafana | grafana_log_dir | {{ deploy_dir }}/log | the Grafana log directory | | Grafana | grafana_data_dir | {{ deploy_dir }}/data.grafana | the Grafana data directory | ### How to check whether the NTP service is normal? -Run the following command. If it returns `running`, then the NTP service is running: - -``` -$ sudo systemctl status ntpd.service -● ntpd.service - Network Time Service - Loaded: loaded (/usr/lib/systemd/system/ntpd.service; disabled; vendor preset: disabled) - Active: active (running) since 一 2017-12-18 13:13:19 CST; 3s ago -``` - -Run the ntpstat command. If it returns `synchronised to NTP server` (synchronizing with the NTP server), then the synchronization process is normal. 
- -``` -$ ntpstat -synchronised to NTP server (85.199.214.101) at stratum 2 - time correct to within 91 ms - polling server every 1024 s -``` - -> **Note:** For the Ubuntu system, install the `ntpstat` package. - -The following condition indicates the NTP service is not synchronized normally: - -``` -$ ntpstat -unsynchronised -``` - -The following condition indicates the NTP service is not running normally: - -``` -$ ntpstat -Unable to talk to NTP daemon. Is it running? -``` - -Running the following command can promote the starting of the NTP service synchronization. You can replace `pool.ntp.org` with other NTP server. - -``` -$ sudo systemctl stop ntpd.service -$ sudo ntpdate pool.ntp.org -$ sudo systemctl start ntpd.service -``` - -### How to deploy the NTP service using Ansible? - -Refer to [Download TiDB-Ansible to the Control Machine](#download-tidb-ansible-to-the-control-machine) and download TiDB-Ansible. Add the IP of the deployment target machine to `[servers]`. You can replace the `ntp_server` variable value `pool.ntp.org` with other NTP server. Before starting the NTP service, the system `ntpdate` the NTP server. The NTP service deployed by Ansible uses the default server list in the package. See the `server` parameter in the `cat /etc/ntp.conf` file. - -``` -$ vi hosts.ini -[servers] -172.16.10.49 -172.16.10.50 -172.16.10.61 -172.16.10.62 - -[all:vars] -username = tidb -ntp_server = pool.ntp.org -``` - -Run the following command, and enter the root password of the deployment target machine as prompted: - -``` -$ ansible-playbook -i hosts.ini deploy_ntp.yml -k -``` - -### How to install the NTP service manually? - -Run the following command on the CentOS 7 system: - -``` -$ sudo yum install ntp ntpdate -$ sudo systemctl start ntpd.service -``` - -### How to deploy TiDB using Docker? - -- Install Docker on the Control Machine and the managed node. The normal user (such as `ansible_user = tidb`) account in `inventory.ini` must have the sudo privileges or [running Docker privileges](https://docs.docker.com/engine/installation/linux/linux-postinstall/). -- Install the `docker-py` module on the Control Machine and the managed node. +1. Run the following command. If it returns `running`, then the NTP service is running: ``` - sudo pip install docker-py + $ sudo systemctl status ntpd.service + ntpd.service - Network Time Service + Loaded: loaded (/usr/lib/systemd/system/ntpd.service; disabled; vendor preset: disabled) + Active: active (running) since 一 2017-12-18 13:13:19 CST; 3s ago ``` -- Edit the `inventory.ini` file: +2. Run the ntpstat command. If it returns `synchronised to NTP server` (synchronizing with the NTP server), then the synchronization process is normal. ``` - # deployment methods, [binary, docker] - deployment_method = docker - - # process supervision, [systemd, supervise] - process_supervision = systemd + $ ntpstat + synchronised to NTP server (85.199.214.101) at stratum 2 + time correct to within 91 ms + polling server every 1024 s ``` -The Docker installation process is similar to the binary method. +> **Note:** For the Ubuntu system, you need to install the `ntpstat` package. -### How to adjust the supervision method of a process from supervise to systemd? +- The following condition indicates the NTP service is not synchronizing normally: -``` -# process supervision, [systemd, supervise] -process_supervision = systemd -``` - -For versions earlier than TiDB 1.0.4, the TiDB-Ansible supervision method of a process is supervise by default. 
The previously installed cluster can remain the same. If you need to change the supervision method to systemd, close the cluster and run the following command: - -``` -ansible-playbook stop.yml -ansible-playbook deploy.yml -D -ansible-playbook start.yml -``` - -#### How to install Ansible? - -- For the CentOS system, install Ansible following the method described at the beginning of this document. -- For the Ubuntu system, install Ansible as follows: - - ```bash - $ sudo apt-get install python-pip curl - $ cd tidb-ansible - $ sudo pip install -r ./requirements.txt + ``` + $ ntpstat + unsynchronised ``` -### Mount the data disk ext4 filesystem with options - -Format your data disks to ext4 filesystem and mount the filesystem with the `nodelalloc` and `noatime` options. It is required to mount the `nodelalloc` option, or else the Ansible deployment cannot pass the detection. The `noatime` option is optional. - -Take the `/dev/nvme0n1` data disk as an example: - -1. Edit the `/etc/fstab` file and add the `nodelalloc` mount option: +- The following condition indicates the NTP service is not running normally: ``` - # vi /etc/fstab - /dev/nvme0n1 /data1 ext4 defaults,nodelalloc,noatime 0 2 + $ ntpstat + Unable to talk to NTP daemon. Is it running? ``` -2. Umount the mount directory and remount using the following command: +- To make the NTP service start synchronizing as soon as possible, run the following command. You can replace `pool.ntp.org` with other NTP servers. ``` - # umount /data1 - # mount -a + $ sudo systemctl stop ntpd.service + $ sudo ntpdate pool.ntp.org + $ sudo systemctl start ntpd.service ``` -3. Check using the following command: +- To install the NTP service manually on the CentOS 7 system, run the following command: ``` - # mount -t ext4 - /dev/nvme0n1 on /data1 type ext4 (rw,noatime,nodelalloc,data=ordered) + $ sudo yum install ntp ntpdate + $ sudo systemctl start ntpd.service + $ sudo systemctl enable ntpd.service ``` -### How to configure SSH mutual trust and sudo without password? - -#### Create the `tidb` user on the Control Machine and generate the SSH key. - -``` -# useradd tidb -# passwd tidb -# su - tidb -$ -$ ssh-keygen -t rsa -Generating public/private rsa key pair. -Enter file in which to save the key (/home/tidb/.ssh/id_rsa): -Created directory '/home/tidb/.ssh'. -Enter passphrase (empty for no passphrase): -Enter same passphrase again: -Your identification has been saved in /home/tidb/.ssh/id_rsa. -Your public key has been saved in /home/tidb/.ssh/id_rsa.pub. -The key fingerprint is: -SHA256:eIBykszR1KyECA/h0d7PRKz4fhAeli7IrVphhte7/So tidb@172.16.10.49 -The key's randomart image is: -+---[RSA 2048]----+ -|=+o+.o. | -|o=o+o.oo | -| .O.=.= | -| . B.B + | -|o B * B S | -| * + * + | -| o + . | -| o E+ . | -|o ..+o. | -+----[SHA256]-----+ -``` - -#### How to automatically configure SSH mutual trust and sudo without password using Ansible? +### How to modify the supervision method of a process from `supervise` to `systemd`? -Refer to [Download TiDB-Ansible to the Control Machine](#download-tidb-ansible-to-the-control-machine) and download TiDB-Ansible. Add the IP of the deployment target machine to `[servers]`. 
+Run the following command: ``` -$ vi hosts.ini -[servers] -172.16.10.49 -172.16.10.50 -172.16.10.61 -172.16.10.62 - -[all:vars] -username = tidb +# process supervision, [systemd, supervise] +process_supervision = systemd ``` -Run the following command, and enter the `root` password of the deployment target machine as prompted: +For versions earlier than TiDB 1.0.4, the TiDB-Ansible supervision method of a process is `supervise` by default. The previously installed cluster can remain the same. If you need to change the supervision method to `systemd`, stop the cluster and run the following command: ``` -$ ansible-playbook -i hosts.ini create_users.yml -k +ansible-playbook stop.yml +ansible-playbook deploy.yml -D +ansible-playbook start.yml ``` -#### How to manually configure SSH mutual trust and sudo without password? +### How to manually configure the SSH mutual trust and sudo without password? -Use the `root` user to login to the deployment target machine, create the `tidb` user and set the login password. +Log in to the deployment target machine using the `root` user account, create the `tidb` user and set the login password. ``` # useradd tidb @@ -682,15 +731,13 @@ To configure sudo without password, run the following command, and add `tidb ALL tidb ALL=(ALL) NOPASSWD: ALL ``` -Use the `tidb` user to login to the Control Machine, and run the following command. Replace `172.16.10.61` with the IP of your deployment target machine, and enter the `tidb` user password of the deployment target machine. Successful execution indicates that SSH mutual trust is already created. This applies to other machines as well. +Use the `tidb` user to log in to the Control Machine, and run the following command. Replace `172.16.10.61` with the IP of your deployment target machine, and enter the `tidb` user password of the deployment target machine as prompted. Successful execution indicates that SSH mutual trust is already created. This applies to other machines as well. ``` [tidb@172.16.10.49 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub 172.16.10.61 ``` -#### Authenticate SSH mutual trust and sudo without password - -Use the `tidb` user to login to the Control Machine, and login to the IP of the target machine using SSH. If you do not need to enter the password and can successfully login, then SSH mutual trust is successfully configured. +Log in to the Control Machine using the `tidb` user account, and log in to the IP of the target machine using SSH. If you do not need to enter the password and can successfully log in, then the SSH mutual trust is successfully configured. ``` [tidb@172.16.10.49 ~]$ ssh 172.16.10.61 @@ -706,18 +753,7 @@ After you login to the deployment target machine using the `tidb` user, run the ### Error: You need to install jmespath prior to running json_query filter -See [Install Ansible and dependencies in the Control Machine](#install-ansible-and-dependencies-in-the-control-machine) and use `pip` to install Ansible and the related specific dependencies in the Control Machine. The `jmespath` dependent package is installed by default. 
- -For the CentOS 7 system, you can install `jmespath` separately using the following command: - -``` -$ sudo yum -y install epel-release -$ sudo yum -y install python-pip -$ sudo pip install jmespath -$ pip show jmespath -Name: jmespath -Version: 0.9.0 -``` +See [Install Ansible and its dependencies on the Control Machine](#step-4-install-ansible-and-its-dependencies-on-the-control-machine) and use `pip` to install Ansible and the related specific dependencies in the Control Machine. The `jmespath` dependent package is installed by default. Enter `import jmespath` in the Python interactive window of the Control Machine. @@ -732,13 +768,6 @@ Type "help", "copyright", "credits" or "license" for more information. >>> import jmespath ``` -For the Ubuntu system, you can install `jmespath` separately using the following command: - -``` -$ sudo apt-get install python-pip -$ sudo pip install jmespath -``` - ### The `zk: node does not exist` error when starting Pump/Drainer Check whether the `zookeeper_addrs` configuration in `inventory.ini` is the same with the configuration in the Kafka cluster, and whether the namespace is filled in. The description about namespace configuration is as follows: @@ -749,4 +778,4 @@ Check whether the `zookeeper_addrs` configuration in `inventory.ini` is the same # zookeeper_addrs = "192.168.0.11:2181,192.168.0.12:2181,192.168.0.13:2181" # You can also append an optional chroot string to the URLs to specify the root directory for all Kafka znodes. Example: # zookeeper_addrs = "192.168.0.11:2181,192.168.0.12:2181,192.168.0.13:2181/kafka/123" -``` \ No newline at end of file +``` diff --git a/op-guide/ansible-operation.md b/op-guide/ansible-operation.md new file mode 100644 index 0000000000000..8ecc8af11380f --- /dev/null +++ b/op-guide/ansible-operation.md @@ -0,0 +1,43 @@ +--- +title: TiDB-Ansible Common Operations +summary: Learn some common operations when using TiDB-Ansible to administer a TiDB cluster. +category: operations +--- + +# TiDB-Ansible Common Operations + +This guide describes the common operations when you administer a TiDB cluster using TiDB-Ansible. + +## Start a cluster + +```bash +$ ansible-playbook start.yml +``` + +This operation starts all the components in the entire TiDB cluster in order, which include PD, TiDB, TiKV, and the monitoring components. + +## Stop a cluster + +```bash +$ ansible-playbook stop.yml +``` + +This operation stops all the components in the entire TiDB cluster in order, which include PD, TiDB, TiKV, and the monitoring components. + +## Clean up cluster data + +``` +$ ansible-playbook unsafe_cleanup_data.yml +``` + +This operation stops the TiDB, Pump, TiKV and PD services, and cleans up the data directory of Pump, TiKV and PD. + +## Destroy a cluster + +``` +$ ansible-playbook unsafe_cleanup.yml +``` + +This operation stops the cluster and cleans up the data directory. + +> **Note:** If the deployment directory is a mount point, an error will be reported, but implementation results remain unaffected, so you can ignore it. \ No newline at end of file diff --git a/op-guide/backup-restore.md b/op-guide/backup-restore.md index cdb87a09d52db..c1fa6709446a1 100644 --- a/op-guide/backup-restore.md +++ b/op-guide/backup-restore.md @@ -1,5 +1,6 @@ --- title: Backup and Restore +summary: Learn how to back up and restore the data of TiDB. category: operations --- @@ -7,7 +8,7 @@ category: operations ## About -This document describes how to backup and restore the data of TiDB. 
Currently, this document only covers full backup and restoration. +This document describes how to back up and restore the data of TiDB. Currently, this document only covers full backup and restoration. Here we assume that the TiDB service information is as follows: @@ -37,9 +38,9 @@ cd tidb-enterprise-tools-latest-linux-amd64 ## Full backup and restoration using `mydumper`/`loader` -You can use `mydumper` to export data from MySQL and `loader` to import the data into TiDB. +You can use [`mydumper`](../tools/mydumper.md) to export data from MySQL and [`loader`](../tools/loader.md) to import the data into TiDB. -> **Note**: Although TiDB also supports the official `mysqldump` tool from MySQL for data migration, it is not recommended to use it. Its performance is much lower than `mydumper`/`loader` and it takes much time to migrate large amounts of data. `mydumper`/`loader` is more powerful. For more information, see https://github.com/maxbube/mydumper. +> **Important**: You must use the `mydumper` from the Enterprise Tools package, and not the `mydumper` provided by your operating system's package manager. The upstream version of `mydumper` does not yet handle TiDB correctly ([#155](https://github.com/maxbube/mydumper/pull/155)). Using `mysqldump` is also not recommended, as it is much slower for both backup and restoration. ### Best practices of full backup and restoration using `mydumper`/`loader` @@ -118,4 +119,4 @@ mysql> select * from t2; | 2 | b | | 3 | c | +----+------+ -``` \ No newline at end of file +``` diff --git a/op-guide/binary-deployment.md b/op-guide/binary-deployment.md deleted file mode 100644 index 26578277e8799..0000000000000 --- a/op-guide/binary-deployment.md +++ /dev/null @@ -1,447 +0,0 @@ ---- -title: Deploy TiDB Using the Binary -category: operations ---- - -# Deploy TiDB Using the Binary - -## Overview - -A complete TiDB cluster contains PD, TiKV, and TiDB. To start the database service, follow the order of PD -> TiKV -> TiDB. To stop the database service, follow the order of stopping TiDB -> TiKV -> PD. - -Before you start, see [TiDB architecture](../overview.md#tidb-architecture) and [Software and Hardware Requirements](recommendation.md). - -This document describes the binary deployment of three scenarios: - -- To quickly understand and try TiDB, see [Single node cluster deployment](#single-node-cluster-deployment). -- To try TiDB out and explore the features, see [Multiple nodes cluster deployment for test](#multiple-nodes-cluster-deployment-for-test). -- To deploy and use TiDB in production, see [Multiple nodes cluster deployment](#multiple-nodes-cluster-deployment). 
- -## TiDB components and default ports - -### TiDB database components (required) - -See the following table for the default ports for the TiDB components: - -| Component | Default Port | Protocol | Description | -| :-- | :-- | :-- | :----------- | -| ssh | 22 | TCP | sshd service | -| TiDB| 4000 | TCP | the communication port for the application and DBA tools | -| TiDB| 10080 | TCP | the communication port to report TiDB status | -| TiKV| 20160 | TCP | the TiKV communication port | -| PD | 2379 | TCP | the communication port between TiDB and PD | -| PD | 2380 | TCP | the inter-node communication port within the PD cluster | - -### TiDB database components (optional) - -See the following table for the default ports for the optional TiDB components: - -| Component | Default Port | Protocol | Description | -| :-- | :-- | :-- | :------------------------ | -| Prometheus | 9090| TCP | the communication port for the Prometheus service | -| Pushgateway | 9091 | TCP | the aggregation and report port for TiDB, TiKV, and PD monitor | -| Node_exporter| 9100| TCP | the communication port to report the system information of every TiDB cluster node | -| Grafana | 3000 | TCP | the port for the external Web monitoring service and client (Browser) access | -| alertmanager | 9093 | TCP | the port for the alert service | - -## Configure and check the system before installation - -### Operating system - -| Configuration | Description | -| :-- | :-------------------- | -| Supported Platform | See the [Software and Hardware Requirements](./recommendation.md) | -| File System | The ext4 file system is recommended in TiDB Deployment | -| Swap Space | The Swap Space is recommended to close in TiDB Deployment | -| Disk Block Size | Set the size of the system disk `Block` to `4096` | - -### Network and firewall - -| Configuration | Description | -| :-- | :------------------- | -| Firewall / Port | Check whether the ports required by TiDB are accessible between the nodes | - -### Operating system parameters - -| Configuration | Description | -| :-- | :-------------------------- | -| Nice Limits | For system users, set the default value of `nice` in TiDB to `0` | -| min_free_kbytes | The setting for `vm.min_free_kbytes` in `sysctl.conf` needs to be high enough | -| User Open Files Limit | For database administrators, set the number of TiDB open files to `1000000` | -| System Open File Limits | Set the number of system open files to `1000000` | -| User Process Limits | For TiDB users, set the `nproc` value to `4096` in `limits.conf` | -| Address Space Limits | For TiDB users, set the space to `unlimited` in `limits.conf` | -| File Size Limits | For TiDB users, set the `fsize` value to `unlimited` in `limits.conf` | -| Disk Readahead | Set the value of the `readahead` data disk to `4096` at a minimum | -| NTP service | Configure the NTP time synchronization service for each node | -| SELinux | Turn off the SELinux service for each node | -| CPU Frequency Scaling | It is recommended to turn on CPU overclocking | -| Transparent Hugepages | For Red Hat 7+ and CentOS 7+ systems, it is required to set the Transparent Hugepages to `always` | -| I/O Scheduler | Set the I/O Scheduler of data disks to the `deadline` mode | -| vm.swappiness | Set `vm.swappiness = 0` | - - -> **Note**: To adjust the operating system parameters, contact your system administrator. 
- -### Database running user - -| Configuration | Description | -| :-- | :---------------------------- | -| LANG environment | Set `LANG = en_US.UTF8` | -| TZ time zone | Set the TZ time zone of all nodes to the same value | - -## Create the database running user account - -In the Linux environment, create TiDB on each installation node as a database running user, and set up the SSH mutual trust between cluster nodes. To create a running user and open SSH mutual trust, contact the system administrator. Here is an example: - -```bash -# useradd tidb -# usermod -a -G tidb tidb -# su - tidb -Last login: Tue Aug 22 12:06:23 CST 2017 on pts/2 --bash-4.2$ ssh-keygen -t rsa -Generating public/private rsa key pair. -Enter file in which to save the key (/home/tidb/.ssh/id_rsa): -Created directory '/home/tidb/.ssh'. -Enter passphrase (empty for no passphrase): -Enter same passphrase again: -Your identification has been saved in /home/tidb/.ssh/id_rsa. -Your public key has been saved in /home/tidb/.ssh/id_rsa.pub. -The key fingerprint is: -5a:00:e6:df:9e:40:25:2c:2d:e2:6e:ee:74:c6:c3:c1 tidb@t001 -The key's randomart image is: -+--[ RSA 2048]----+ -| oo. . | -| .oo.oo | -| . ..oo | -| .. o o | -| . E o S | -| oo . = . | -| o. * . o | -| ..o . | -| .. | -+-----------------+ - --bash-4.2$ cd .ssh --bash-4.2$ cat id_rsa.pub >> authorized_keys --bash-4.2$ chmod 644 authorized_keys --bash-4.2$ ssh-copy-id -i ~/.ssh/id_rsa.pub 192.168.1.100 -``` - -## Download the official binary package - -TiDB provides the official binary installation package that supports Linux. For the operating system, it is recommended to use Redhat 7.3+, CentOS 7.3+ and higher versions. - -### Operating system: Linux (Redhat 7+, CentOS 7+) - -``` -# Download the package. -wget http://download.pingcap.org/tidb-latest-linux-amd64.tar.gz -wget http://download.pingcap.org/tidb-latest-linux-amd64.sha256 - -# Check the file integrity. If the result is OK, the file is correct. -sha256sum -c tidb-latest-linux-amd64.sha256 - -# Extract the package. -tar -xzf tidb-latest-linux-amd64.tar.gz -cd tidb-latest-linux-amd64 -``` - -## Single node cluster deployment - -After downloading the TiDB binary package, you can run and test the TiDB cluster on a standalone server. Follow the steps below to start PD, TiKV and TiDB: - -1. Start PD. - - ```bash - ./bin/pd-server --data-dir=pd \ - --log-file=pd.log - ``` - - -2. Start TiKV. - - ```bash - ./bin/tikv-server --pd="127.0.0.1:2379" \ - --data-dir=tikv \ - --log-file=tikv.log - ``` - -3. Start TiDB. - - ```bash - ./bin/tidb-server --store=tikv \ - --path="127.0.0.1:2379" \ - --log-file=tidb.log - ``` - -4. Use the official MySQL client to connect to TiDB. - - ```sh - mysql -h 127.0.0.1 -P 4000 -u root -D test - ``` - -## Multiple nodes cluster deployment for test - -If you want to test TiDB but have a limited number of nodes, you can use one PD instance to test the entire cluster. - -Assuming that you have four nodes, you can deploy 1 PD instance, 3 TiKV instances, and 1 TiDB instance. See the following table for details: - -| Name | Host IP | Services | -| :-- | :-- | :------------------- | -| Node1 | 192.168.199.113 | PD1, TiDB | -| Node2 | 192.168.199.114 | TiKV1 | -| Node3 | 192.168.199.115 | TiKV2 | -| Node4 | 192.168.199.116 | TiKV3 | - -Follow the steps below to start PD, TiKV and TiDB: - -1. Start PD on Node1. 
- - ```bash - ./bin/pd-server --name=pd1 \ - --data-dir=pd1 \ - --client-urls="http://192.168.199.113:2379" \ - --peer-urls="http://192.168.199.113:2380" \ - --initial-cluster="pd1=http://192.168.199.113:2380" \ - --log-file=pd.log - ``` - -2. Start TiKV on Node2, Node3 and Node4. - - ```bash - ./bin/tikv-server --pd="192.168.199.113:2379" \ - --addr="192.168.199.114:20160" \ - --data-dir=tikv1 \ - --log-file=tikv.log - - ./bin/tikv-server --pd="192.168.199.113:2379" \ - --addr="192.168.199.115:20160" \ - --data-dir=tikv2 \ - --log-file=tikv.log - - ./bin/tikv-server --pd="192.168.199.113:2379" \ - --addr="192.168.199.116:20160" \ - --data-dir=tikv3 \ - --log-file=tikv.log - ``` - -3. Start TiDB on Node1. - - ```bash - ./bin/tidb-server --store=tikv \ - --path="192.168.199.113:2379" \ - --log-file=tidb.log - ``` - -4. Use the official MySQL client to connect to TiDB. - - ```sh - mysql -h 192.168.199.113 -P 4000 -u root -D test - ``` - -## Multiple nodes cluster deployment - -For the production environment, multiple nodes cluster deployment is recommended. Before you begin, see [Software and Hardware Requirements](./recommendation.md). - -Assuming that you have six nodes, you can deploy 3 PD instances, 3 TiKV instances, and 1 TiDB instance. See the following table for details: - -| Name | Host IP | Services | -| :-- | :-- | :-------------- | -| Node1 | 192.168.199.113| PD1, TiDB | -| Node2 | 192.168.199.114| PD2 | -| Node3 | 192.168.199.115| PD3 | -| Node4 | 192.168.199.116| TiKV1 | -| Node5 | 192.168.199.117| TiKV2 | -| Node6 | 192.168.199.118| TiKV3 | - -Follow the steps below to start PD, TiKV, and TiDB: - -1. Start PD on Node1, Node2, and Node3 in sequence. - - ```bash - ./bin/pd-server --name=pd1 \ - --data-dir=pd1 \ - --client-urls="http://192.168.199.113:2379" \ - --peer-urls="http://192.168.199.113:2380" \ - --initial-cluster="pd1=http://192.168.199.113:2380,pd2=http://192.168.199.114:2380,pd3=http://192.168.199.115:2380" \ - -L "info" \ - --log-file=pd.log - - ./bin/pd-server --name=pd2 \ - --data-dir=pd2 \ - --client-urls="http://192.168.199.114:2379" \ - --peer-urls="http://192.168.199.114:2380" \ - --initial-cluster="pd1=http://192.168.199.113:2380,pd2=http://192.168.199.114:2380,pd3=http://192.168.199.115:2380" \ - --join="http://192.168.199.113:2379" \ - -L "info" \ - --log-file=pd.log - - ./bin/pd-server --name=pd3 \ - --data-dir=pd3 \ - --client-urls="http://192.168.199.115:2379" \ - --peer-urls="http://192.168.199.115:2380" \ - --initial-cluster="pd1=http://192.168.199.113:2380,pd2=http://192.168.199.114:2380,pd3=http://192.168.199.115:2380" \ - --join="http://192.168.199.113:2379" \ - -L "info" \ - --log-file=pd.log - ``` - -2. Start TiKV on Node4, Node5 and Node6. - - ```bash - ./bin/tikv-server --pd="192.168.199.113:2379,192.168.199.114:2379,192.168.199.115:2379" \ - --addr="192.168.199.116:20160" \ - --data-dir=tikv1 \ - --log-file=tikv.log - - ./bin/tikv-server --pd="192.168.199.113:2379,192.168.199.114:2379,192.168.199.115:2379" \ - --addr="192.168.199.117:20160" \ - --data-dir=tikv2 \ - --log-file=tikv.log - - ./bin/tikv-server --pd="192.168.199.113:2379,192.168.199.114:2379,192.168.199.115:2379" \ - --addr="192.168.199.118:20160" \ - --data-dir=tikv3 \ - --log-file=tikv.log - ``` - -3. Start TiDB on Node1. - - ```bash - ./bin/tidb-server --store=tikv \ - --path="192.168.199.113:2379,192.168.199.114:2379,192.168.199.115:2379" \ - --log-file=tidb.log - ``` - -4. Use the official MySQL client to connect to TiDB. 
- - ```sh - mysql -h 192.168.199.113 -P 4000 -u root -D test - ``` - -> **Note**: -> -> - If you start TiKV or deploy PD in the production environment, it is highly recommended to specify the path for the configuration file using the `--config` parameter. If the parameter is not set, TiKV or PD does not read the configuration file. -> - To tune TiKV, see [Performance Tuning for TiKV](./tune-TiKV.md). -> - If you use `nohup` to start the cluster in the production environment, write the startup commands in a script and then run the script. If not, the `nohup` process might abort because it receives exceptions when the Shell command exits. For more information, see [The TiDB/TiKV/PD process aborts unexpectedly](../trouble-shooting.md#the-tidbtikvpd-process-aborts-unexpectedly). - -## TiDB monitor and alarm deployment - -To install and deploy the environment for TiDB monitor and alarm service, see the following table for the system information: - -| Name | Host IP | Services | -| :-- | :-- | :------------- | -| Node1 | 192.168.199.113 | node_export, pushgateway, Prometheus, Grafana | -| Node2 | 192.168.199.114 | node_export | -| Node3 | 192.168.199.115 | node_export | -| Node4 | 192.168.199.116 | node_export | - -### Download the binary package - -``` -# Download the package. -wget https://github.com/prometheus/prometheus/releases/download/v1.5.2/prometheus-1.5.2.linux-amd64.tar.gz -wget https://github.com/prometheus/node_exporter/releases/download/v0.14.0-rc.2/node_exporter-0.14.0-rc.2.linux-amd64.tar.gz -wget https://grafanarel.s3.amazonaws.com/builds/grafana-4.1.2-1486989747.linux-x64.tar.gz -wget https://github.com/prometheus/pushgateway/releases/download/v0.3.1/pushgateway-0.3.1.linux-amd64.tar.gz - -# Extract the package. -tar -xzf prometheus-1.5.2.linux-amd64.tar.gz -tar -xzf node_exporter-0.14.0-rc.2.linux-amd64.tar.gz -tar -xzf grafana-4.1.2-1486989747.linux-x64.tar.gz -tar -xzf pushgateway-0.3.1.linux-amd64.tar.gz -``` - -### Start the monitor service - -#### Start `node_exporter` on Node1, Node2, Node3 and Node4. - -``` -$cd node_exporter-0.14.0-rc.1.linux-amd64 - -# Start the node_exporter service. -./node_exporter --web.listen-address=":9100" \ - --log.level="info" -``` - -#### Start `pushgateway` on Node1. - -``` -$cd pushgateway-0.3.1.linux-amd64 - -# Start the pushgateway service. -./pushgateway \ - --log.level="info" \ - --web.listen-address=":9091" -``` - -#### Start Prometheus in Node1. - -``` -$cd prometheus-1.5.2.linux-amd64 - -# Edit the Configuration file: - -vi prometheus.yml - -... -global: - scrape_interval: 15s # By default, scrape targets every 15 seconds. - evaluation_interval: 15s # By default, scrape targets every 15 seconds. - # scrape_timeout is set to the global default (10s). - labels: - cluster: 'test-cluster' - monitor: "prometheus" - -scrape_configs: - - job_name: 'overwritten-cluster' - scrape_interval: 3s - honor_labels: true # don't overwrite job & instance labels - static_configs: - - targets: ['192.168.199.113:9091'] - - - job_name: "overwritten-nodes" - honor_labels: true # don't overwrite job & instance labels - static_configs: - - targets: - - '192.168.199.113:9100' - - '192.168.199.114:9100' - - '192.168.199.115:9100' - - '192.168.199.116:9100' -... 
- -# Start Prometheus: -./prometheus \ - --config.file="/data1/tidb/deploy/conf/prometheus.yml" \ - --web.listen-address=":9090" \ - --web.external-url="http://192.168.199.113:9090/" \ - --log.level="info" \ - --storage.local.path="/data1/tidb/deploy/data.metrics" \ - --storage.local.retention="360h0m0s" -``` - -#### Start Grafana in Node1. - -``` -cd grafana-4.1.2-1486989747.linux-x64 - -# Edit the Configuration file: - -vi grafana.ini - -... - -# The http port to use -http_port = 3000 - -# The public facing domain name used to access grafana from a browser -domain = 192.168.199.113 - -... - -# Start the Grafana service: -./bin/grafana-server \ - --homepath="/data1/tidb/deploy/opt/grafana" \ - --config="/data1/tidb/deploy/opt/grafana/conf/grafana.ini" -``` diff --git a/op-guide/configuration.md b/op-guide/configuration.md index 5cccf6654c67d..8ff23fa2c808f 100644 --- a/op-guide/configuration.md +++ b/op-guide/configuration.md @@ -1,5 +1,6 @@ --- title: Configuration Flags +summary: Learn some configuration flags of TiDB, TiKV and PD. category: operations --- @@ -11,6 +12,12 @@ TiDB, TiKV and PD are configurable using command-line flags and environment vari The default TiDB ports are 4000 for client requests and 10080 for status report. +### `--advertise-address` + +- The IP address on which to advertise the apiserver to the TiDB server +- Default: "" +- This address must be reachable by the rest of the TiDB cluster and the user. + ### `--binlog-socket` - The TiDB services use the unix socket file for internal connections, such as the Pump service @@ -21,7 +28,7 @@ The default TiDB ports are 4000 for client requests and 10080 for status report. - The configuration file - Default: "" -- If you have specified the configuration file, TiDB reads the configuration file. If the corresponding configuration also exists in the command line flags, TiDB uses the configuration in the command line flags to overwrite that in the configuration file. For detailed configuration information, see [TiDB Configuration File Description](tidb-config-file.md) +- If you have specified the configuration file, TiDB reads the configuration file. If the corresponding configuration also exists in the command line flags, TiDB uses the configuration in the command line flags to overwrite that in the configuration file. For detailed configuration information, see [TiDB Configuration File Description](../op-guide/tidb-config-file.md) ### `--host` @@ -155,7 +162,13 @@ release tokens. - The listening URL list for client traffic - Default: "http://127.0.0.1:2379" -- To deploy a cluster, you must use `--client-urls` to specify the IP address of the current host, such as "http://192.168.100.113:2379". If the cluster is run on Docker, specify the IP address of Docker as "http://0.0.0.0:2379". +- To deploy a cluster, you must use `--client-urls` to specify the IP address of the current host, such as "http://192.168.100.113:2379". If the cluster runs on Docker, specify the IP address of Docker as "http://0.0.0.0:2379". + +### `--peer-urls` + +- The listening URL list for peer traffic +- Default: "http://127.0.0.1:2380" +- To deploy a cluster, you must use `--peer-urls` to specify the IP address of the current host, such as "http://192.168.100.113:2380". If the cluster runs on Docker, specify the IP address of Docker as "http://0.0.0.0:2380". ### `--config` @@ -209,11 +222,26 @@ release tokens. - Default: "pd" - If you want to start multiply PDs, you must use different name for each one. 
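For orientation, the PD flags described above are normally combined into a single startup command. The following is only an illustrative sketch; the instance name, data directory, and address are made-up values, not defaults:

```bash
# Start one PD instance with an explicit name, data directory, and listening URLs.
./bin/pd-server --name=pd1 \
                --data-dir=pd1 \
                --client-urls="http://192.168.100.113:2379" \
                --peer-urls="http://192.168.100.113:2380" \
                --log-file=pd.log
```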
-### `--peer-urls` +### `--cacert` -- The listening URL list for peer traffic -- Default: "http://127.0.0.1:2380" -- To deploy a cluster, you must use `--peer-urls` to specify the IP address of the current host, such as "http://192.168.100.113:2380". If the cluster is run on Docker, specify the IP address of Docker as "http://0.0.0.0:2380". +- The file path of CA, used to enable TLS +- Default: "" + +### `--cert` + +- The path of the PEM file including the X509 certificate, used to enable TLS +- Default: "" + +### `--key` + +- The path of the PEM file including the X509 key, used to enable TLS +- Default: "" + +### `--namespace-classifier` + +- To specify the namespace classifier used by PD +- Default: "table" +- If you use TiKV separately, not in the entire TiDB cluster, it is recommended to configure the value to 'default'. ## TiKV diff --git a/op-guide/dashboard-overview-info.md b/op-guide/dashboard-overview-info.md index 090a5f8938c63..18b8a3ad1f498 100644 --- a/op-guide/dashboard-overview-info.md +++ b/op-guide/dashboard-overview-info.md @@ -1,41 +1,77 @@ --- title: Key Metrics +summary: Learn some key metrics displayed on the Grafana Overview dashboard. category: operations --- # Key Metrics -If you use Ansible to deploy TiDB cluster, you can deploy the monitoring system at the same time. See [Overview of the Monitoring Framework](monitor-overview.md) for more information. +If you use Ansible to deploy the TiDB cluster, the monitoring system is deployed at the same time. For more information, see [Overview of the Monitoring Framework](../op-guide/monitor-overview.md). -The Grafana dashboard is divided into four sub dashboards: node_export, PD, TiKV, and TiDB. There are a lot of metics there to help you diagnose. For routine operations, some of the key metrics are displayed on the Overview dashboard so that you can get the overview of the status of the components and the entire cluster. See the following section for their descriptions: +The Grafana dashboard is divided into a series of sub dashboards which include Overview, PD, TiDB, TiKV, Node\_exporter, Disk Performance, and so on. A lot of metrics are there to help you diagnose. + +For routine operations, you can get an overview of the component (PD, TiDB, TiKV) status and the entire cluster from the Overview dashboard, where the key metrics are displayed. This document provides a detailed description of these key metrics. ## Key metrics description +To understand the key metrics displayed on the Overview dashboard, check the following table: + Service | Panel Name | Description | Normal Range ---- | ---------------- | ---------------------------------- | -------------- +Services Port Status | Services Online | the online nodes number of each service | +Services Port Status | Services Offline | the offline nodes number of each service | PD | Storage Capacity | the total storage capacity of the TiDB cluster | PD | Current Storage Size | the occupied storage capacity of the TiDB cluster | -PD | Store Status -- up store | the number of TiKV nodes that are up | -PD | Store Status -- down store | the number of TiKV nodes that are down | `0`. If the number is bigger than `0`, it means some node(s) are not down. -PD | Store Status -- offline store | the number of TiKV nodes that are manually offline| -PD | Store Status -- Tombstone store | the number of TiKV nodes that are Tombstone| -PD | Current storage usage | the storage occupancy rate of the TiKV cluster | If it exceeds 80%, you need to consider adding more TiKV nodes. 
-PD | 99% completed cmds duration seconds | the 99th percentile duration to complete a pd-server request| less than 5ms -PD | average completed cmds duration seconds | the average duration to complete a pd-server request | less than 50ms -PD | leader balance ratio | the leader ratio difference of the nodes with the biggest leader ratio and the smallest leader ratio | It is less than 5% for a balanced situation. It becomes bigger when a node is restarting. -PD | region balance ratio | the region ratio difference of the nodes with the biggest region ratio and the smallest region ratio | It is less than 5% for a balanced situation. It becomes bigger when adding or removing a node. -TiDB | handle requests duration seconds | the response time to get TSO from PD| less than 100ms -TiDB | tidb server QPS | the QPS of the cluster | application specific -TiDB | connection count | the number of connections from application servers to the database | Application specific. If the number of connections hops, you need to find out the reasons. If it drops to 0, you can check if the network is broken; if it surges, you need to check the application. -TiDB | statement count | the number of different types of statement within a given time | application specific -TiDB | Query Duration 99th percentile | the 99th percentile query time | -TiKV | 99% & 99.99% scheduler command duration | the 99th percentile and 99.99th percentile scheduler command duration| For 99%, it is less than 50ms; for 99.99%, it is less than 100ms. -TiKV | 95% & 99.99% storage async_request duration | the 95th percentile and 99.99th percentile Raft command duration | For 95%, it is less than 50ms; for 99.99%, it is less than 100ms. -TiKV | server report failure message | There might be an issue with the network or the message might not come from this cluster. | If there are large amount of messages which contains `unreachable`, there might be an issue with the network. If the message contains `store not match`, the message does not come from this cluster. -TiKV | Vote |the frequency of the Raft vote | Usually, the value only changes when there is a split. If the value of Vote remains high for a long time, the system might have a severe issue and some nodes are not working. -TiKV | 95% and 99% coprocessor request duration | the 95th percentile and the 99th percentile coprocessor request duration | Application specific. Usually, the value does not remain high. -TiKV | Pending task | the number of pending tasks | Except for PD worker, it is not normal if the value is too high. -TiKV | stall | RocksDB stall time | If the value is bigger than 0, it means that RocksDB is too busy, and you need to pay attention to IO and CPU usage. -TiKV | channel full | The channel is full and the threads are too busy. | If the value is bigger than 0, the threads are too busy. -TiKV | 95% send message duration seconds | the 95th percentile message sending time | less than 50ms -TiKV | leader/region | the number of leader/region per TiKV server| application specific \ No newline at end of file +PD | Number of Regions | the total number of Regions of the current cluster | +PD | Leader Balance Ratio | the leader ratio difference of the nodes with the biggest leader ratio and the smallest leader ratio | It is less than 5% for a balanced situation and becomes bigger when you restart a node. 
+PD | Region Balance Ratio | the region ratio difference of the nodes with the biggest Region ratio and the smallest Region ratio | It is less than 5% for a balanced situation and becomes bigger when you add or remove a node. +PD | Store Status -- Up Stores | the number of TiKV nodes that are up | +PD | Store Status -- Disconnect Stores | the number of TiKV nodes that encounter abnormal communication within a short time | +PD | Store Status -- LowSpace Stores | the number of TiKV nodes with an available space of less than 80% | +PD | Store Status -- Down Stores | the number of TiKV nodes that are down | The normal value is `0`. If the number is bigger than `0`, it means some node(s) are abnormal. +PD | Store Status -- Offline Stores | the number of TiKV nodes (still providing service) that are being made offline | +PD | Store Status -- Tombstone Stores | the number of TiKV nodes that are successfully offline | +PD | 99% completed_cmds_duration_seconds | the 99th percentile duration to complete a pd-server request | less than 5ms +PD | handle_requests_duration_seconds | the request duration of a PD request | +TiDB | Statement OPS | the total number of executed SQL statements, including `SELECT`, `INSERT`, `UPDATE` and so on | +TiDB | Duration | the execution time of a SQL statement | +TiDB | QPS By Instance | the QPS on each TiDB instance | +TiDB | Failed Query OPM | the number of failed SQL statements, including syntax error and key conflicts and so on | +TiDB | Connection Count | the connection number of each TiDB instance | +TiDB | Heap Memory Usage | the size of heap memory used by each TiDB instance | +TiDB | Transaction OPS | the number of executed transactions per second | +TiDB | Transaction Duration | the execution time of a transaction | +TiDB | KV Cmd OPS | the number of executed KV commands | +TiDB | KV Cmd Duration 99 | the execution time of the KV command | +TiDB | PD TSO OPS | the number of TSO that TiDB obtains from PD | +TiDB | PD TSO Wait Duration | the time consumed when TiDB obtains TSO from PD | +TiDB | TiClient Region Error OPS | the number of Region related errors returned by TiKV | +TiDB | Lock Resolve OPS | the number of transaction related conflicts | +TiDB | Load Schema Duration | the time consumed when TiDB obtains Schema from TiKV | +TiDB | KV Backoff OPS | the number of errors returned by TiKV (such as transaction conflicts ) +TiKV | leader | the number of leaders on each TiKV node | +TiKV | region | the number of Regions on each TiKV node | +TiKV | CPU | the CPU usage ratio on each TiKV node | +TiKV | Memory | the memory usage on each TiKV node | +TiKV | store size | the data amount on each TiKV node | +TiKV | cf size | the data amount on different CFs in the cluster | +TiKV | channel full | `No data points` is displayed in normal conditions. If a monitoring value displays, it means the corresponding TiKV node fails to handle the messages | +TiKV | server report failures | `No data points` is displayed in normal conditions. If `Unreachable` is displayed, it means TiKV encounters a communication issue. | +TiKV | scheduler pending commands | the number of commits on queue | Occasional value peaks are normal. +TiKV | coprocessor pending requests | the number of requests on queue | `0` or very small +TiKV | coprocessor executor count | the number of various query operations | +TiKV | coprocessor request duration | the time consumed by TiKV queries | +TiKV | raft store CPU | the CPU usage ratio of the raftstore thread | Currently, it is a single thread. 
A value of over 80% indicates that the CPU usage ratio is very high. +TiKV | Coprocessor CPU | the CPU usage ratio of the TiKV query thread, related to the application; complex queries consume a great deal of CPU | +System Info | Vcores | the number of CPU cores | +System Info | Memory | the total memory | +System Info | CPU Usage | the CPU usage ratio, 100% at a maximum | +System Info | Load [1m] | the overload within 1 minute | +System Info | Memory Available | the size of the available memory | +System Info | Network Traffic | the statistics of the network traffic | +System Info | TCP Retrans | the statistics about network monitoring and TCP | +System Info | IO Util | the disk usage ratio, 100% at a maximum; generally you need to consider adding a new node when the usage ratio is up to 80% ~ 90% | + +## Interface of the Overview dashboard + +![Overview Dashboard](../media/overview.png) \ No newline at end of file diff --git a/op-guide/docker-compose.md b/op-guide/docker-compose.md index 77d442c12ef92..70f99c2aa7d8c 100644 --- a/op-guide/docker-compose.md +++ b/op-guide/docker-compose.md @@ -1,15 +1,22 @@ --- title: TiDB Docker Compose Deployment +summary: Use Docker Compose to quickly deploy a TiDB testing cluster. category: operations --- # TiDB Docker Compose Deployment -This document describes how to quickly deploy TiDB using [Docker Compose](https://docs.docker.com/compose/overview). +This document describes how to quickly deploy a TiDB testing cluster with a single command using [Docker Compose](https://docs.docker.com/compose/overview). With Docker Compose, you can use a YAML file to configure application services in multiple containers. Then, with a single command, you can create and start all the services from your configuration. -You can use Docker Compose to deploy a TiDB test cluster with a single command. It is required to use Docker 17.06.0 or later. +## Prerequisites + +Make sure you have installed the following items on your machine: + +- [Git](https://git-scm.com/downloads) +- [Docker Compose](https://docs.docker.com/compose/install/) +- [MySQL Client](https://dev.mysql.com/downloads/mysql/) ## Deploy TiDB using Docker Compose @@ -19,25 +26,30 @@ You can use Docker Compose to deploy a TiDB test cluster with a single command. git clone https://github.com/pingcap/tidb-docker-compose.git ``` -2. Create and start the cluster. +2. Change the directory to tidb-docker-compose and get the latest TiDB Docker Images: ```bash - cd tidb-docker-compose && docker-compose up -d + cd tidb-docker-compose && docker-compose pull ``` -3. Access the cluster. +3. Start the TiDB cluster: ```bash + docker-compose up -d + ``` + +4. Use the MySQL client to connect to TiDB to read and write data: + + ``` mysql -h 127.0.0.1 -P 4000 -u root ``` - Access the Grafana monitoring interface: +## Monitor the cluster - - Default address: - - Default account name: admin - - Default password: admin +After you have successfully deployed a TiDB cluster, you can now monitor the TiDB cluster using one of the following methods: - Access the [cluster data visualization interface](https://github.com/pingcap/tidb-vision): +- Use Grafana to view the status of the cluster via [http://localhost:3000](http://localhost:3000) with the default account name and password: `admin` and `admin`. +- Use [TiDB-Vision](https://github.com/pingcap/tidb-vision), a cluster visualization tool, to see data transfer and load-balancing inside your cluster via [http://localhost:8010](http://localhost:8010). 
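In addition to the web interfaces above, you can sanity-check the cluster from the command line. This is only a sketch and assumes the default ports of the `tidb-docker-compose` setup described above:

```bash
# List the containers created by Docker Compose and their current state.
docker-compose ps

# Connect to TiDB over the MySQL protocol and run a trivial query.
mysql -h 127.0.0.1 -P 4000 -u root -e "SELECT VERSION();"
```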
## Customize the cluster @@ -59,7 +71,7 @@ To customize the cluster, you can edit the `docker-compose.yml` file directly. I For macOS, you can also install Helm using the following command in Homebrew: - ``` + ```bash brew install kubernetes-helm ``` @@ -73,29 +85,28 @@ To customize the cluster, you can edit the `docker-compose.yml` file directly. I ```bash cd tidb-docker-compose - cp compose/values.yaml values.yaml - vim values.yaml + vim compose/values.yaml # custom the cluster size, docker image, port mapping and so on ``` - Modify the configuration in `values.yaml`, such as the cluster size, TiDB image version, and so on. + You can modify the configuration in `values.yaml`, such as the cluster size, TiDB image version, and so on. - [tidb-vision](https://github.com/pingcap/tidb-vision) is the data visualization interface of the TiDB cluster, used to visually display the PD scheduling on TiKV data. If you do not need this component, leave `tidbVision` empty. + [tidb-vision](https://github.com/pingcap/tidb-vision) is the data visualization interface of the TiDB cluster, used to visually display the PD scheduling on TiKV data. If you do not need this component, comment out the `tidbVision` section. For PD, TiKV, TiDB and tidb-vision, you can build Docker images from GitHub source code or local files for development and testing. - - To build the image of a component from GitHub source code, you need to leave the `image` field empty and set `buildFrom` to `remote`. - - To build PD, TiKV or TiDB images from the locally compiled binary file, you need to leave the `image` field empty, set `buildFrom` to `local` and copy the compiled binary file to the corresponding `pd/bin/pd-server`, `tikv/bin/tikv-server`, `tidb/bin/tidb-server`. - - To build the tidb-vision image from local, you need to leave the `image` field empty, set `buildFrom` to `local` and copy the tidb-vision project to `tidb-vision/tidb-vision`. + - To build PD, TiKV or TiDB images from the locally compiled binary file, you need to comment out the `image` field and copy the compiled binary file to the corresponding `pd/bin/pd-server`, `tikv/bin/tikv-server`, `tidb/bin/tidb-server`. + - To build the tidb-vision image from local, you need to comment out the `image` field and copy the tidb-vision project to `tidb-vision/tidb-vision`. 4. Generate the `docker-compose.yml` file. ```bash - helm template -f values.yaml compose > generated-docker-compose.yml + helm template compose > generated-docker-compose.yml ``` 5. Create and start the cluster using the generated `docker-compose.yml` file. ```bash + docker-compose -f generated-docker-compose.yml pull # Get the latest Docker images docker-compose -f generated-docker-compose.yml up -d ``` @@ -111,4 +122,57 @@ To customize the cluster, you can edit the `docker-compose.yml` file directly. I - Default account name: admin - Default password: admin - If tidb-vision is enabled, you can access the cluster data visualization interface: . \ No newline at end of file + If tidb-vision is enabled, you can access the cluster data visualization interface: . + +## Access the Spark shell and load TiSpark + +Insert some sample data to the TiDB cluster: + +```bash +$ docker-compose exec tispark-master bash +$ cd /opt/spark/data/tispark-sample-data +$ mysql -h tidb -P 4000 -u root < dss.ddl +``` + +After the sample data is loaded into the TiDB cluster, you can access the Spark shell using `docker-compose exec tispark-master /opt/spark/bin/spark-shell`. 
+ +```bash +$ docker-compose exec tispark-master /opt/spark/bin/spark-shell +... +Spark context available as 'sc' (master = local[*], app id = local-1527045927617). +Spark session available as 'spark'. +Welcome to + ____ __ + / __/__ ___ _____/ /__ + _\ \/ _ \/ _ `/ __/ '_/ + /___/ .__/\_,_/_/ /_/\_\ version 2.1.1 + /_/ + +Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_172) +Type in expressions to have them evaluated. +Type :help for more information. + +scala> import org.apache.spark.sql.TiContext +... +scala> val ti = new TiContext(spark) +... +scala> ti.tidbMapDatabase("TPCH_001") +... +scala> spark.sql("select count(*) from lineitem").show ++--------+ +|count(1)| ++--------+ +| 60175| ++--------+ +``` + +You can also access Spark with Python or R using the following commands: + +``` +docker-compose exec tispark-master /opt/spark/bin/pyspark +docker-compose exec tispark-master /opt/spark/bin/sparkR +``` + +For more details about TiSpark, see [here](../tispark/tispark-quick-start-guide.md). + +Here is [a 5-minute tutorial](https://www.pingcap.com/blog/how_to_spin_up_an_htap_database_in_5_minutes_with_tidb_tispark/) for macOS users that shows how to spin up a standard TiDB cluster using Docker Compose on your local computer. \ No newline at end of file diff --git a/op-guide/docker-deployment.md b/op-guide/docker-deployment.md index 41f54583d1e2a..1eec3fc2745d0 100644 --- a/op-guide/docker-deployment.md +++ b/op-guide/docker-deployment.md @@ -1,13 +1,14 @@ --- -title: TiDB Docker Deployment +title: Deploy TiDB Using Docker +summary: Use Docker to manually deploy a multi-node TiDB cluster on multiple machines. category: operations --- -# Docker Deployment +# Deploy TiDB Using Docker This page shows you how to manually deploy a multi-node TiDB cluster on multiple machines using Docker. -To learn more, see [TiDB architecture](../overview.md#tidb-architecture) and [Software and Hardware Requirements](recommendation.md). +To learn more, see [TiDB architecture](../overview.md#tidb-architecture) and [Software and Hardware Requirements](../op-guide/recommendation.md). ## Preparation diff --git a/op-guide/gc.md b/op-guide/gc.md new file mode 100644 index 0000000000000..2eb2793890747 --- /dev/null +++ b/op-guide/gc.md @@ -0,0 +1,86 @@ +--- +title: TiDB Garbage Collection (GC) +summary: Use Garbage Collection (GC) to clear the obsolete data of TiDB. +category: advanced +--- + +# TiDB Garbage Collection (GC) + +TiDB uses MVCC to control concurrency. When you update or delete data, the original data is not deleted immediately but is kept for a period during which it can be read. Thus the write operation and the read operation are not mutually exclusive and it is possible to read the history versions of the data. + +The data versions whose duration exceeds a specific time and that are not used any more will be cleared, otherwise they will occupy the disk space and affect TiDB's performance. TiDB uses Garbage Collection (GC) to clear the obsolete data. + +## Working mechanism + +GC runs periodically on TiDB. When a TiDB server is started, a `gc_worker` is enabled in the background. In each TiDB cluster, one `gc_worker` is elected to be the leader which is used to maintain the GC status and send GC commands to all the TiKV Region leaders. 
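For example, you can check which `tidb-server` instance currently acts as the GC leader by querying the `mysql.tidb` system table (described in detail in the next section). The statement below is a minimal sketch:

```sql
-- Show the current GC leader and its lease expiration time.
select VARIABLE_NAME, VARIABLE_VALUE from mysql.tidb
where VARIABLE_NAME in ('tikv_gc_leader_desc', 'tikv_gc_leader_lease');
```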
+ +## Configuration and monitor + +The GC configuration and operational status are recorded in the `mysql.tidb` system table as below, which can be monitored and configured using SQL statements: + +```sql +mysql> select VARIABLE_NAME, VARIABLE_VALUE from mysql.tidb; ++-----------------------+------------------------------------------------------------------------------------------------+ +| VARIABLE_NAME | VARIABLE_VALUE | ++-----------------------+------------------------------------------------------------------------------------------------+ +| bootstrapped | True | +| tidb_server_version | 18 | +| tikv_gc_leader_uuid | 58accebfa7c0004 | +| tikv_gc_leader_desc | host:ip-172-16-30-5, pid:95472, start at 2018-04-11 13:43:30.73076656 +0800 CST m=+0.068873865 | +| tikv_gc_leader_lease | 20180418-11:02:30 +0800 CST | +| tikv_gc_run_interval | 10m0s | +| tikv_gc_life_time | 10m0s | +| tikv_gc_last_run_time | 20180418-10:59:30 +0800 CST | +| tikv_gc_safe_point | 20180418-10:58:30 +0800 CST | +| tikv_gc_concurrency | 1 | ++-----------------------+------------------------------------------------------------------------------------------------+ +10 rows in set (0.02 sec) +``` + +In the table above, `tikv_gc_run_interval`, `tikv_gc_life_time` and `tikv_gc_concurrency` can be configured manually. Other variables with the `tikv_gc` prefix record the current status, which are automatically updated by TiDB. Do not modify these variables. + +- `tikv_gc_leader_uuid`, `tikv_gc_leader_desc`, `tikv_gc_leader_lease`: the current GC leader information. + +- `tikv_gc_run_interval`: the interval of GC work. The value is 10 min by default and cannot be smaller than 10 min. + +- `tikv_gc_life_time`: the retention period of data versions; The value is 10 min by default and cannot be smaller than 10 min. + + When GC works, the outdated data is cleared. You can set it using the SQL statement. For example, if you want to retain the data within a day, you can execute the operation as below: + + ```sql + update mysql.tidb set VARIABLE_VALUE = '24h' where VARIABLE_NAME = 'tikv_gc_life_time'; + ``` + + The duration strings are a sequence of a number with the time unit, such as 24h, 2h30m and 2.5h. The time units you can use include "h", "m" and "s". + + > **Note**: When you set `tikv_gc_life_time` to a large number (like days or even months) in a scenario where data is updated frequently, some problems as follows may occur: + + - The more versions of the data, the more disk storage space is occupied. + - A large number of history versions might slow down the query. They may affect range queries like `select count(*) from t`. + - If `tikv_gc_life_time` is suddenly turned to a smaller value during operation, a great deal of old data may be deleted in a short time, causing I/O pressure. + +- `tikv_gc_last_run_time`: the last time GC works. + +- `tikv_gc_safe_point`: the time before which versions are cleared by GC and after which versions are readable. + +- `tikv_gc_concurrency`: the GC concurrency. It is set to 1 by default. In this case, a single thread operates and threads send request to each Region and wait for the response one by one. You can set the variable value larger to improve the system performance, but keep the value smaller than 128. + +## Implementation details + +The GC implementation process is complex. When the obsolete data is cleared, data consistency is guaranteed. The process of doing GC is as below: + +### 1. Resolve locks + +The TiDB transaction model is inspired by Google's Percolator. 
It is mainly a two-phase commit protocol with some practical optimizations. When the first phase is finished, all the related keys are locked. Among these locks, one is the primary lock and the others are secondary locks that contain a pointer to the primary lock; in the second phase, the key with the primary lock gets a write record and its lock is removed. The write record indicates the write or delete operation in the history or the transactional rollback record of this key. Whether the primary lock has been replaced by a write record indicates whether the corresponding transaction is committed successfully. Then all the secondary locks are replaced successively. If the threads fail to replace the secondary locks, these locks are retained. During GC, a lock whose timestamp is before the safe point is replaced with the corresponding write record based on the transaction commit status.
+
+> **Note**: This is a required step. Once GC has cleared the write record of the primary lock, you can never know whether this transaction is successful or not. As a result, data consistency cannot be guaranteed.
+
+### 2. Delete ranges
+
+`DeleteRanges` is usually executed after operations like `drop table` and is used to delete a range that might be very large. If the `use_delete_range` option of TiKV is not enabled, TiKV deletes the keys in the range.
+
+### 3. Do GC
+
+For each key, clear the data of the versions before the safe point, as well as the corresponding write records.
+
+> **Note**: If the last record in all the write records of `Put` and `Delete` types before the safe point is `Put`, this record and its data cannot be deleted directly. Otherwise, you cannot successfully perform the read operation whose timestamp is after the safe point and before the next version of the key. \ No newline at end of file
diff --git a/op-guide/generate-self-signed-certificates.md b/op-guide/generate-self-signed-certificates.md
index d6f662e626c54..f10fc612943c8 100644
--- a/op-guide/generate-self-signed-certificates.md
+++ b/op-guide/generate-self-signed-certificates.md
@@ -1,5 +1,6 @@
 ---
 title: Generate Self-signed Certificates
+summary: Use `cfssl` to generate self-signed certificates.
 category: deployment
 ---
 
diff --git a/op-guide/history-read.md b/op-guide/history-read.md
index 95002ace05c92..b3562dfc6745e 100644
--- a/op-guide/history-read.md
+++ b/op-guide/history-read.md
@@ -1,5 +1,6 @@
 ---
 title: Reading Data from History Versions
+summary: Learn about how TiDB reads data from history versions.
 category: advanced
 ---
 
@@ -9,7 +10,8 @@ This document describes how TiDB reads data from the history versions, how TiDB
 
 ## Feature description
 
-TiDB implements a feature to read history data using the standard SQL interface directly without special clients or drivers. By using this feature,
+TiDB implements a feature to read history data using the standard SQL interface directly without special clients or drivers. By using this feature:
+
 - Even when data is updated or removed, its history versions can be read using the SQL interface.
 - Even if the table structure changes after the data is updated, TiDB can use the old structure to read the history data.
 
@@ -20,7 +22,7 @@ The `tidb_snapshot` system variable is introduced to support reading history dat
 - The variable is valid in the `Session` scope.
 - Its value can be modified using the `Set` statement.
 - The data type for the variable is text.
-- The variable is to record time in the following format: “2016-10-08 16:45:26.999”. Generally, the time can be set to seconds like in “2016-10-08 16:45:26”.
+- The variable accepts TSO (Timestamp Oracle) and datetime. TSO is a globally unique time service, which is obtained from PD. The acceptable datetime format is "2016-10-08 16:45:26.999". Generally, the datetime can be set using second precision, for example "2016-10-08 16:45:26". - When the variable is set, TiDB creates a Snapshot using its value as the timestamp, just for the data structure and there is no any overhead. After that, all the `Select` operations will read data from this Snapshot. > **Note:** Because the timestamp in TiDB transactions is allocated by Placement Driver (PD), the version of the stored data is also marked based on the timestamp allocated by PD. When a Snapshot is created, the version number is based on the value of the `tidb_snapshot` variable. If there is a large difference between the local time of the TiDB server and the PD server, use the time of the PD server. @@ -29,39 +31,14 @@ After reading data from history versions, you can read data from the latest vers ## How TiDB manages the data versions -TiDB implements Multi-Version Concurrency Control (MVCC) to manage data versions. The history versions of data are kept because each update / removal creates a new version of the data object instead of updating / removing the data object in-place. But not all the versions are kept. If the versions are older than a specific time, they will be removed completely to reduce the storage occupancy and the performance overhead caused by too many history versions. - -In TiDB, Garbage Collection (GC) runs periodically to remove the obsolete data versions. GC is triggered in the following way: There is a `gc_worker` goroutine running in the background of each TiDB server. In a cluster with multiple TiDB servers, one of the `gc_worker` goroutines will be automatically selected to be the leader. The leader is responsible for maintaining the GC state and sends GC commands to each TiKV region leader. - -The running record of GC is recorded in the system table of `mysql.tidb` as follows and can be monitored and configured using the SQL statements: - -``` -mysql> select variable_name, variable_value from mysql.tidb; -+-----------------------+----------------------------+ -| variable_name | variable_value | -+-----------------------+----------------------------+ -| bootstrapped | True | -| tikv_gc_leader_uuid | 55daa0dfc9c0006 | -| tikv_gc_leader_desc | host:pingcap-pc5 pid:10549 | -| tikv_gc_leader_lease | 20160927-13:18:28 +0800 CST| -| tikv_gc_run_interval | 10m0s | -| tikv_gc_life_time | 10m0s | -| tikv_gc_last_run_time | 20160927-13:13:28 +0800 CST| -| tikv_gc_safe_point | 20160927-13:03:28 +0800 CST| -+-----------------------+----------------------------+ -7 rows in set (0.00 sec) -``` +TiDB implements Multi-Version Concurrency Control (MVCC) to manage data versions. The history versions of data are kept because each update/removal creates a new version of the data object instead of updating/removing the data object in-place. But not all the versions are kept. If the versions are older than a specific time, they will be removed completely to reduce the storage occupancy and the performance overhead caused by too many history versions. -Pay special attention to the following two rows: +In TiDB, Garbage Collection (GC) runs periodically to remove the obsolete data versions. For GC details, see [TiDB Garbage Collection (GC)](../op-guide/gc.md) -- `tikv_gc_life_time`: This row is to configure the retention time of the history version and its default value is 10m. 
You can use SQL statements to configure it. For example, if you want all the data within one day to be readable, set this row to 24h by using the `update mysql.tidb set variable_value='24h' where variable_name='tikv_gc_life_time'` statement. The format is: "24h", "2h30m", "2.5h". The unit of time can be: "h", "m", "s". +Pay special attention to the following two variables: -> **Note:** If your data is updated very frequently, the following issues might occur if the value of the `tikv_gc_life_time` is set to be too large like in days or months: -> -> - The more versions of the data, the more disk storage is occupied. -> - A large amount of the history versions might slow down the query, especially the range queries like `select count(*) from t`. -> - If the value of the `tikv_gc_life_time` variable is suddenly changed to be smaller while the database is running, it might lead to the removal of large amounts of history data and cause huge I/O burden. -> - `tikv_gc_safe_point`: This row records the current safePoint. You can safely create the Snapshot to read the history data using the timestamp that is later than the safePoint. The safePoint automatically updates every time GC runs. +- `tikv_gc_life_time`: It is used to configure the retention time of the history version. You can modify it manually. +- `tikv_gc_safe_point`: It records the current `safePoint`. You can safely create the snapshot to read the history data using the timestamp that is later than `safePoint`. `safePoint` automatically updates every time GC runs. ## Example @@ -130,6 +107,9 @@ Pay special attention to the following two rows: mysql> set @@tidb_snapshot="2016-10-08 16:45:26"; Query OK, 0 rows affected (0.00 sec) ``` + + > **Note:** You should use `@@` instead of `@` before `tidb_snapshot` because `@@` is used to denote the system variable while `@` is used to denote the user variable. + **Result:** The read from the following statement is the data before the update operation, which is the history data. ```sql @@ -161,4 +141,6 @@ Pay special attention to the following two rows: | 3 | +------+ 3 rows in set (0.00 sec) - ``` \ No newline at end of file + ``` + + > **Note:** You should use `@@` instead of `@` before `tidb_snapshot` because `@@` is used to denote the system variable while `@` is used to denote the user variable. diff --git a/op-guide/horizontal-scale.md b/op-guide/horizontal-scale.md index 28ceae77e763a..4bc3467104e36 100644 --- a/op-guide/horizontal-scale.md +++ b/op-guide/horizontal-scale.md @@ -1,5 +1,6 @@ --- title: Scale a TiDB cluster +summary: Learn how to add or delete PD, TiKV and TiDB nodes. category: operations --- @@ -9,9 +10,11 @@ category: operations The capacity of a TiDB cluster can be increased or reduced without affecting online services. +> **Note:** If your TiDB cluster is deployed using Ansible, see [Scale the TiDB Cluster Using TiDB-Ansible](../op-guide/ansible-deployment-scale.md). + The following part shows you how to add or delete PD, TiKV or TiDB nodes. -About pd-ctl usage, please refer to [PD Control User Guide](../tools/pd-control.md). +About `pd-ctl` usage, refer to [PD Control User Guide](../tools/pd-control.md). 
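Before you add or remove nodes, it can help to check the current cluster state with `pd-ctl` in single-command mode. The following is only a sketch; the PD address `127.0.0.1:2379` and the binary path are assumptions to adapt to your deployment:

```bash
# List the PD members of the cluster
./pd-ctl -u "http://127.0.0.1:2379" -d member

# List the TiKV stores and their current state
./pd-ctl -u "http://127.0.0.1:2379" -d store

# Show the scheduling configuration, including max-replicas
./pd-ctl -u "http://127.0.0.1:2379" -d config show
```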
## PD diff --git a/op-guide/kubernetes.md b/op-guide/kubernetes.md new file mode 100644 index 0000000000000..a7117ccf242e9 --- /dev/null +++ b/op-guide/kubernetes.md @@ -0,0 +1,24 @@ +--- +title: TiDB Deployment on Kubernetes +summary: Use TiDB Operator to quickly deploy a TiDB cluster on Kubernetes +category: operations +--- + +# TiDB Deployment on Kubernetes + +[TiDB Operator](https://github.com/pingcap/tidb-operator) manages TiDB clusters on [Kubernetes](https://kubernetes.io) +and automates tasks related to operating a TiDB cluster. It makes TiDB a truly cloud-native database. + +> **Warning:** Currently, TiDB Operator is work in progress [WIP] and is NOT ready for production. Use at your own risk. + +## Google Kubernetes Engine (GKE) + +The TiDB Operator tutorial for GKE runs directly in the Google Cloud Shell. + +[![Open in Cloud Shell](https://gstatic.com/cloudssh/images/open-btn.png)](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/pingcap/tidb-operator&tutorial=docs/google-kubernetes-tutorial.md) + +## Local installation using Docker in Docker + +Docker in Docker (DinD) runs Docker containers as virtual machines and runs another layer of Docker containers inside the first layer of Docker containers. `kubeadm-dind-cluster` uses this technology to run the Kubernetes cluster in Docker containers. TiDB Operator uses a modified DinD script to manage the DinD Kubernetes cluster. + +[Continue reading tutorial on GitHub →](https://github.com/pingcap/tidb-operator/blob/master/docs/local-dind-tutorial.md) diff --git a/op-guide/location-awareness.md b/op-guide/location-awareness.md index 2752f4d8f0ba9..71bd3bf41b92b 100644 --- a/op-guide/location-awareness.md +++ b/op-guide/location-awareness.md @@ -1,5 +1,6 @@ --- title: Cross-Region Deployment +summary: Learn the cross-region deployment that maximizes the capacity for disaster recovery. category: operations --- @@ -9,7 +10,7 @@ category: operations PD schedules according to the topology of the TiKV cluster to maximize the TiKV's capability for disaster recovery. -Before you begin, see [Ansible Deployment (Recommended)](ansible-deployment.md) and [Docker Deployment](docker-deployment.md). +Before you begin, see [Deploy TiDB Using Ansible (Recommended)](../op-guide/ansible-deployment.md) and [Deploy TiDB Using Docker](../op-guide/docker-deployment.md). ## TiKV reports the topological information @@ -44,13 +45,11 @@ location-labels = ["zone", "rack", "host"] ## PD schedules based on the TiKV topology -PD makes optimal schedulings according to the topological information. You just need to care about what kind of topology can achieve the desired effect. +PD makes optimal scheduling according to the topological information. You just need to care about what kind of topology can achieve the desired effect. -If you use 3 replicas and hope that everything still works well when a data zone hangs up, you need at least 4 data zones. -(Theoretically, three data zones are feasible but the current implementation cannot guarantee.) +If you use 3 replicas and hope that the TiDB cluster is always highly available even when a data zone goes down, you need at least 4 data zones. -Assume that we have 4 data zones, each zone has 2 racks and each rack has 2 hosts. -We can start 2 TiKV instances on each host: +Assume that you have 4 data zones, each zone has 2 racks, and each rack has 2 hosts. 
You can start 2 TiKV instances on each host: ``` # zone=z1 @@ -81,7 +80,8 @@ tikv-server --labels zone=z4,rack=r2,host=h2 In other words, 16 TiKV instances are distributed across 4 data zones, 8 racks and 16 machines. In this case, PD will schedule different replicas of each datum to different data zones. -- If one of the data zones hangs up, everything still works well. + +- If one of the data zones goes down, the high availability of the TiDB cluster is not affected. - If the data zone cannot recover within a period of time, PD will remove the replica from this data zone. To sum up, PD maximizes the disaster recovery of the cluster according to the current topology. Therefore, if you want to reach a certain level of disaster recovery, deploy many machines in different sites according to the topology. The number of machines must be more than the number of `max-replicas`. diff --git a/op-guide/migration-overview.md b/op-guide/migration-overview.md index 6ac961dca7f90..644790f6e26bc 100644 --- a/op-guide/migration-overview.md +++ b/op-guide/migration-overview.md @@ -1,5 +1,6 @@ --- title: Migration Overview +summary: Learn how to migrate data from MySQL to TiDB. category: operations --- @@ -19,12 +20,12 @@ See the following for the assumed MySQL and TiDB server information: ## Scenarios + To import all the history data. This needs the following tools: - - `Checker`: to check if the shema is compatible with TiDB. + - `Checker`: to check if the schema is compatible with TiDB. - `Mydumper`: to export data from MySQL. - `Loader`: to import data to TiDB. -+ To incrementally synchronise data after all the history data is imported. This needs the following tools: - - `Checker`: to check if the shema is compatible with TiDB. ++ To incrementally synchronize data after all the history data is imported. This needs the following tools: + - `Checker`: to check if the schema is compatible with TiDB. - `Mydumper`: to export data from MySQL. - `Loader`: to import data to TiDB. - `Syncer`: to incrementally synchronize data from MySQL to TiDB. @@ -34,6 +35,7 @@ See the following for the assumed MySQL and TiDB server information: ### Enable binary logging (binlog) in MySQL Before using the `syncer` tool, make sure: + + Binlog is enabled in MySQL. See [Setting the Replication Master Configuration](http://dev.mysql.com/doc/refman/5.7/en/replication-howto-masterbaseconfig.html). + Binlog must use the `row` format which is the recommended binlog format in MySQL 5.7. It can be configured using the following statement: diff --git a/op-guide/migration.md b/op-guide/migration.md index 7c6373f66884a..f665ba97a2019 100644 --- a/op-guide/migration.md +++ b/op-guide/migration.md @@ -1,11 +1,12 @@ --- title: Migrate Data from MySQL to TiDB +summary: Use `mydumper`, `loader` and `syncer` tools to migrate data from MySQL to TiDB. category: operations --- # Migrate Data from MySQL to TiDB -## Use the `mydumper` / `loader` tool to export and import all the data +## Use the `mydumper`/`loader` tool to export and import all the data You can use `mydumper` to export data from MySQL and `loader` to import the data into TiDB. @@ -30,7 +31,7 @@ In this command, ### Import data to TiDB -Use `loader` to import the data from MySQL to TiDB. See [Loader instructions](./tools/loader.md) for more information. +Use `loader` to import the data from MySQL to TiDB. See [Loader instructions](../tools/loader.md) for more information. 
```bash ./bin/loader -h 127.0.0.1 -u root -P 4000 -t 32 -d ./var/test @@ -115,9 +116,9 @@ tar -xzf tidb-enterprise-tools-latest-linux-amd64.tar.gz cd tidb-enterprise-tools-latest-linux-amd64 ``` -Assuming the data from `t1` and `t2` is already imported to TiDB using `mydumper`/`loader`. Now we hope that any updates to these two tables are synchronised to TiDB in real time. +Assuming the data from `t1` and `t2` is already imported to TiDB using `mydumper`/`loader`. Now we hope that any updates to these two tables are synchronized to TiDB in real time. -### Obtain the position to synchronise +### Obtain the position to synchronize The data exported from MySQL contains a metadata file which includes the position information. Take the following metadata information as an example: ``` @@ -138,7 +139,7 @@ binlog-name = "mysql-bin.000003" binlog-pos = 930143241 ``` -> **Note:** The `syncer.meta` file only needs to be configured once when it is first used. The position will be automatically updated when binlog is synchronised. +> **Note:** The `syncer.meta` file only needs to be configured once when it is first used. The position will be automatically updated when binlog is synchronized. ### Start `syncer` @@ -159,22 +160,22 @@ status-addr = ":10081" skip-sqls = ["ALTER USER", "CREATE USER"] -# Support whitelist filter. You can specify the database and table to be synchronised. For example: -# Synchronise all the tables of db1 and db2: +# Support whitelist filter. You can specify the database and table to be synchronized. For example: +# Synchronize all the tables of db1 and db2: replicate-do-db = ["db1","db2"] -# Synchronise db1.table1. +# Synchronize db1.table1. [[replicate-do-table]] db-name ="db1" tbl-name = "table1" -# Synchronise db3.table2. +# Synchronize db3.table2. [[replicate-do-table]] db-name ="db3" tbl-name = "table2" # Support regular expressions. Start with '~' to use regular expressions. -# To synchronise all the databases that start with `test`: +# To synchronize all the databases that start with `test`: replicate-do-db = ["~^test.*"] # The sharding synchronising rules support wildcharacter. @@ -240,7 +241,7 @@ mysql> select * from t1; +----+------+ ``` -`syncer` outputs the current synchronised data statistics every 30 seconds: +`syncer` outputs the current synchronized data statistics every 30 seconds: ```bash 2017/06/08 01:18:51 syncer.go:934: [info] [syncer]total events = 15, total tps = 130, recent tps = 4, @@ -251,4 +252,4 @@ master-binlog = (ON.000001, 11992), master-binlog-gtid=53ea0ed1-9bf8-11e6-8bea-6 syncer-binlog = (ON.000001, 2504), syncer-binlog-gtid = 53ea0ed1-9bf8-11e6-8bea-64006a897c73:1-35 ``` -You can see that by using `syncer`, the updates in MySQL are automatically synchronised in TiDB. \ No newline at end of file +You can see that by using `syncer`, the updates in MySQL are automatically synchronized in TiDB. \ No newline at end of file diff --git a/op-guide/monitor-overview.md b/op-guide/monitor-overview.md index 92028cf1b3d92..50ba2cff90cfe 100644 --- a/op-guide/monitor-overview.md +++ b/op-guide/monitor-overview.md @@ -1,5 +1,6 @@ --- title: Overview of the TiDB Monitoring Framework +summary: Use Prometheus and Grafana to build the TiDB monitoring framework. category: operations --- @@ -9,7 +10,7 @@ The TiDB monitoring framework adopts two open source projects: Prometheus and Gr ## About Prometheus in TiDB -As a time series database, Prometheus has a multi-dimensional data model and flexible query language. 
As one of the most popular open source projects, many companies and organizations have adopted Prometheus, and the project has a very active community. PingCAP is one of the active developers and adoptors of Prometheus for monitoring and alerting in TiDB, TiKV and PD. +As a time series database, Prometheus has a multi-dimensional data model and flexible query language. As one of the most popular open source projects, many companies and organizations have adopted Prometheus, and the project has a very active community. PingCAP is one of the active developers and adopters of Prometheus for monitoring and alerting in TiDB, TiKV and PD. Prometheus consists of multiple components. Currently, TiDB uses the following of them: @@ -23,6 +24,6 @@ The diagram is as follows: ## About Grafana in TiDB -Grafana is an open source project for analysing and visualizing metrics. TiDB uses Grafana to display the performance metrics as follows: +Grafana is an open source project for analyzing and visualizing metrics. TiDB uses Grafana to display the performance metrics as follows: ![screenshot](../media/grafana-screenshot.png) diff --git a/op-guide/monitor.md b/op-guide/monitor.md index 317850954d920..e606ced63f9b5 100644 --- a/op-guide/monitor.md +++ b/op-guide/monitor.md @@ -1,5 +1,6 @@ --- title: Monitor a TiDB Cluster +summary: Learn how to monitor the state of a TiDB cluster. category: operations --- @@ -45,7 +46,7 @@ The default port number is: 2379. See [PD API doc](https://cdn.rawgit.com/pingcap/docs/master/op-guide/pd-api-v1.html) for detailed information about various API names. -The interface can be used to get the state of all the TiKV servers and the information about load balancing. It is the most important and frequently-used interface to get the state information of all the TiKV nodes. See the following example for the the information about a single-node TiKV cluster: +The interface can be used to get the state of all the TiKV servers and the information about load balancing. It is the most important and frequently-used interface to get the state information of all the TiKV nodes. See the following example for the information about a single-node TiKV cluster: ```bash curl http://127.0.0.1:2379/pd/api/v1/stores @@ -139,7 +140,7 @@ See the following diagram for the deployment architecture: See the following links for your reference: -- Prometheus Push Gateway: [https://github.com/prometheus/pushgateway](https://github.com/prometheus/pushgateway) +- Prometheus Pushgateway: [https://github.com/prometheus/pushgateway](https://github.com/prometheus/pushgateway) - Prometheus Server: [https://github.com/prometheus/prometheus#install](https://github.com/prometheus/prometheus#install) @@ -151,26 +152,26 @@ See the following links for your reference: + TiDB: Set the two parameters: `--metrics-addr` and `--metrics-interval`. - - Set the Push Gateway address as the `--metrics-addr` parameter. + - Set the Pushgateway address as the `--metrics-addr` parameter. - Set the push frequency as the `--metrics-interval` parameter. The unit is s, and the default value is 15. -+ PD: update the toml configuration file with the Push Gateway address and the the push frequency: ++ PD: update the toml configuration file with the Pushgateway address and the push frequency: ```toml [metric] - # prometheus client push interval, set "0s" to disable prometheus. + # Prometheus client push interval, set "0s" to disable prometheus. interval = "15s" - # prometheus pushgateway address, leaves it empty will disable prometheus. 
+ # Prometheus Pushgateway address, leaves it empty will disable prometheus. address = "host:port" ``` -+ TiKV: update the toml configuration file with the Push Gateway address and the the push frequency. Set the job field as "tikv". ++ TiKV: update the toml configuration file with the Pushgateway address and the the push frequency. Set the job field as "tikv". ```toml [metric] # the Prometheus client push interval. Setting the value to 0s stops Prometheus client from pushing. interval = "15s" - # the Prometheus pushgateway address. Leaving it empty stops Prometheus client from pushing. + # the Prometheus Pushgateway address. Leaving it empty stops Prometheus client from pushing. address = "host:port" # the Prometheus client push job name. Note: A node id will automatically append, e.g., "tikv_1". job = "tikv" @@ -182,7 +183,7 @@ Generally, it does not need to be configured. You can use the default port: 9091 ### Configure Prometheus -Add the Push Gateway address to the yaml configuration file: +Add the Pushgateway address to the yaml configuration file: ```yaml scrape_configs: @@ -195,7 +196,7 @@ Add the Push Gateway address to the yaml configuration file: honor_labels: true static_configs: - - targets: ['host:port'] # use the Push Gateway address + - targets: ['host:port'] # use the Pushgateway address labels: group: 'production' ``` @@ -236,7 +237,7 @@ labels: 2. On the sidebar menu, click "Dashboards" -> "Import" to open the "Import Dashboard" window. -3. Click "Upload .json File" to upload a JSON file ( Download [TiDB Grafana Config](https://grafana.com/tidb) ). +3. Click "Upload .json File" to upload a JSON file (Download [TiDB Grafana Config](https://grafana.com/tidb)). 4. Click "Save & Open". diff --git a/op-guide/offline-ansible-deployment.md b/op-guide/offline-ansible-deployment.md index ff7dc1b93eae4..751a65a15b7c9 100644 --- a/op-guide/offline-ansible-deployment.md +++ b/op-guide/offline-ansible-deployment.md @@ -1,9 +1,12 @@ --- -title: Offline Deployment Using Ansible +title: Deploy TiDB Offline Using Ansible +summary: Use Ansible to deploy a TiDB cluster offline. category: operations --- -# Offline Deployment Using Ansible +# Deploy TiDB Offline Using Ansible + +This guide describes how to deploy a TiDB cluster offline using Ansible. ## Prepare @@ -16,57 +19,79 @@ Before you start, make sure that you have: 2. Several target machines and one Control Machine - - For system requirements and configuration, see [Prepare the environment](ansible-deployment.md#prepare). + - For system requirements and configuration, see [Prepare the environment](../op-guide/ansible-deployment.md#prerequisites). - It is acceptable without access to the Internet. -## Install Ansible and dependencies in the Control Machine +## Step 1: Install system dependencies on the Control Machine + +Take the following steps to install system dependencies on the Control Machine installed with the CentOS 7 system. + +1. Download the [`pip`](https://download.pingcap.org/ansible-system-rpms.el7.tar.gz) offline installation package to the Control Machine. + + ``` + # tar -xzvf ansible-system-rpms.el7.tar.gz + # cd ansible-system-rpms.el7 + # chmod u+x install_ansible_system_rpms.sh + # ./install_ansible_system_rpms.sh + ``` -1. Install Ansible offline on the CentOS 7 system: + > **Note:** This offline installation package includes `pip` and `sshpass`, and only supports the CentOS 7 system. 
- > Download the [Ansible 2.4.2](https://download.pingcap.org/ansible-2.4.2-rpms.el7.tar.gz) offline installation package to the Control Machine. +2. After the installation is finished, you can use `pip -V` to check whether it is successfully installed. ```bash - # tar -xzvf ansible-2.4.2-rpms.el7.tar.gz + # pip -V + pip 8.1.2 from /usr/lib/python2.7/site-packages (python 2.7) + ``` - # cd ansible-2.4-rpms.el7 + > **Note:** If `pip` is already installed to your system, make sure that the version is 8.1.2 or later. Otherwise, compatibility error occurs when you install Ansible and its dependencies offline. - # chmod u+x install_ansible.sh +## Step 2: Create the `tidb` user on the Control Machine and generate the SSH key + +See [Create the `tidb` user on the Control Machine and generate the SSH key](../op-guide/ansible-deployment.md#step-2-create-the-tidb-user-on-the-control-machine-and-generate-the-ssh-key). + +## Step 3: Install Ansible and its dependencies offline on the Control Machine +Currently, the TiDB 2.0 GA version and the master version are compatible with Ansible 2.5. Ansible and the related dependencies are in the `tidb-ansible/requirements.txt` file. + +1. Download [Ansible 2.5 offline installation package](https://download.pingcap.org/ansible-2.5.0-pip.tar.gz). + +2. Install Ansible and its dependencies offline. + + ``` + # tar -xzvf ansible-2.5.0-pip.tar.gz + # cd ansible-2.5.0-pip/ + # chmod u+x install_ansible.sh # ./install_ansible.sh ``` -2. After Ansible is installed, you can view the version using `ansible --version`. - - ```bash +3. View the version of Ansible. + + After Ansible is installed, you can view the version using `ansible --version`. + + ``` # ansible --version - ansible 2.4.2.0 + ansible 2.5.0 ``` -## Download TiDB-Ansible and TiDB packages on the download machine +## Step 4: Download TiDB-Ansible and TiDB packages on the download machine 1. Install Ansible on the download machine. - Use the following method to install Ansible online on the download machine installed with the CentOS 7 system. Installing using the EPEL source automatically installs the related Ansible dependencies (such as `Jinja2==2.7.2 MarkupSafe==0.11`). After Ansible is installed, you can view the version using `ansible --version`. + Use the following method to install Ansible online on the download machine installed with the CentOS 7 system. After Ansible is installed, you can view the version using `ansible --version`. ```bash # yum install epel-release # yum install ansible curl # ansible --version - - ansible 2.4.2.0 + ansible 2.5.0 ``` - > **Note:** Make sure that the version of Ansible is 2.4 or later, otherwise compatibility problem might occur. + > **Note:** Make sure that the version of Ansible is 2.5, otherwise a compatibility issue occurs. 2. Download TiDB-Ansible. Use the following command to download the corresponding version of TiDB-Ansible from the GitHub [TiDB-Ansible project](https://github.com/pingcap/tidb-ansible). The default folder name is `tidb-ansible`. The following are examples of downloading various versions, and you can turn to the official team for advice on which version to choose. - Download the 1.0 GA version: - - ``` - git clone -b release-1.0 https://github.com/pingcap/tidb-ansible.git - ``` - Download the 2.0 version: ``` @@ -90,15 +115,43 @@ Before you start, make sure that you have: 4. After running the above command, copy the `tidb-ansible` folder to the `/home/tidb` directory of the Control Machine. 
The ownership authority of the file must be the `tidb` user. -## Orchestrate the TiDB cluster +## Step 5: Configure the SSH mutual trust and sudo rules on the Control Machine + +See [Configure the SSH mutual trust and sudo rules on the Control Machine](../op-guide/ansible-deployment.md#configure-the-ssh-mutual-trust-and-sudo-rules-on-the-control-machine). + +## Step 6: Install the NTP service on the target machines + +See [Install the NTP service on the target machines](../op-guide/ansible-deployment.md#install-the-ntp-service-on-the-target-machines). + +> **Note:** If the time and time zone of all your target machines are same, the NTP service is on and is normally synchronizing time, you can ignore this step. See [How to check whether the NTP service is normal](#how-to-check-whether-the-ntp-service-is-normal). + +## Step 7: Configure the CPUfreq governor mode on the target machine -See [Orchestrate the TiDB cluster](ansible-deployment.md#orchestrate-the-tidb-cluster). +See [Configure the CPUfreq governor mode on the target machine](../op-guide/ansible-deployment.md#configure-the-cpufreq-governor-mode-on-the-target-machine). -## Deploy the TiDB cluster +## Step 8: Mount the data disk ext4 filesystem with options on the target machines + +See [Mount the data disk ext4 filesystem with options on the target machines](../op-guide/ansible-deployment.md#mount-the-data-disk-ext4-filesystem-with-options-on-the-target-machines). + +## Step 9: Edit the `inventory.ini` file to orchestrate the TiDB cluster + +See [Edit the `inventory.ini` file to orchestrate the TiDB cluster](../op-guide/ansible-deployment.md#edit-the-inventory.ini-file-to-orchestrate-the-tidb-cluster). + +## Step 10: Deploy the TiDB cluster + +1. You do not need to run the playbook in `ansible-playbook local_prepare.yml`. + +2. You can use the `Report` button on the Grafana Dashboard to generate the PDF file. This function depends on the `fontconfig` package and English fonts. To use this function, download the offline installation package, upload it to the `grafana_servers` machine, and install it. This package includes `fontconfig` and `open-sans-fonts`, and only supports the CentOS 7 system. + + ``` + $ tar -xzvf grafana-font-rpms.el7.tar.gz + $ cd grafana-font-rpms.el7 + $ chmod u+x install_grafana_font_rpms.sh + $ ./install_grafana_font_rpms.sh + ``` -1. See [Deploy the TiDB cluster](ansible-deployment.md#deploy-the-tidb-cluster). -2. You do not need to run the `ansible-playbook local_prepare.yml` playbook again. +3. See [Deploy the TiDB cluster](../op-guide/ansible-deployment.md#step-10-deploy-the-tidb-cluster). -## Test the cluster +## Test the TiDB cluster -See [Test the cluster](ansible-deployment.md#test-the-cluster). +See [Test the TiDB cluster](../op-guide/ansible-deployment.md#test-the-tidb-cluster). \ No newline at end of file diff --git a/op-guide/recommendation.md b/op-guide/recommendation.md index 6a4a0bba9d44a..eaa84ee3e4a00 100644 --- a/op-guide/recommendation.md +++ b/op-guide/recommendation.md @@ -1,5 +1,6 @@ --- title: Software and Hardware Requirements +summary: Learn the software and hardware requirements for deploying and running TiDB. category: operations --- @@ -21,6 +22,7 @@ As an open source distributed NewSQL database with high performance, TiDB can be > **Note:** > > - For Oracle Enterprise Linux, TiDB supports the Red Hat Compatible Kernel (RHCK) and does not support the Unbreakable Enterprise Kernel provided by Oracle Enterprise Linux. 
+> - A large number of TiDB tests have been run on the CentOS 7.3 system, and in our community there are a lot of best practices in which TiDB is deployed on the Linux operating system. Therefore, it is recommended to deploy TiDB on CentOS 7.3 or later. > - The support for the Linux operating systems above includes the deployment and operation in physical servers as well as in major virtualized environments like VMware, KVM and XEM. ## Server requirements @@ -32,7 +34,7 @@ You can deploy and run TiDB on the 64-bit generic hardware server platform in th | Component | CPU | Memory | Local Storage | Network | Instance Number (Minimum Requirement) | | :------: | :-----: | :-----: | :----------: | :------: | :----------------: | | TiDB | 8 core+ | 16 GB+ | SAS, 200 GB+ | Gigabit network card | 1 (can be deployed on the same machine with PD) | -| PD | 8 core+ | 16 GB+ | SAS, 200 GB+ | Gigabit network card | 1 (can be deployed on the same machine with TiDB) | +| PD | 4 core+ | 8 GB+ | SAS, 200 GB+ | Gigabit network card | 1 (can be deployed on the same machine with TiDB) | | TiKV | 8 core+ | 32 GB+ | SAS, 200 GB+ | Gigabit network card | 3 | | | | | | Total Server Number | 4 | @@ -45,9 +47,9 @@ You can deploy and run TiDB on the 64-bit generic hardware server platform in th | Component | CPU | Memory | Hard Disk Type | Network | Instance Number (Minimum Requirement) | | :-----: | :------: | :------: | :------: | :------: | :-----: | -| TiDB | 16 core+ | 48 GB+ | SAS | 10 Gigabit network card (2 preferred) | 2 | -| PD | 8 core+ | 16 GB+ | SSD | 10 Gigabit network card (2 preferred) | 3 | -| TiKV | 16 core+ | 48 GB+ | SSD | 10 Gigabit network card (2 preferred) | 3 | +| TiDB | 16 core+ | 32 GB+ | SAS | 10 Gigabit network card (2 preferred) | 2 | +| PD | 4 core+ | 8 GB+ | SSD | 10 Gigabit network card (2 preferred) | 3 | +| TiKV | 16 core+ | 32 GB+ | SSD | 10 Gigabit network card (2 preferred) | 3 | | Monitor | 8 core+ | 16 GB+ | SAS | Gigabit network card | 1 | | | | | | Total Server Number | 9 | @@ -55,23 +57,28 @@ You can deploy and run TiDB on the 64-bit generic hardware server platform in th > > - In the production environment, you can deploy and run TiDB and PD on the same server. If you have a higher requirement for performance and reliability, try to deploy them separately. > - It is strongly recommended to use higher configuration in the production environment. -> - It is recommended to keep the size of TiKV hard disk within 800G in case it takes too long to restore data when the hard disk is damaged. +> - It is recommended to keep the size of TiKV hard disk within 2 TB if you are using PCI-E SSD disks or within 1.5 TB if you are using regular SSD disks. ## Network requirements -As an open source distributed NewSQL database, TiDB requires the following network port configuration to run. Based on the TiDB deployment in actual environments, the administrator can enable relevant ports in the network side and host side. +As an open source distributed NewSQL database, TiDB requires the following network port configuration to run. Based on the TiDB deployment in actual environments, the administrator can open relevant ports in the network side and host side. 
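For example, on a CentOS 7 host managed by `firewalld`, the ports in the table below can be opened as follows. This is only a sketch for a machine that runs one TiDB instance and one PD instance; adjust the port list to the components actually deployed on each host:

```bash
# Open the TiDB and PD ports listed in the table below, then reload the rules
firewall-cmd --permanent --add-port=4000/tcp   # TiDB client port
firewall-cmd --permanent --add-port=10080/tcp  # TiDB status port
firewall-cmd --permanent --add-port=2379/tcp   # PD client port
firewall-cmd --permanent --add-port=2380/tcp   # PD peer port
firewall-cmd --reload
```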
| Component | Default Port | Description | | :--:| :--: | :-- | -| TiDB | 4000 | the communication port for the application and DBA tools| -| TiDB | 10080 | the communication port to report TiDB status| -| TiKV | 20160 | the TiKV communication port | +| TiDB | 4000 | the communication port for the application and DBA tools | +| TiDB | 10080 | the communication port to report TiDB status | +| TiKV | 20160 | the TiKV communication port | | PD | 2379 | the communication port between TiDB and PD | | PD | 2380 | the inter-node communication port within the PD cluster | -| Prometheus | 9090| the communication port for the Prometheus service| -| Pushgateway | 9091| the aggregation and report port for TiDB, TiKV, and PD monitor | -| Node_exporter | 9100| the communication port to report the system information of every TiDB cluster node | +| Pump | 8250 | the Pump communication port | +| Drainer | 8249 | the Drainer communication port | +| Prometheus | 9090 | the communication port for the Prometheus service| +| Pushgateway | 9091 | the aggregation and report port for TiDB, TiKV, and PD monitor | +| Node_exporter | 9100 | the communication port to report the system information of every TiDB cluster node | +| Blackbox_exporter | 9115 | the Blackbox_exporter communication port, used to monitor the ports in the TiDB cluster | | Grafana | 3000 | the port for the external Web monitoring service and client (Browser) access| +| Grafana | 8686 | the grafana_collector communication port, used to export the Dashboard as the PDF format | +| Kafka_exporter | 9308 | the Kafka_exporter communication port, used to monitor the binlog Kafka cluster | ## Web browser requirements diff --git a/op-guide/security.md b/op-guide/security.md index 6650e86e1bf73..da20cb01aec7d 100644 --- a/op-guide/security.md +++ b/op-guide/security.md @@ -1,5 +1,6 @@ --- title: Enable TLS Authentication +summary: Learn how to enable TLS authentication in a TiDB cluster. category: deployment --- @@ -22,7 +23,7 @@ It is recommended to prepare a separate server certificate for TiDB, TiKV and PD You can use multiple tools to generate self-signed certificates, such as `openssl`, `easy-rsa ` and `cfssl`. -See an example of [generating self-signed certificates](generate-self-signed-certificates.md) using `cfssl`. +See an example of [generating self-signed certificates](../op-guide/generate-self-signed-certificates.md) using `cfssl`. ### Configure certificates diff --git a/op-guide/tidb-config-file.md b/op-guide/tidb-config-file.md index 6c6811fde39e4..5757251748574 100644 --- a/op-guide/tidb-config-file.md +++ b/op-guide/tidb-config-file.md @@ -1,5 +1,6 @@ --- title: TiDB Configuration File Description +summary: Learn the TiDB configuration file options that are not involved in command line options. category: deployment --- @@ -7,7 +8,7 @@ category: deployment The TiDB configuration file supports more options than command line options. You can find the default configuration file in [config/config.toml.example](https://github.com/pingcap/tidb/blob/master/config/config.toml.example) and rename it to `config.toml`. -This document describes the options that are not involved in command line options. For command line options, see [here](configuration.md). +This document describes the options that are not involved in command line options. For command line options, see [here](../op-guide/configuration.md). 
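To see how these options fit together, the following sketch writes a small `config.toml` that only sets the slow query log options described below and then starts `tidb-server` with it. The file name, log path, and threshold are illustrative values, not recommendations:

```bash
# Create a minimal configuration file with the slow query log options
cat > config.toml <<'EOF'
[log]
# Write slow queries to a separate file instead of the general log
slow-query-file = "tidb-slow.log"
# Treat statements that run longer than this threshold as slow queries
slow-threshold = 300
EOF

# Start tidb-server with the configuration file
./bin/tidb-server --config=config.toml
```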
### `split-table` @@ -31,6 +32,7 @@ This document describes the options that are not involved in command line option - To configure the value of the `lower_case_table_names` system variable - Default: 2 - For details, you can see the [MySQL description](https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_lower_case_table_names) of this variable +- Currently, TiDB only supports setting the value of this option to 2. This means it is case-sensitive when you save a table name, but case-insensitive when you compare table names. The comparison is based on the lower case. ## Log @@ -48,6 +50,12 @@ Configuration about log. - Default: false - If you set the value to true, the log does not output timestamp +### `slow-query-file` + +- The file name of the slow query log +- Default: "" +- After you set it, the slow query log is output to this file separately + ### `slow-threshold` - To output the threshold value of consumed time in the slow log @@ -174,27 +182,6 @@ Configuration about performance. - Default: 0 - TiDB collects the feedback of each query at the probability of `feedback-probability`, to update statistics -## Plan Cache - -Configuration about Plan Cache. - -### `enabled` - -- To enable Plan Cache -- Default: false -- Enabling Plan Cache saves the query optimization overhead of the same SQL statement - -### `capacity` - -- The number of cached statements -- Default: 2560 - -### `shards` - -- The number of plan-cache buckets -- Default: 256 -- A larger number indicates a smaller particle size of the lock - ## prepared-plan-cache The Plan Cache configuration of the `prepare` statement. @@ -221,3 +208,17 @@ The Plan Cache configuration of the `prepare` statement. - The maximum timeout time when executing a transaction commit - Default: 41s - It is required to set this value larger than twice of the Raft election timeout time + +### txn-local-latches + +Configuration about the transaction latch. It is recommended to enable it when many local transaction conflicts occur. + +### `enable` + +- To enable +- Default: false + +### `capacity` + +- The number of slots corresponding to Hash, which automatically adjusts upward to an exponential multiple of 2. Each slot occupies 32 Bytes of memory. If set too small, it might result in slower running speed and poor performance in the scenario where data writing covers a relatively large range (such as importing data). +- Default: 1024000 diff --git a/op-guide/tidb-v2-upgrade-guide.md b/op-guide/tidb-v2-upgrade-guide.md new file mode 100644 index 0000000000000..7e73465eb5738 --- /dev/null +++ b/op-guide/tidb-v2-upgrade-guide.md @@ -0,0 +1,136 @@ +--- +title: TiDB 2.0 Upgrade Guide +summary: Learn how to upgrade from TiDB 1.0/TiDB 2.0 RC version to TiDB 2.0 GA version. +category: deployment +--- + +# TiDB 2.0 Upgrade Guide + +This document describes how to upgrade from TiDB 1.0 or TiDB 2.0 RC version to TiDB 2.0 GA version. + +## Install Ansible and dependencies in the Control Machine + +TiDB-Ansible release-2.0 depends on Ansible 2.4.2 or later, and is compatible with the latest Ansible 2.5. In addition, TiDB-Ansible release-2.0 depends on the Python module: `jinja2>=2.9.6` and `jmespath>=0.9.0`. + +To make it easy to manage dependencies, use `pip` to install Ansible and its dependencies. For details, see [Install Ansible and its dependencies on the Control Machine](../op-guide/ansible-deployment.md#step-4-install-ansible-and-its-dependencies-on-the-control-machine). 
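When the Control Machine has Internet access, this typically comes down to one `pip` command run against the requirements file shipped in the TiDB-Ansible repository; a sketch assuming `tidb-ansible` has already been cloned into the current directory:

```bash
# Install Ansible 2.5 and the pinned dependencies (Jinja2, jmespath, and so on)
# listed in tidb-ansible/requirements.txt
sudo pip install -r ./tidb-ansible/requirements.txt
```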
For offline environment, see [Install Ansible and its dependencies offline on the Control Machine](../op-guide/offline-ansible-deployment.md#step-3-install-ansible-and-its-dependencies-offline-on-the-control-machine). + +After the installation is finished, you can view the version information using the following command: + +```bash +$ ansible --version +ansible 2.5.2 +$ pip show jinja2 +Name: Jinja2 +Version: 2.9.6 +$ pip show jmespath +Name: jmespath +Version: 0.9.0 +``` + +> **Note:** +> +> - You must install Ansible and its dependencies following the above procedures. +> - Make sure that the Jinja2 version is correct, otherwise an error occurs when you start Grafana. +> - Make sure that the jmespath is correct, otherwise an error occurs when you perform a rolling update for TiKV. + +## Download TiDB-Ansible to the Control Machine + +1. Login to the Control Machine using the `tidb` user account and enter the `/home/tidb` directory. + +2. Back up the `tidb-ansible` folders of TiDB 1.0 OR TiDB 2.0 RC versions using the following command: + + ``` + $ mv tidb-ansible tidb-ansible-bak + ``` + +3. Download the latest tidb-ansible `release-2.0` branch using the following command. The default folder name is `tidb-ansible`. + + ``` + $ git clone -b release-2.0 https://github.com/pingcap/tidb-ansible.git + ``` + +## Edit the `inventory.ini` file and the configuration file + +Login to the Control Machine using the `tidb` user account and enter the `/home/tidb/tidb-ansible` directory. + +### Edit the `inventory.ini` file + +Edit the `inventory.ini` file. For IP information, see the `/home/tidb/tidb-ansible-bak/inventory.ini` backup file. + +Pay special attention to the following variables configuration. For variable meaning, see [Description of other variables](../op-guide/ansible-deployment.md#edit-other-variables-optional). + +1. Make sure that `ansible_user` is the normal user. For unified privilege management, remote installation using the root user is no longer supported. The default configuration uses the `tidb` user as the SSH remote user and the program running user. + + ``` + ## Connection + # ssh via normal user + ansible_user = tidb + ``` + + You can refer to [How to configure SSH mutual trust and sudo rules on the Control Machine](../op-guide/ansible-deployment.md#step-5-configure-the-ssh-mutual-trust-and-sudo-rules-on-the-control-machine) to automatically configure the mutual trust among hosts. + +2. Keep the `process_supervision` variable consistent with that in the previous version. It is recommended to use `systemd` by default. + + ``` + # process supervision, [systemd, supervise] + process_supervision = systemd + ``` + + If you need to modify this variable, see [How to modify the supervision method of a process from `supervise` to `systemd`](../op-guide/ansible-deployment.md#how-to-modify-the-supervision-method-of-a-process-from-supervise-to-systemd). Before you upgrade, first use the `/home/tidb/tidb-ansible-bak/` backup branch to modify the supervision method of a process. + +### Edit the configuration file of TiDB cluster components + +If you have previously customized the configuration file of TiDB cluster components, refer to the backup file to modify the corresponding configuration file in `/home/tidb/tidb-ansible/conf`. + +In TiKV configuration, `end-point-concurrency` is changed to three parameters: `high-concurrency`, `normal-concurrency` and `low-concurrency`. 
+ +``` +readpool: + coprocessor: + # Notice: if CPU_NUM > 8, default thread pool size for coprocessors + # will be set to CPU_NUM * 0.8. + # high-concurrency: 8 + # normal-concurrency: 8 + # low-concurrency: 8 +``` + +For the cluster topology of multiple TiKV instances on a single machine, you need to modify the three parameters above. Recommended configuration: `number of instances * parameter value = number of CPU cores * 0.8`. + +## Download TiDB 2.0 binary to the Control Machine + +Make sure that `tidb_version = v2.0.4` in the `tidb-ansible/inventory.ini` file, and then run the following command to download TiDB 2.0 binary to the Control Machine: + +``` +$ ansible-playbook local_prepare.yml +``` + +## Perform a rolling update to TiDB cluster components + +``` +$ ansible-playbook rolling_update.yml +``` + +## Perform a rolling update to TiDB monitoring component + +To meet the users' demand on mixed deployment, the systemd service of the monitoring component is distinguished by port. + +1. Check the `process_supervision` variable in the `inventory.ini` file. + + ``` + # process supervision, [systemd, supervise] + process_supervision = systemd + ``` + + - If `process_supervision = systemd`, to make it compatible with versions earlier than `v2.0.0-rc.6`, you need to run `migrate_monitor.yml` Playbook. + + ``` + $ ansible-playbook migrate_monitor.yml + ``` + + - If `process_supervision = supervise`, you do not need to run the above command. + +2. Perform a rolling update to the TiDB monitoring component using the following command: + + ``` + $ ansible-playbook rolling_update_monitor.yml + ``` \ No newline at end of file diff --git a/op-guide/tune-tikv.md b/op-guide/tune-tikv.md index 6e3e21a05b87e..ae211f82970c8 100644 --- a/op-guide/tune-tikv.md +++ b/op-guide/tune-tikv.md @@ -1,5 +1,6 @@ --- title: Tune TiKV Performance +summary: Learn how to tune the TiKV parameters for optimal performance. category: tuning --- @@ -70,14 +71,14 @@ log-level = "info" # endpoints = ["127.0.0.1:2379","127.0.0.2:2379","127.0.0.3:2379"] [metric] -# The interval of pushing metrics to Prometheus pushgateway +# The interval of pushing metrics to Prometheus Pushgateway interval = "15s" -# Prometheus pushgateway adress +# Prometheus Pushgateway address address = "" job = "tikv" [raftstore] -# The default value is true,which means writing the data on the disk compulsorily. If it is not in a business scenario +# The default value is true, which means writing the data on the disk compulsorily. If it is not in a business scenario # of the financial security level, it is recommended to set the value to false to achieve better performance. sync-log = true @@ -88,15 +89,15 @@ sync-log = true region-max-size = "384MB" # The threshold value of Region split region-split-size = "256MB" -# When the data size in a Region is larger than the threshold value, TiKV checks whether this Region needs split. -# To reduce the costs of scanning data in the checking process,set the value to 32MB during checking and set it to +# When the data size change in a Region is larger than the threshold value, TiKV checks whether this Region needs split. +# To reduce the costs of scanning data in the checking process, set the value to 32MB during checking and set it to # the default value in normal operation. region-split-check-diff = "32MB" [rocksdb] # The maximum number of threads of RocksDB background tasks. The background tasks include compaction and flush. 
# For detailed information why RocksDB needs to implement compaction, see RocksDB-related materials. When write -# traffic (like the importing data size) is big,it is recommended to enable more threads. But set the number of the enabled +# traffic (like the importing data size) is big, it is recommended to enable more threads. But set the number of the enabled # threads smaller than that of CPU cores. For example, when importing data, for a machine with a 32-core CPU, # set the value to 28. # max-background-jobs = 8 @@ -235,12 +236,11 @@ min-write-buffer-number-to-merge = 1 max-bytes-for-level-base = "512MB" target-file-size-base = "32MB" -# Generally,you can set it from 256MB to 2GB. In most cases, you can use the default value. But if the system +# Generally, you can set it from 256MB to 2GB. In most cases, you can use the default value. But if the system # resources are adequate, you can set it higher. block-cache-size = "256MB" ``` - ## TiKV memory usage Besides `block cache` and `write buffer` which occupy the system memory, the system memory is occupied in the @@ -256,4 +256,4 @@ following scenarios: + If you demand a high write throughput, it is recommended to use a disk with good throughput capacity. -+ If you demand a very low read-write latency, it is recommended to use SSD with high IOPS. \ No newline at end of file ++ If you demand a very low read-write latency, it is recommended to use SSD with high IOPS. diff --git a/overview.md b/overview.md index 337301f8bae7b..c7f838262dc73 100644 --- a/overview.md +++ b/overview.md @@ -1,13 +1,12 @@ --- -title: About TiDB +title: TiDB Introduction +summary: An introduction to the TiDB database platform category: introduction --- -# About TiDB +# TiDB Introduction -## TiDB introduction - -TiDB (The pronunciation is: /'taɪdiːbi:/ tai-D-B, etymology: titanium) is an open source distributed scalable Hybrid Transactional and Analytical Processing (HTAP) database built by PingCAP. Inspired by the design of Google F1 and Google Spanner, TiDB features infinite horizontal scalability, strong consistency, and high availability. The goal of TiDB is to serve as a one-stop solution for both OLTP (Online Transactional Processing) and OLAP (Online Analytical Processing). +TiDB (The pronunciation is: /'taɪdiːbi:/ tai-D-B, etymology: titanium) is an open-source distributed scalable Hybrid Transactional and Analytical Processing (HTAP) database. It features infinite horizontal scalability, strong consistency, and high availability. TiDB is MySQL compatible and serves as a one-stop data warehouse for both OLTP (Online Transactional Processing) and OLAP (Online Analytical Processing) workloads. - __Horizontal scalability__ @@ -15,7 +14,7 @@ TiDB (The pronunciation is: /'taɪdiːbi:/ tai-D-B, etymology: titanium) is an o - __MySQL compatibility__ - Easily replace MySQL with TiDB to power your applications without changing a single line of code in most cases and still benefit from the MySQL ecosystem. + Easily replace MySQL with TiDB to power your applications without changing a single line of code [in most cases](https://www.pingcap.com/docs/sql/mysql-compatibility/) and still benefit from the MySQL ecosystem. - __Distributed transaction__ @@ -25,7 +24,7 @@ TiDB (The pronunciation is: /'taɪdiːbi:/ tai-D-B, etymology: titanium) is an o TiDB is designed to work in the cloud -- public, private, or hybrid -- making deployment, provisioning, and maintenance drop-dead simple. 
-- __No more ETL__ +- __Minimize ETL__ ETL (Extract, Transform and Load) is no longer necessary with TiDB's hybrid OLTP/OLAP architecture, enabling you to create new values for your users, easier and faster. @@ -40,76 +39,3 @@ Read the following three articles to understand TiDB techniques: - [Data Storage](https://pingcap.github.io/blog/2017/07/11/tidbinternal1/) - [Computing](https://pingcap.github.io/blog/2017/07/11/tidbinternal2/) - [Scheduling](https://pingcap.github.io/blog/2017/07/20/tidbinternal3/) - -## Roadmap - -Read the [Roadmap](https://github.com/pingcap/docs/blob/master/ROADMAP.md). - -## Connect with us - -- **Twitter**: [@PingCAP](https://twitter.com/PingCAP) -- **Reddit**: https://www.reddit.com/r/TiDB/ -- **Stack Overflow**: https://stackoverflow.com/questions/tagged/tidb -- **Mailing list**: [Google Group](https://groups.google.com/forum/#!forum/tidb-user) - -## TiDB architecture - -To better understand TiDB’s features, you need to understand the TiDB architecture. - -![image alt text](media/tidb-architecture.png) - -The TiDB cluster has three components: the TiDB server, the PD server, and the TiKV server. - -### TiDB server - -The TiDB server is in charge of the following operations: - -1. Receiving the SQL requests - -2. Processing the SQL related logics - -3. Locating the TiKV address for storing and computing data through Placement Driver (PD) - -4. Exchanging data with TiKV - -5. Returning the result - -The TiDB server is stateless. It does not store data and it is for computing only. TiDB is horizontally scalable and provides the unified interface to the outside through the load balancing components such as Linux Virtual Server (LVS), HAProxy, or F5. - -### Placement Driver server - -The Placement Driver (PD) server is the managing component of the entire cluster and is in charge of the following three operations: - -1. Storing the metadata of the cluster such as the region location of a specific key. - -2. Scheduling and load balancing regions in the TiKV cluster, including but not limited to data migration and Raft group leader transfer. - -3. Allocating the transaction ID that is globally unique and monotonic increasing. - -As a cluster, PD needs to be deployed to an odd number of nodes. Usually it is recommended to deploy to 3 online nodes at least. - -### TiKV server - -The TiKV server is responsible for storing data. From an external view, TiKV is a distributed transactional Key-Value storage engine. Region is the basic unit to store data. Each Region stores the data for a particular Key Range which is a left-closed and right-open interval from StartKey to EndKey. There are multiple Regions in each TiKV node. TiKV uses the Raft protocol for replication to ensure the data consistency and disaster recovery. The replicas of the same Region on different nodes compose a Raft Group. The load balancing of the data among different TiKV nodes are scheduled by PD. Region is also the basic unit for scheduling the load balance. - -## Features - -### Horizontal scalability - -Horizontal scalability is the most important feature of TiDB. The scalability includes two aspects: the computing capability and the storage capacity. The TiDB server processes the SQL requests. As the business grows, the overall processing capability and higher throughput can be achieved by simply adding more TiDB server nodes. Data is stored in TiKV. As the size of the data grows, the scalability of data can be resolved by adding more TiKV server nodes. 
PD schedules data in Regions among the TiKV nodes and migrates part of the data to the newly added node. So in the early stage, you can deploy only a few service instances. For example, it is recommended to deploy at least 3 TiKV nodes, 3 PD nodes and 2 TiDB nodes. As business grows, more TiDB and TiKV instances can be added on-demand. - -### High availability - -High availability is another important feature of TiDB. All of the three components, TiDB, TiKV and PD, can tolerate the failure of some instances without impacting the availability of the entire cluster. For each component, See the following for more details about the availability, the consequence of a single instance failure and how to recover. - -#### TiDB - -TiDB is stateless and it is recommended to deploy at least two instances. The front-end provides services to the outside through the load balancing components. If one of the instances is down, the Session on the instance will be impacted. From the application’s point of view, it is a single request failure but the service can be regained by reconnecting to the TiDB server. If a single instance is down, the service can be recovered by restarting the instance or by deploying a new one. - -#### PD - -PD is a cluster and the data consistency is ensured using the Raft protocol. If an instance is down but the instance is not a Raft Leader, there is no impact on the service at all. If the instance is a Raft Leader, a new Leader will be elected to recover the service. During the election which is approximately 3 seconds, PD cannot provide service. It is recommended to deploy three instances. If one of the instances is down, the service can be recovered by restarting the instance or by deploying a new one. - -#### TiKV - -TiKV is a cluster and the data consistency is ensured using the Raft protocol. The number of the replicas can be configurable and the default is 3 replicas. The load of TiKV servers are balanced through PD. If one of the node is down, all the Regions in the node will be impacted. If the failed node is the Leader of the Region, the service will be interrupted and a new election will be initiated. If the failed node is a Follower of the Region, the service will not be impacted. If a TiKV node is down for a period of time (default 30 minutes), PD will move the data to another TiKV node. diff --git a/releases/11alpha.md b/releases/11alpha.md index 6ccfece50c194..dd629ac6fd680 100644 --- a/releases/11alpha.md +++ b/releases/11alpha.md @@ -15,11 +15,11 @@ On January 19, 2018, TiDB 1.1 Alpha is released. This release has great improvem - Use more compact structure to reduce statistics info memory usage - Speed up loading statistics info when starting tidb-server - Provide more accurate query cost evaluation - - Use Count-Min Sketch to evaluate the cost of queries using unique index more accurately + - Use `Count-Min Sketch` to estimate the cost of queries using unique index more accurately - Support more complex conditions to make full use of index - SQL executor - Refactor all executor operators using Chunk architecture, improve the execution performance of analytical statements and reduce memory usage - - Optimize performance of the `INSERT INGORE` statement + - Optimize performance of the `INSERT IGNORE` statement - Push down more types and functions to TiKV - Support more `SQL_MODE` - Optimize the `Load Data` performance to increase the speed by 10 times @@ -39,14 +39,14 @@ On January 19, 2018, TiDB 1.1 Alpha is released. 
This release has great improvem ## TiKV: - Support Raft learner -- Optimize Raft Snapshot and reduce the IO overhead +- Optimize Raft Snapshot and reduce the I/O overhead - Support TLS - Optimize the RocksDB configuration to improve performance - Optimize `count (*)` and query performance of unique index in Coprocessor - Add more failpoints and stability test cases - Solve the reconnection issue between PD and TiKV -- Enhance the features of the data recovery tool TiKV-CTL +- Enhance the features of the data recovery tool `tikv-ctl` - Support splitting according to table in Region - Support the `Delete Range` feature -- Support setting the IO limit caused by snapshot +- Support setting the I/O limit caused by snapshot - Improve the flow control mechanism \ No newline at end of file diff --git a/releases/2.0ga.md b/releases/2.0ga.md new file mode 100644 index 0000000000000..417e5da5d66ff --- /dev/null +++ b/releases/2.0ga.md @@ -0,0 +1,155 @@ +--- +title: TiDB 2.0 Release Notes +category: Releases +--- + +# TiDB 2.0 Release Notes + +On April 27, 2018, TiDB 2.0 GA is released! Compared with TiDB 1.0, this release has great improvement in MySQL compatibility, SQL optimizer, executor, and stability. + +## TiDB + +- SQL Optimizer + - Use more compact data structure to reduce the memory usage of statistics information + - Speed up loading statistics information when starting a tidb-server process + - Support updating statistics information dynamically [experimental] + - Optimize the cost model to provide more accurate query cost evaluation + - Use `Count-Min Sketch` to estimate the cost of point queries more accurately + - Support analyzing more complex conditions to make full use of indexes + - Support manually specifying the `Join` order using the `STRAIGHT_JOIN` syntax + - Use the Stream Aggregation operator when the `GROUP BY` clause is empty to improve the performance + - Support using indexes for the `MAX/MIN` function + - Optimize the processing algorithms for correlated subqueries to support decorrelating more types of correlated subqueries and transform them to `Left Outer Join` + - Extend `IndexLookupJoin` to be used in matching the index prefix +- SQL Execution Engine + - Refactor all operators using the Chunk architecture, improve the execution performance of analytical queries, and reduce memory usage. There is a significant improvement in the TPC-H benchmark result. 
+ - Support the Streaming Aggregation operators pushdown + - Optimize the `Insert Into Ignore` statement to improve the performance by over 10 times + - Optimize the `Insert On Duplicate Key Update` statement to improve the performance by over 10 times + - Optimize `Load Data` to improve the performance by over 10 times + - Push down more data types and functions to TiKV + - Support computing the memory usage of physical operators, and specifying the processing behavior in the configuration file and system variables when the memory usage exceeds the threshold + - Support limiting the memory usage by a single SQL statement to reduce the risk of OOM + - Support using implicit RowID in CRUD operations + - Improve the performance of point queries +- Server + - Support the Proxy Protocol + - Add more monitoring metrics and refine the log + - Support validating the configuration files + - Support obtaining the information of TiDB parameters through HTTP API + - Resolve Lock in the Batch mode to speed up garbage collection + - Support multi-threaded garbage collection + - Support TLS +- Compatibility + - Support more MySQL syntaxes + - Support modifying the `lower_case_table_names` system variable in the configuration file to support the OGG data synchronization tool + - Improve compatibility with the Navicat management tool + - Support displaying the table creation time in `Information_Schema` + - Fix the issue that the return types of some functions/expressions differ from MySQL + - Improve compatibility with JDBC + - Support more SQL Modes +- DDL + - Optimize the `Add Index` operation to greatly improve the execution speed in some scenarios + - Attach a lower priority to the `Add Index` operation to reduce the impact on online business + - Output more detailed status information of the DDL jobs in `Admin Show DDL Jobs` + - Support querying the original statements of currently running DDL jobs using `Admin Show DDL Job Queries JobID` + - Support recovering the index data using `Admin Recover Index` for disaster recovery + - Support modifying Table Options using the `Alter` statement + +## PD + +- Support `Region Merge`, to merge empty Regions after deleting data [experimental] +- Support `Raft Learner` [experimental] +- Optimize the scheduler + - Make the scheduler adapt to different Region sizes + - Improve the priority and speed of restoring data during TiKV outage + - Speed up data transferring when removing a TiKV node + - Optimize the scheduling policies to prevent the disks from becoming full when the space of TiKV nodes is insufficient + - Improve the scheduling efficiency of the balance-leader scheduler + - Reduce the scheduling overhead of the balance-region scheduler + - Optimize the execution efficiency of the hot-region scheduler +- Operations interface and configuration + - Support TLS + - Support prioritizing the PD leaders + - Support configuring the scheduling policies based on labels + - Support configuring stores with a specific label not to schedule the Raft leader + - Support splitting Region manually to handle the hotspot in a single Region + - Support scattering a specified Region to manually adjust Region distribution in some cases + - Add check rules for configuration parameters and improve validity check of the configuration items +- Debugging interface + - Add the `Drop Region` debugging interface + - Add the interfaces to enumerate the health status of each PD +- Statistics + - Add statistics about abnormal Regions + - Add statistics about Region isolation
level + - Add scheduling related metrics +- Performance + - Keep the PD leader and the etcd leader together in the same node to improve write performance + - Optimize the performance of Region heartbeat + +## TiKV + +- Features + - Protect critical configuration from incorrect modification + - Support `Region Merge` [experimental] + - Add the `Raw DeleteRange` API + - Add the `GetMetric` API + - Add `Raw Batch Put`, `Raw Batch Get`, `Raw Batch Delete` and `Raw Batch Scan` + - Add Column Family options for the RawKV API and support executing operation on a specific Column Family + - Support Streaming and Streaming Aggregation in Coprocessor + - Support configuring the request timeout of Coprocessor + - Carry timestamps with Region heartbeats + - Support modifying some RocksDB parameters online, such as `block-cache-size` + - Support configuring the behavior of Coprocessor when it encounters some warnings or errors + - Support starting in the importing data mode to reduce write amplification during the data importing process + - Support manually splitting Region in halves + - Improve the data recovery tool `tikv-ctl` + - Return more statistics in Coprocessor to guide the behavior of TiDB + - Support the `ImportSST` API to import SST files [experimental] + - Add the TiKV Importer binary to integrate with TiDB Lightning to import data quickly [experimental] +- Performance + - Optimize read performance using `ReadPool` and increase the `raw_get/get/batch_get` by 30% + - Improve metrics performance + - Inform PD immediately once the Raft snapshot process is completed to speed up balancing + - Solve performance jitter caused by RocksDB flushing + - Optimize the space reclaiming mechanism after deleting data + - Speed up garbage cleaning while starting the server + - Reduce the I/O overhead during replica migration using `DeleteFilesInRanges` +- Stability + - Fix the issue that gRPC call does not get returned when the PD leader switches + - Fix the issue that it is slow to offline nodes caused by snapshots + - Limit the temporary space usage consumed by migrating replicas + - Report the Regions that cannot elect a leader for a long time + - Update the Region size information in time according to compaction events + - Limit the size of scan lock to avoid request timeout + - Limit the memory usage when receiving snapshots to avoid OOM + - Increase the speed of CI test + - Fix the OOM issue caused by too many snapshots + - Configure `keepalive` of gRPC + - Fix the OOM issue caused by an increase of the Region number + +## TiSpark + +TiSpark uses a separate version number. The current TiSpark version is 1.0 GA. The components of TiSpark 1.0 provide distributed computing of TiDB data using Apache Spark. 
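As a rough, hedged sketch of the Spark SQL interaction listed below, a query such as the following could be run against TiDB data once TiSpark is attached to a Spark session; the database and table names are placeholders, not part of the release notes.

```sql
-- Hypothetical Spark SQL session reading TiDB data through TiSpark.
-- The schema and table are examples only; predicates and aggregates like
-- these are the kind of operators covered by the pushdown support below.
USE tpch;

SELECT l_returnflag, COUNT(*) AS cnt
FROM lineitem
WHERE l_shipdate < '1998-09-01'
GROUP BY l_returnflag;
```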
+ +- Provide a gRPC communication framework to read data from TiKV +- Provide encoding and decoding of TiKV component data and communication protocol +- Provide calculation pushdown, which includes: + - Aggregate pushdown + - Predicate pushdown + - TopN pushdown + - Limit pushdown +- Provide index related support + - Transform predicate into Region key range or secondary index + - Optimize `Index Only` queries +    - Adaptively downgrade index scan to table scan per Region +- Provide cost-based optimization + - Support statistics + - Select index + - Estimate broadcast table cost +- Provide support for multiple Spark interfaces + - Support Spark Shell + - Support ThriftServer/JDBC + - Support Spark-SQL interaction + - Support PySpark Shell + - Support SparkR diff --git a/releases/201.md b/releases/201.md new file mode 100644 index 0000000000000..57d1f5c4d89a4 --- /dev/null +++ b/releases/201.md @@ -0,0 +1,49 @@ +--- +title: TiDB 2.0.1 Release Notes +category: Releases +--- + +# TiDB 2.0.1 Release Notes + +On May 16, 2018, TiDB 2.0.1 is released. Compared with TiDB 2.0.0 (GA), this release has great improvement in MySQL compatibility and system stability. + +## TiDB + +- Update the progress of `Add Index` to the DDL job information in real time +- Add the `tidb_auto_analyze_ratio` session variable to control the threshold value of automatic statistics update +- Fix an issue that not all residual states are cleaned up when the transaction commit fails +- Fix a bug about adding indexes in some conditions +- Fix the correctness related issue when DDL modifies surface operations in some concurrent scenarios +- Fix a bug that the result of `LIMIT` is incorrect in some conditions +- Fix a capitalization issue of the `ADMIN CHECK INDEX` statement to make its index name case insensitive +- Fix a compatibility issue of the `UNION` statement +- Fix a compatibility issue when inserting data of `TIME` type +- Fix a goroutine leak issue caused by `copIteratorTaskSender` in some conditions +- Add an option for TiDB to control the behaviour of Binlog failure +- Refactor the `Coprocessor` slow log to distinguish between the scenario of tasks with long processing time and long waiting time +- Log nothing when meeting MySQL protocol handshake error, to avoid too many logs caused by the load balancer Keep Alive mechanism +- Refine the “Out of range value for column” error message +- Fix a bug when there is a subquery in an `Update` statement +- Change the behaviour of handling `SIGTERM`, and do not wait for all queries to terminate anymore + +## PD + +- Add the `Scatter Range` scheduler to balance Regions with the specified key range +- Optimize the scheduling of Merge Region to prevent the newly split Region from being merged +- Add Learner related metrics +- Fix the issue that the scheduler is mistakenly deleted after restart +- Fix the error that occurs when parsing the configuration file +- Fix the issue that the etcd leader and the PD leader are not synchronized +- Fix the issue that Learner still appears after it is closed +- Fix the issue that Regions fail to load because the packet size is too large + +## TiKV + +- Fix the issue that `SELECT FOR UPDATE` prevents others from reading +- Optimize the slow query log +- Reduce the number of `thread_yield` calls +- Fix the bug that raftstore is accidentally blocked when generating the snapshot +- Fix the issue that Learner cannot be successfully elected in special conditions +- Fix the issue that split might cause dirty read in extreme conditions +- Correct 
the default value of the read thread pool configuration +- Speed up Delete Range diff --git a/releases/202.md b/releases/202.md new file mode 100644 index 0000000000000..c5c13cd12e243 --- /dev/null +++ b/releases/202.md @@ -0,0 +1,30 @@ +--- +title: TiDB 2.0.2 Release Notes +category: Releases +--- + +# TiDB 2.0.2 Release Notes + +On May 21, 2018, TiDB 2.0.2 is released. Compared with TiDB 2.0.1, this release has great improvement in system stability. + +## TiDB + +- Fix the issue of pushing down the Decimal division expression +- Support using the `USE INDEX` syntax in the `Delete` statement +- Forbid using the `shard_row_id_bits` feature in columns with `Auto-Increment` +- Add the timeout mechanism for writing Binlog + +## PD + +- Make the balance leader scheduler filter the disconnected nodes +- Modify the timeout of the transfer leader operator to 10s +- Fix the issue that the label scheduler does not schedule when the cluster Regions are in an unhealthy state +- Fix the improper scheduling issue of `evict leader scheduler` + +## TiKV + +- Fix the issue that the Raft log is not printed +- Support configuring more gRPC related parameters +- Support configuring the timeout range of leader election +- Fix the issue that the obsolete learner is not deleted +- Fix the issue that the snapshot intermediate file is mistakenly deleted \ No newline at end of file diff --git a/releases/203.md b/releases/203.md new file mode 100644 index 0000000000000..5fd47b0825588 --- /dev/null +++ b/releases/203.md @@ -0,0 +1,37 @@ +--- +title: TiDB 2.0.3 Release Notes +category: Releases +--- + +# TiDB 2.0.3 Release Notes + +On June 1, 2018, TiDB 2.0.3 is released. Compared with TiDB 2.0.2, this release has great improvement in system compatibility and stability. + +## TiDB + +- Support modifying the log level online +- Support the `COM_CHANGE_USER` command +- Support using the `TIME` type parameters under the binary protocol +- Optimize the cost estimation of query conditions with the `BETWEEN` expression +- Do not display the `FOREIGN KEY` information in the result of `SHOW CREATE TABLE` +- Optimize the cost estimation for queries with the `LIMIT` clause +- Fix the issue about the `YEAR` type as the unique index +- Fix the issue about `ON DUPLICATE KEY UPDATE` in conditions without the unique index +- Fix the compatibility issue of the `CEIL` function +- Fix the accuracy issue of the `DIV` calculation in the `DECIMAL` type +- Fix the false alarm of `ADMIN CHECK TABLE` +- Fix the panic issue of `MAX`/`MIN` under specific expression parameters +- Fix the issue that the result of `JOIN` is null in special conditions +- Fix the `IN` expression issue when building and querying Range +- Fix a Range calculation issue when using `Prepare` to query and `Plan Cache` is enabled +- Fix the issue that the Schema information is frequently loaded in abnormal conditions + +## PD + +- Fix the panic issue when collecting hot-cache metrics in specific conditions +- Fix the issue about scheduling of the obsolete Regions + +## TiKV + +- Fix the bug that the learner flag mistakenly reports to PD +- Report an error instead of getting a result if `divisor/dividend` is 0 in `do_div_mod` \ No newline at end of file diff --git a/releases/204.md b/releases/204.md new file mode 100644 index 0000000000000..6e6ce10b8d4f7 --- /dev/null +++ b/releases/204.md @@ -0,0 +1,41 @@ +--- +title: TiDB 2.0.4 Release Notes +category: Releases +--- + +# TiDB 2.0.4 Release Notes + +On June 15, 2018, TiDB 2.0.4 is released. 
Compared with TiDB 2.0.3, this release has great improvement in system compatibility and stability. + +## TiDB + +- Support the `ALTER TABLE t DROP COLUMN a CASCADE` syntax +- Support configuring the value of `tidb_snapshot` to TSO +- Refine the display of statement types in monitoring items +- Optimize the accuracy of query cost estimation +- Configure the `backoff max delay` parameter of gRPC +- Support configuring the memory threshold of a single statement in the configuration file +- Refactor the error of Optimizer +- Fix the side effects of the `Cast Decimal` data +- Fix the wrong result issue of the `Merge Join` operator in specific scenarios +- Fix the issue of converting the Null object to String +- Fix the issue of casting the JSON type of data to the JSON type +- Fix the issue that the result order is not consistent with MySQL in the condition of `Union` + `OrderBy` +- Fix the compliance rules issue when the `Union` statement checks the `Limit/OrderBy` clause +- Fix the compatibility issue of the `Union All` result +- Fix a bug in predicate pushdown +- Fix the compatibility issue of the `Union` statement with the `For Update` clause +- Fix the issue that the `concat_ws` function mistakenly truncates the result + +## PD + +- Improve the behavior of the unset scheduling argument `max-pending-peer-count` by changing it to no limit for the maximum number of `PendingPeer`s + +## TiKV + +- Add the RocksDB `PerfContext` interface for debugging +- Remove the `import-mode` parameter +- Add the `region-properties` command for `tikv-ctl` +- Fix the issue that `reverse-seek` is slow when many RocksDB tombstones exist +- Fix the crash issue caused by `do_sub` +- Make GC record the log when GC encounters many versions of data diff --git a/releases/205.md b/releases/205.md new file mode 100644 index 0000000000000..306b0178eacde --- /dev/null +++ b/releases/205.md @@ -0,0 +1,40 @@ +--- +title: TiDB 2.0.5 Release Notes +category: Releases +--- + +# TiDB 2.0.5 Release Notes + +On July 6, 2018, TiDB 2.0.5 is released. Compared with TiDB 2.0.4, this release has great improvement in system compatibility and stability. 
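A minimal sketch of the `tidb_disable_txn_auto_retry` variable listed under New Features below; the session scope and the value shown are assumptions, not part of the release notes.

```sql
-- Turn off automatic retry of explicit transactions for the current session only.
-- The variable name comes from the note below; the scope and value are assumptions.
SET @@session.tidb_disable_txn_auto_retry = 1;
```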
+ +## TiDB + +- New Features + - Add the `tidb_disable_txn_auto_retry` system variable which is used to disable the automatic retry of transactions [#6877](https://github.com/pingcap/tidb/pull/6877) +- Improvements + - Optimize the cost calculation of `Selection` to make the result more accurate [#6989](https://github.com/pingcap/tidb/pull/6989) + - Select the query condition that completely matches the unique index or the primary key as the query path directly [#6966](https://github.com/pingcap/tidb/pull/6966) + - Execute necessary cleanup when failing to start the service [#6964](https://github.com/pingcap/tidb/pull/6964) + - Handle `\N` as NULL in the `Load Data` statement [#6962](https://github.com/pingcap/tidb/pull/6962) + - Optimize the code structure of CBO [#6953](https://github.com/pingcap/tidb/pull/6953) + - Report the monitoring metrics earlier when starting the service [#6931](https://github.com/pingcap/tidb/pull/6931) + - Optimize the format of slow queries by removing the line breaks in SQL statements and adding user information [#6920](https://github.com/pingcap/tidb/pull/6920) + - Support multiple asterisks in comments [#6858](https://github.com/pingcap/tidb/pull/6858) +- Bug Fixes + - Fix the issue that `KILL QUERY` always requires SUPER privilege [#7003](https://github.com/pingcap/tidb/pull/7003) + - Fix the issue that users might fail to log in when the number of users exceeds 1024 [#6986](https://github.com/pingcap/tidb/pull/6986) + - Fix an issue about inserting unsigned `float`/`double` data [#6940](https://github.com/pingcap/tidb/pull/6940) + - Fix the compatibility of the `COM_FIELD_LIST` command to resolve the panic issue in some MariaDB clients [#6929](https://github.com/pingcap/tidb/pull/6929) + - Fix the `CREATE TABLE IF NOT EXISTS LIKE` behavior [#6928](https://github.com/pingcap/tidb/pull/6928) + - Fix an issue in the process of TopN pushdown [#6923](https://github.com/pingcap/tidb/pull/6923) + - Fix the ID record issue of the currently processing row when an error occurs in executing `Add Index` [#6903](https://github.com/pingcap/tidb/pull/6903) + +## PD + +- Fix the issue that replica migration uses up TiKV disk space in some scenarios +- Fix the crash issue caused by `AdjacentRegionScheduler` + +## TiKV + +- Fix the potential overflow issue in decimal operations +- Fix the dirty read issue that might occur in the process of merging \ No newline at end of file diff --git a/releases/206.md b/releases/206.md new file mode 100644 index 0000000000000..27b9bc9cabb4f --- /dev/null +++ b/releases/206.md @@ -0,0 +1,49 @@ +--- +title: TiDB 2.0.6 Release Notes +category: Releases +--- + +# TiDB 2.0.6 Release Notes + +On August 6, 2018, TiDB 2.0.6 is released. Compared with TiDB 2.0.5, this release has great improvement in system compatibility and stability.
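For context on the `PREPARE` placeholder cap mentioned under Improvements below, a prepared statement is written in the usual MySQL way; the table, columns, and values here are hypothetical and use only two placeholders, far below the 65535 limit.

```sql
-- A two-placeholder prepared statement; statements with more than 65535
-- placeholders are now rejected, matching MySQL. Names are hypothetical.
PREPARE stmt FROM 'SELECT * FROM t WHERE id = ? AND status = ?';
SET @id = 1, @status = 'active';
EXECUTE stmt USING @id, @status;
DEALLOCATE PREPARE stmt;
```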
+ +## TiDB + +- Improvements + - Make "set system variable" log shorter to save disk space [#7031](https://github.com/pingcap/tidb/pull/7031) + - Record slow operations during the execution of `ADD INDEX` in the log, to make troubleshooting easier [#7083](https://github.com/pingcap/tidb/pull/7083) + - Reduce transaction conflicts when updating statistics [#7138](https://github.com/pingcap/tidb/pull/7138) + - Improve the accuracy of row count estimation when the values pending to be estimated exceed the statistics range [#7185](https://github.com/pingcap/tidb/pull/7185) + - Choose the table with a smaller estimated row count as the outer table for `Index Join` to improve its execution efficiency [#7277](https://github.com/pingcap/tidb/pull/7277) + - Add the recovery mechanism for panics that occur during the execution of `ANALYZE TABLE`, to prevent the tidb-server from becoming unavailable because of abnormal behavior in the process of collecting statistics [#7228](https://github.com/pingcap/tidb/pull/7228) + - Return `NULL` and the corresponding warning when the results of `RPAD`/`LPAD` exceed the value of the `max_allowed_packet` system variable, compatible with MySQL [#7244](https://github.com/pingcap/tidb/pull/7244) + - Set the upper limit of placeholders count in the `PREPARE` statement to 65535, compatible with MySQL [#7250](https://github.com/pingcap/tidb/pull/7250) +- Bug Fixes + - Fix the issue that the `DROP USER` statement is incompatible with MySQL behavior in some cases [#7014](https://github.com/pingcap/tidb/pull/7014) + - Fix the issue that statements like `INSERT`/`LOAD DATA` meet OOM after enabling `tidb_batch_insert` [#7092](https://github.com/pingcap/tidb/pull/7092) + - Fix the issue that the statistics fail to automatically update when the data of a table keeps updating [#7093](https://github.com/pingcap/tidb/pull/7093) + - Fix the issue that the firewall breaks inactive gRPC connections [#7099](https://github.com/pingcap/tidb/pull/7099) + - Fix the issue that prefix index returns a wrong result in some scenarios [#7126](https://github.com/pingcap/tidb/pull/7126) + - Fix the panic issue caused by outdated statistics in some scenarios [#7155](https://github.com/pingcap/tidb/pull/7155) + - Fix the issue that one piece of index data is missed after the `ADD INDEX` operation in some scenarios [#7156](https://github.com/pingcap/tidb/pull/7156) + - Fix the wrong result issue when querying `NULL` values using the unique index in some scenarios [#7172](https://github.com/pingcap/tidb/pull/7172) + - Fix the issue that the `DECIMAL` multiplication result is garbled in some scenarios [#7212](https://github.com/pingcap/tidb/pull/7212) + - Fix the wrong result issue of `DECIMAL` modulo operation in some scenarios [#7245](https://github.com/pingcap/tidb/pull/7245) + - Fix the issue that the `UPDATE`/`DELETE` statement in a transaction returns a wrong result under some special sequence of statements [#7219](https://github.com/pingcap/tidb/pull/7219) + - Fix the panic issue of the `UNION ALL`/`UPDATE` statement during the process of building the execution plan in some scenarios [#7225](https://github.com/pingcap/tidb/pull/7225) + - Fix the issue that the range of prefix index is calculated incorrectly in some scenarios [#7231](https://github.com/pingcap/tidb/pull/7231) + - Fix the issue that the `LOAD DATA` statement fails to write the binlog in some scenarios [#7242](https://github.com/pingcap/tidb/pull/7242) + - Fix the wrong result issue of `SHOW CREATE TABLE` during the execution process of
`ADD INDEX` in some scenarios [#7243](https://github.com/pingcap/tidb/pull/7243) + - Fix the issue that panic occurs when `Index Join` does not initialize timestamps in some scenarios [#7246](https://github.com/pingcap/tidb/pull/7246) + - Fix the false alarm issue when `ADMIN CHECK TABLE` mistakenly uses the timezone in the session [#7258](https://github.com/pingcap/tidb/pull/7258) + - Fix the issue that `ADMIN CLEANUP INDEX` does not clean up the index in some scenarios [#7265](https://github.com/pingcap/tidb/pull/7265) + - Disable the Read Committed isolation level [#7282](https://github.com/pingcap/tidb/pull/7282) + +## TiKV + +- Improvements + - Enlarge scheduler's default slots to reduce false conflicts + - Reduce continuous records of rollback transactions, to improve the Read performance when conflicts are extremely severe + - Limit the size and number of RocksDB log files, to reduce unnecessary disk usage in long-running condition +- Bug Fixes + - Fix the crash issue when converting the data type from string to decimal diff --git a/releases/207.md b/releases/207.md new file mode 100644 index 0000000000000..af6c945a882f1 --- /dev/null +++ b/releases/207.md @@ -0,0 +1,37 @@ +--- +title: TiDB 2.0.7 Release Notes +category: Releases +--- + +# TiDB 2.0.7 Release Notes + +On September 7, 2018, TiDB 2.0.7 is released. Compared with TiDB 2.0.6, this release has great improvement in system compatibility and stability. + +## TiDB + +- New Feature + - Add the `PROCESSLIST` table in `information_schema` [#7286](https://github.com/pingcap/tidb/pull/7286) +- Improvement + - Collect more details about SQL statement execution and output the information in the `SLOW QUERY` log [#7364](https://github.com/pingcap/tidb/pull/7364) + - Drop the partition information in `SHOW CREATE TABLE` [#7388](https://github.com/pingcap/tidb/pull/7388) + - Improve the execution efficiency of the `ANALYZE` statement by setting it to the RC isolation level and low priority [#7500](https://github.com/pingcap/tidb/pull/7500) + - Speed up adding a unique index [#7562](https://github.com/pingcap/tidb/pull/7562) + - Add an option of controlling the DDL concurrency [#7563](https://github.com/pingcap/tidb/pull/7563) +- Bug Fixes + - Fix the issue that `USE INDEX(PRIMARY)` cannot be used in a table whose primary key is an integer [#7298](https://github.com/pingcap/tidb/pull/7298) + - Fix the issue that `Merge Join` and `Index Join` output incorrect results when the inner row is `NULL` [#7301](https://github.com/pingcap/tidb/pull/7301) + - Fix the issue that `Join` outputs an incorrect result when the chunk size is set too small [#7315](https://github.com/pingcap/tidb/pull/7315) + - Fix the panic issue caused by a statement of creating a table involving `range column` [#7379](https://github.com/pingcap/tidb/pull/7379) + - Fix the issue that `admin check table` mistakenly reports an error of a time-type column [#7457](https://github.com/pingcap/tidb/pull/7457) + - Fix the issue that the data with a default value `current_timestamp` cannot be queried using the `=` condition [#7467](https://github.com/pingcap/tidb/pull/7467) + - Fix the issue that the zero-length parameter inserted by using the `ComStmtSendLongData` command is mistakenly parsed to NULL [#7508](https://github.com/pingcap/tidb/pull/7508) + - Fix the issue that `auto analyze` is repeatedly executed in specific scenarios [#7556](https://github.com/pingcap/tidb/pull/7556) + - Fix the issue that the parser cannot parse a single line comment ended with a newline 
character [#7635](https://github.com/pingcap/tidb/pull/7635) + +## TiKV + +- Improvement + - Open the `dynamic-level-bytes` parameter in an empty cluster by default, to reduce space amplification +- Bug Fix + - Update `approximate size` and `approximate keys count` of a Region after Region merging + diff --git a/releases/208.md b/releases/208.md new file mode 100644 index 0000000000000..3b239460acb87 --- /dev/null +++ b/releases/208.md @@ -0,0 +1,29 @@ +--- +title: TiDB 2.0.8 Release Notes +category: Releases +--- + +# TiDB 2.0.8 Release Notes + +On October 16, 2018, TiDB 2.0.8 is released. Compared with TiDB 2.0.7, this release has great improvement in system compatibility and stability. + +## TiDB + ++ Improvement + - Slow down the AUTO-ID increasing speed when the `Update` statement does not modify the corresponding AUTO-INCREMENT column [#7846](https://github.com/pingcap/tidb/pull/7846) ++ Bug fixes + - Quickly create a new etcd session to recover the service when the PD leader goes down [#7810](https://github.com/pingcap/tidb/pull/7810) + - Fix the issue that the time zone is not considered when the default value of the `DateTime` type is calculated [#7672](https://github.com/pingcap/tidb/pull/7672) + - Fix the issue that `duplicate key update` inserts values incorrectly in some conditions [#7685](https://github.com/pingcap/tidb/pull/7685) + - Fix the issue that the predicate conditions of `UnionScan` are not pushed down [#7726](https://github.com/pingcap/tidb/pull/7726) + - Fix the issue that the time zone is not correctly handled when you add the `TIMESTAMP` index [#7812](https://github.com/pingcap/tidb/pull/7812) + - Fix the memory leak issue caused by the statistics module in some conditions [#7864](https://github.com/pingcap/tidb/pull/7864) + - Fix the issue that the results of `ANALYZE` cannot be obtained in some abnormal conditions [#7871](https://github.com/pingcap/tidb/pull/7871) + - Do not fold the function `SYSDATE`, to ensure the returned results are correct [#7894](https://github.com/pingcap/tidb/pull/7894) + - Fix the `substring_index` panic issue in some conditions [#7896](https://github.com/pingcap/tidb/pull/7896) + - Fix the issue that `OUTER JOIN` is mistakenly converted to `INNER JOIN` in some conditions [#7899](https://github.com/pingcap/tidb/pull/7899) + +## TiKV + ++ Bug fix + - Fix the issue that the memory consumed by Raftstore `EntryCache` keeps increasing when a node goes down [#3529](https://github.com/tikv/tikv/pull/3529) diff --git a/releases/21beta.md b/releases/21beta.md new file mode 100644 index 0000000000000..cd7e8d9080f6b --- /dev/null +++ b/releases/21beta.md @@ -0,0 +1,85 @@ +--- +title: TiDB 2.1 Beta Release Notes +category: Releases +--- + +# TiDB 2.1 Beta Release Notes + +On June 29, 2018, TiDB 2.1 Beta is released! Compared with TiDB 2.0, this release has great improvement in stability, SQL optimizer, statistics information, and execution engine. 
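Two of the compatibility and DDL items listed below can be exercised directly from a MySQL client; this is only a usage sketch, not part of the release notes.

```sql
-- Check whether the tidb-server instance you are connected to is currently the
-- DDL owner (statement listed under DDL below).
SELECT tidb_is_ddl_owner();

-- SHOW PRIVILEGES is now accepted for MySQL compatibility (Compatibility section below).
SHOW PRIVILEGES;
```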
+ +## TiDB + +- SQL Optimizer + - Optimize the selection range of `Index Join` to improve the execution performance + - Optimize correlated subquery, push down `Filter`, and extend the index range, to improve the efficiency of some queries by orders of magnitude + - Support `Index Hint` and `Join Hint` in the `UPDATE` and `DELETE` statements + - Validate Hint `TIDB_SMJ` when no available index exists + - Support pushdown of the `ABS`, `CEIL`, `FLOOR`, `IS TRUE`, and `IS FALSE` functions + - Specially handle the `IF` and `IFNULL` functions in the constant folding process +- SQL Execution Engine + - Implement parallel `Hash Aggregate` operators and improve the computing performance of `Hash Aggregate` by 350% in some scenarios + - Implement parallel `Project` operators and improve the performance by 74% in some scenarios + - Read the data of the `Inner` table and `Outer` table of `Hash Join` concurrently to improve the execution performance + - Fix incorrect results of `INSERT … ON DUPLICATE KEY UPDATE …` in some scenarios + - Fix incorrect results of the `CONCAT_WS`, `FLOOR`, `CEIL`, and `DIV` built-in functions +- Server + - Add the HTTP API to scatter the distribution of table Regions in the TiKV cluster + - Add the `auto_analyze_ratio` system variable to control the threshold value of automatic `Analyze` + - Add the HTTP API to control whether to open the general log + - Add the HTTP API to modify the log level online + - Add the user information in the general log and the slow query log + - Support the server side cursor +- Compatibility + - Support more MySQL syntax + - Make the `bit` aggregate function support the `ALL` parameter + - Support the `SHOW PRIVILEGES` statement +- DML + - Decrease the memory usage of the `INSERT INTO SELECT` statement + - Fix the performance issue of `PlanCache` + - Add the `tidb_retry_limit` system variable to control the automatic retry times of transactions + - Add the `tidb_disable_txn_auto_retry` system variable to control whether the transaction retries automatically + - Fix the accuracy issue of the written data of the `time` type + - Support the queue of locally conflicted transactions to optimize the conflicted transaction performance + - Fix `Affected Rows` of the `UPDATE` statement + - Optimize the statement performance of `insert ignore on duplicate key update` +- DDL + - Optimize the execution speed of the `CreateTable` statement + - Optimize the execution speed of `ADD INDEX` and improve it greatly in some scenarios + - Fix the issue that the number of added columns by `Alter table add column` exceeds the limit of the number of table columns + - Fix the issue that DDL job retries lead to an increasing pressure on TiKV in abnormal conditions + - Fix the issue that TiDB continuously reloads the schema information in abnormal conditions + - Do not output the `FOREIGN KEY` related information in the result of `SHOW CREATE TABLE` + - Support the `select tidb_is_ddl_owner()` statement to facilitate judging whether TiDB is `DDL Owner` + - Fix the issue that the index is deleted in the `Year` type in some scenarios + - Fix the renaming table issue in the concurrent execution scenario + - Support the `AlterTableForce` syntax + - Support the `AlterTableRenameIndex` syntax with `FromKey` and `ToKey` + - Add the table name and database name in the output information of `admin show ddl jobs` + +## PD + +- Enable Raft PreVote between PD nodes to avoid leader reelection when network recovers after network isolation +- Optimize the issue that Balance
Scheduler schedules small Regions frequently +- Optimize the hotspot scheduler to improve its adaptability when traffic statistics jitter +- Skip the Regions with a large number of rows when scheduling `region merge` +- Enable `raft learner` by default to lower the risk of unavailable data caused by machine failure during scheduling +- Remove `max-replica` from `pd-recover` +- Add `Filter` metrics +- Fix the issue that Region information is not updated after tikv-ctl unsafe recovery +- Fix the issue that TiKV disk space is used up by replica migration in some scenarios +- Compatibility notes + - Do not support rolling back to v2.0.x or earlier due to update of the new version storage engine + - Enable `raft learner` by default in the new version of PD. If the cluster is upgraded from 1.x to 2.1, the machine should be stopped before upgrade or a rolling update should be first applied to TiKV and then PD + + +## TiKV + +- Upgrade Rust to the `nightly-2018-06-14` version +- Enable `Raft PreVote` to avoid leader reelection generated when network recovers after network isolation +- Add a metric to display the number of files and `ingest` related information in each layer of RocksDB +- Print the `key` that has too many versions when GC runs +- Use `static metric` to optimize multi-label metric performance (YCSB `raw get` is improved by 3%) +- Remove `box` in multiple modules and use patterns to improve the operating performance (YCSB `raw get` is improved by 3%) +- Use `asynchronous log` to improve the performance of writing logs +- Add a metric to collect the thread status +- Decrease the number of memory copies by reducing the use of `box` in the application to improve the performance diff --git a/releases/21rc1.md b/releases/21rc1.md new file mode 100644 index 0000000000000..e48f20f3729ab --- /dev/null +++ b/releases/21rc1.md @@ -0,0 +1,155 @@ +--- +title: TiDB 2.1 RC1 Release Notes +category: Releases +--- + +# TiDB 2.1 RC1 Release Notes + +On August 24, 2018, TiDB 2.1 RC1 is released! Compared with TiDB 2.1 Beta, this release has great improvement in stability, SQL optimizer, statistics information, and execution engine.
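A few of the DDL and compatibility items listed further below can be sketched as plain SQL; the job count is an arbitrary example.

```sql
-- Show only the most recent DDL jobs; passing a count is described under DDL below.
ADMIN SHOW DDL JOBS 5;

-- Newly accepted compatibility statements (see the Compatibility list below).
FLUSH STATUS;
SET character_set_results = NULL;
```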
+ +## TiDB + +- SQL Optimizer + - Fix the issue that a wrong result is returned after the correlated subquery is decorrelated in some cases [#6972](https://github.com/pingcap/tidb/pull/6972) + - Optimize the output result of `Explain` [#7011](https://github.com/pingcap/tidb/pull/7011)[#7041](https://github.com/pingcap/tidb/pull/7041) + - Optimize the choosing strategy of the outer table for `IndexJoin` [#7019](https://github.com/pingcap/tidb/pull/7019) + - Remove the Plan Cache of the non-`PREPARE` statement [#7040](https://github.com/pingcap/tidb/pull/7040) + - Fix the issue that the `INSERT` statement is not parsed and executed correctly in some cases [#7068](https://github.com/pingcap/tidb/pull/7068) + - Fix the issue that the `IndexJoin` result is not correct in some cases [#7150](https://github.com/pingcap/tidb/pull/7150) + - Fix the issue that the `NULL` value cannot be found using the unique index in some cases [#7163](https://github.com/pingcap/tidb/pull/7163) + - Fix the range computing issue of the prefix index in UTF-8 [#7194](https://github.com/pingcap/tidb/pull/7194) + - Fix the issue that the result is incorrect due to eliminating the `Project` operator in some cases [#7257](https://github.com/pingcap/tidb/pull/7257) + - Fix the issue that `USE INDEX(PRIMARY)` cannot be used when the primary key is an integer [#7316](https://github.com/pingcap/tidb/pull/7316) + - Fix the issue that the index range cannot be computed using the correlated column in some cases [#7357](https://github.com/pingcap/tidb/pull/7357) +- SQL Execution Engine + - Fix the issue that the daylight saving time is not computed correctly in some cases [#6823](https://github.com/pingcap/tidb/pull/6823) + - Refactor the aggregation function framework to improve the execution efficiency of the `Stream` and `Hash` aggregation operators [#6852](https://github.com/pingcap/tidb/pull/6852) + - Fix the issue that the `Hash` aggregation operator cannot exit normally in some cases [#6982](https://github.com/pingcap/tidb/pull/6982) + - Fix the issue that `BIT_AND`/`BIT_OR`/`BIT_XOR` does not handle the non-integer data correctly [#6994](https://github.com/pingcap/tidb/pull/6994) + - Optimize the execution speed of the `REPLACE INTO` statement and increase the performance by nearly 10 times [#7027](https://github.com/pingcap/tidb/pull/7027) + - Optimize the memory usage of time type data and decrease the memory usage of the time type data by fifty percent [#7043](https://github.com/pingcap/tidb/pull/7043) + - Fix the issue that the result of the `UNION` statement mixing signed and unsigned integers is not compatible with MySQL [#7112](https://github.com/pingcap/tidb/pull/7112) + - Fix the panic issue caused by too much memory being requested by `LPAD`/`RPAD`/`TO_BASE64`/`FROM_BASE64`/`REPEAT` [#7171](https://github.com/pingcap/tidb/pull/7171) [#7266](https://github.com/pingcap/tidb/pull/7266) [#7409](https://github.com/pingcap/tidb/pull/7409) [#7431](https://github.com/pingcap/tidb/pull/7431) + - Fix the incorrect result when `MergeJoin`/`IndexJoin` handles the `NULL` value [#7255](https://github.com/pingcap/tidb/pull/7255) + - Fix the incorrect result of `Outer Join` in some cases [#7288](https://github.com/pingcap/tidb/pull/7288) + - Improve the error message of `Data Truncated` to facilitate locating the wrong data and the corresponding field in the table [#7401](https://github.com/pingcap/tidb/pull/7401) + - Fix the incorrect result for `decimal` in some cases
[#7001](https://github.com/pingcap/tidb/pull/7001) [#7113](https://github.com/pingcap/tidb/pull/7113) [#7202](https://github.com/pingcap/tidb/pull/7202) [#7208](https://github.com/pingcap/tidb/pull/7208) + - Optimize the point select performance [#6937](https://github.com/pingcap/tidb/pull/6937) + - Prohibit the isolation level of `Read Committed` to avoid the underlying problem [#7211](https://github.com/pingcap/tidb/pull/7211) + - Fix the incorrect result of `LTRIM`/`RTRIM`/`TRIM` in some cases [#7291](https://github.com/pingcap/tidb/pull/7291) + - Fix the issue that the `MaxOneRow` operator cannot guarantee that the returned result does not exceed one row [#7375](https://github.com/pingcap/tidb/pull/7375) + - Divide the Coprocessor requests with too many ranges [#7454](https://github.com/pingcap/tidb/pull/7454) +- Statistics + - Optimize the mechanism of statistics dynamic collection [#6796](https://github.com/pingcap/tidb/pull/6796) + - Fix the issue that `Auto Analyze` does not work when data is updated frequently [#7022](https://github.com/pingcap/tidb/pull/7022) + - Decrease the Write conflicts during the statistics dynamic update process [#7124](https://github.com/pingcap/tidb/pull/7124) + - Optimize the cost estimation when the statistics are incorrect [#7175](https://github.com/pingcap/tidb/pull/7175) + - Optimize the `AccessPath` cost estimation strategy [#7233](https://github.com/pingcap/tidb/pull/7233) +- Server + - Fix the bug in loading privilege information [#6976](https://github.com/pingcap/tidb/pull/6976) + - Fix the issue that the `Kill` command is too strict with privilege check [#6954](https://github.com/pingcap/tidb/pull/6954) + - Fix the issue of removing some binary numeric types [#6922](https://github.com/pingcap/tidb/pull/6922) + - Shorten the output log [#7029](https://github.com/pingcap/tidb/pull/7029) + - Handle the `mismatchClusterID` issue [#7053](https://github.com/pingcap/tidb/pull/7053) + - Add the `advertise-address` configuration item [#7078](https://github.com/pingcap/tidb/pull/7078) + - Add the `GrpcKeepAlive` option [#7100](https://github.com/pingcap/tidb/pull/7100) + - Add the connection or `Token` time monitor [#7110](https://github.com/pingcap/tidb/pull/7110) + - Optimize the data decoding performance [#7149](https://github.com/pingcap/tidb/pull/7149) + - Add the `PROCESSLIST` table in `INFORMATION_SCHEMA` [#7236](https://github.com/pingcap/tidb/pull/7236) + - Fix the order issue when multiple rules are hit in verifying the privilege [#7211](https://github.com/pingcap/tidb/pull/7211) + - Change some default values of encoding related system variables to UTF-8 [#7198](https://github.com/pingcap/tidb/pull/7198) + - Make the slow query log show more detailed information [#7302](https://github.com/pingcap/tidb/pull/7302) + - Support registering tidb-server related information in PD and obtaining this information by HTTP API [#7082](https://github.com/pingcap/tidb/pull/7082) +- Compatibility + - Support Session variables `warning_count` and `error_count` [#6945](https://github.com/pingcap/tidb/pull/6945) + - Add `Scope` check when reading the system variables [#6958](https://github.com/pingcap/tidb/pull/6958) + - Support the `MAX_EXECUTION_TIME` syntax [#7012](https://github.com/pingcap/tidb/pull/7012) + - Support more statements of the `SET` syntax [#7020](https://github.com/pingcap/tidb/pull/7020) + - Add validity check when setting system variables [#7117](https://github.com/pingcap/tidb/pull/7117) + - Add the verification of the number of
`PlaceHolder`s in the `Prepare` statement [#7162](https://github.com/pingcap/tidb/pull/7162) + - Support `set character_set_results = null` [#7353](https://github.com/pingcap/tidb/pull/7353) + - Support the `flush status` syntax [#7369](https://github.com/pingcap/tidb/pull/7369) + - Fix the column size of `SET` and `ENUM` types in `information_schema` [#7347](https://github.com/pingcap/tidb/pull/7347) + - Support the `NATIONAL CHARACTER` syntax of statements for creating a table [#7378](https://github.com/pingcap/tidb/pull/7378) + - Support the `CHARACTER SET` syntax in the `LOAD DATA` statement [#7391](https://github.com/pingcap/tidb/pull/7391) + - Fix the column information of the `SET` and `ENUM` types [#7417](https://github.com/pingcap/tidb/pull/7417) + - Support the `IDENTIFIED WITH` syntax in the `CREATE USER` statement [#7402](https://github.com/pingcap/tidb/pull/7402) + - Fix the precision losing issue during `TIMESTAMP` computing process [#7418](https://github.com/pingcap/tidb/pull/7418) + - Support the validity verification of more `SYSTEM` variables [#7196](https://github.com/pingcap/tidb/pull/7196) + - Fix the incorrect result when the `CHAR_LENGTH` function computes the binary string [#7410](https://github.com/pingcap/tidb/pull/7410) + - Fix the incorrect `CONCAT` result in a statement involving `GROUP BY` [#7448](https://github.com/pingcap/tidb/pull/7448) + - Fix the imprecise type length issue when casting the `DECIMAL` type to the `STRING` type [#7451](https://github.com/pingcap/tidb/pull/7451) +- DML + - Fix the stability issue of the `Load Data` statement [#6927](https://github.com/pingcap/tidb/pull/6927) + - Fix the memory usage issue when performing some `Batch` operations [#7086](https://github.com/pingcap/tidb/pull/7086) + - Improve the performance of the `Replace Into` statement [#7027](https://github.com/pingcap/tidb/pull/7027) + - Fix the inconsistent precision issue when writing `CURRENT_TIMESTAMP` [#7355](https://github.com/pingcap/tidb/pull/7355) +- DDL + - Improve the method of DDL judging whether `Schema` is synchronized to avoid misjudgement in some cases [#7319](https://github.com/pingcap/tidb/pull/7319) + - Fix the `SHOW CREATE TABLE` result in adding index process [#6993](https://github.com/pingcap/tidb/pull/6993) + - Allow the default value of `text`/`blob`/`json` to be NULL in non-restrict `sql-mode` [#7230](https://github.com/pingcap/tidb/pull/7230) + - Fix the `ADD INDEX` issue in some cases [#7142](https://github.com/pingcap/tidb/pull/7142) + - Increase the speed of adding `UNIQUE-KEY` index operation largely [#7132](https://github.com/pingcap/tidb/pull/7132) + - Fix the truncating issue of the prefix index in UTF-8 character set [#7109](https://github.com/pingcap/tidb/pull/7109) + - Add the environment variable `tidb_ddl_reorg_priority` to control the priority of the `add-index` operation [#7116](https://github.com/pingcap/tidb/pull/7116) + - Fix the display issue of `AUTO-INCREMENT` in `information_schema.tables` [#7037](https://github.com/pingcap/tidb/pull/7037) + - Support the `admin show ddl jobs ` command and support output specified number of DDL jobs [#7028](https://github.com/pingcap/tidb/pull/7028) + - Support parallel DDL job execution [#6955](https://github.com/pingcap/tidb/pull/6955) +- [Table Partition](https://github.com/pingcap/tidb/projects/6) (Experimental) + - Support top level partition + - Support `Range Partition` + +## PD + +- Features + - Introduce the version control mechanism and support rolling update of the cluster with 
compatibility + - Enable the `region merge` feature + - Support the `GetPrevRegion` interface + - Support splitting Regions in batch + - Support storing the GC safepoint +- Improvements + - Optimize the issue that TSO allocation is affected by the system clock going backwards + - Optimize the performance of handling Region heartbeats + - Optimize the Region tree performance + - Optimize the performance of computing hotspot statistics + - Optimize returning the error code of API interface + - Add options of controlling scheduling strategies + - Prohibit using special characters in `label` + - Improve the scheduling simulator + - Support splitting Regions using statistics in pd-ctl + - Support formatting JSON output by calling `jq` in pd-ctl + - Add metrics about etcd Raft state machine +- Bug fixes + - Fix the issue that the namespace is not reloaded after switching Leader + - Fix the issue that namespace scheduling exceeds the schedule limit + - Fix the issue that hotspot scheduling exceeds the schedule limit + - Fix the issue that wrong logs are output when the PD client closes + - Fix the wrong statistics of Region heartbeat latency + +## TiKV + +- Features + - Support `batch split` to avoid too large Regions caused by the Write operation on hot Regions + - Support splitting Regions based on the number of rows to improve the index scan efficiency +- Performance + - Use `LocalReader` to separate the Read operation from the raftstore thread to lower the Read latency + - Refactor the MVCC framework, optimize the memory usage and improve the scan Read performance + - Support splitting Regions based on statistics estimation to reduce the I/O usage + - Optimize the issue that the Read performance is affected by continuous Write operations on the rollback record + - Reduce the memory usage of pushdown aggregation computing +- Improvements + - Add the pushdown support for a large number of built-in functions and better charset support + - Optimize the GC workflow, improve the GC speed and decrease the impact of GC on the system + - Enable `prevote` to speed up service recovery when the network is abnormal + - Add the related configuration items of RocksDB log files + - Adjust the default configuration of `scheduler_latch` + - Support setting whether to compact the data in the bottom layer of RocksDB when using tikv-ctl to compact data manually + - Add the check for environment variables when starting TiKV + - Support dynamically configuring the `dynamic_level_bytes` parameter based on the existing data + - Support customizing the log format + - Integrate tikv-fail in tikv-ctl + - Add I/O metrics of threads +- Bug fixes + - Fix decimal related issues + - Fix the issue that `gRPC max_send_message_len` is set mistakenly + - Fix the issue caused by misconfiguration of `region_size` diff --git a/releases/21rc2.md b/releases/21rc2.md new file mode 100644 index 0000000000000..909653d7e208e --- /dev/null +++ b/releases/21rc2.md @@ -0,0 +1,119 @@ +--- +title: TiDB 2.1 RC2 Release Notes +category: Releases +--- + +# TiDB 2.1 RC2 Release Notes + +On September 14, 2018, TiDB 2.1 RC2 is released. Compared with TiDB 2.1 RC1, this release has great improvement in stability, SQL optimizer, statistics information, and execution engine. 
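As a hedged sketch of two items listed below, the statistics and JSON additions can be used as follows; the table name and bucket count are arbitrary examples.

```sql
-- Collect statistics with an explicit histogram bucket count
-- (see the Statistics section below); the table and the count are examples.
ANALYZE TABLE t WITH 128 BUCKETS;

-- json_contains is now available as a built-in function (see Expressions below).
SELECT JSON_CONTAINS('[1, 2, 3]', '2');
```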
+ +## TiDB + +* SQL Optimizer + * Put forward a proposal of the next generation Planner [#7543](https://github.com/pingcap/tidb/pull/7543) + * Improve the optimization rules of constant propagation [#7276](https://github.com/pingcap/tidb/pull/7276) + * Enhance the computing logic of `Range` to enable it to handle multiple `IN` or `EQUAL` conditions simultaneously [#7577](https://github.com/pingcap/tidb/pull/7577) + * Fix the issue that the estimation result of `TableScan` is incorrect when `Range` is empty [#7583](https://github.com/pingcap/tidb/pull/7583) + * Support the `PointGet` operator for the `UPDATE` statement [#7586](https://github.com/pingcap/tidb/pull/7586) + * Fix the panic issue during the process of executing the `FirstRow` aggregate function in some conditions [#7624](https://github.com/pingcap/tidb/pull/7624) +* SQL Execution Engine + * Fix the potential `DataRace` issue when the `HashJoin` operator encounters an error [#7554](https://github.com/pingcap/tidb/pull/7554) + * Make the `HashJoin` operator read the inner table and build the hash table simultaneously [#7544](https://github.com/pingcap/tidb/pull/7544) + * Optimize the performance of Hash aggregate operators [#7541](https://github.com/pingcap/tidb/pull/7541) + * Optimize the performance of Join operators [#7493](https://github.com/pingcap/tidb/pull/7493), [#7433](https://github.com/pingcap/tidb/pull/7433) + * Fix the issue that the result of `UPDATE JOIN` is incorrect when the Join order is changed [#7571](https://github.com/pingcap/tidb/pull/7571) + * Improve the performance of Chunk’s iterator [#7585](https://github.com/pingcap/tidb/pull/7585) +* Statistics + * Fix the issue that auto `Analyze` repeatedly analyzes the statistics [#7550](https://github.com/pingcap/tidb/pull/7550) + * Fix the statistics update error that occurs when there is no statistics change [#7530](https://github.com/pingcap/tidb/pull/7530) + * Use the RC isolation level and low priority when building `Analyze` requests [#7496](https://github.com/pingcap/tidb/pull/7496) + * Support enabling statistics auto-analyze during a certain period of the day [#7570](https://github.com/pingcap/tidb/pull/7570) + * Fix the panic issue when logging the statistics information [#7588](https://github.com/pingcap/tidb/pull/7588) + * Support configuring the number of buckets in the histogram using the `ANALYZE TABLE WITH BUCKETS` statement [#7619](https://github.com/pingcap/tidb/pull/7619) + * Fix the panic issue when updating an empty histogram [#7640](https://github.com/pingcap/tidb/pull/7640) + * Update `information_schema.tables.data_length` using the statistics information [#7657](https://github.com/pingcap/tidb/pull/7657) +* Server + * Add Trace related dependencies [#7532](https://github.com/pingcap/tidb/pull/7532) + * Enable the `mutex profile` feature of Golang [#7512](https://github.com/pingcap/tidb/pull/7512) + * The `Admin` statement requires the `Super_priv` privilege [#7486](https://github.com/pingcap/tidb/pull/7486) + * Forbid users to `Drop` crucial system tables [#7471](https://github.com/pingcap/tidb/pull/7471) + * Switch from `juju/errors` to `pkg/errors` [#7151](https://github.com/pingcap/tidb/pull/7151) + * Complete the functional prototype of SQL Tracing [#7016](https://github.com/pingcap/tidb/pull/7016) + * Remove the goroutine pool [#7564](https://github.com/pingcap/tidb/pull/7564) + * Support viewing the goroutine information using the `USER1` signal [#7587](https://github.com/pingcap/tidb/pull/7587) + * Set the internal SQL to high
priority while TiDB is started [#7616](https://github.com/pingcap/tidb/pull/7616) + * Use different labels to filter internal SQL and user SQL in monitoring metrics [#7631](https://github.com/pingcap/tidb/pull/7631) + * Store the top 30 slow queries in the last week to the TiDB server [#7646](https://github.com/pingcap/tidb/pull/7646) + * Put forward a proposal of setting the global system time zone for the TiDB cluster [#7656](https://github.com/pingcap/tidb/pull/7656) + * Enrich the error message of “GC life time is shorter than transaction duration” [#7658](https://github.com/pingcap/tidb/pull/7658) + * Set the global system time zone when starting the TiDB cluster [#7638](https://github.com/pingcap/tidb/pull/7638) +* Compatibility + * Add the unsigned flag for the `Year` type [#7542](https://github.com/pingcap/tidb/pull/7542) + * Fix the issue of configuring the result length of the `Year` type in the `Prepare`/`Execute` mode [#7525](https://github.com/pingcap/tidb/pull/7525) + * Fix the issue of inserting zero timestamp in the `Prepare`/`Execute` mode [#7506](https://github.com/pingcap/tidb/pull/7506) + * Fix the error handling issue of the integer division [#7492](https://github.com/pingcap/tidb/pull/7492) + * Fix the compatibility issue when processing `ComStmtSendLongData` [#7485](https://github.com/pingcap/tidb/pull/7485) + * Fix the error handling issue during the process of converting string to integer [#7483](https://github.com/pingcap/tidb/pull/7483) + * Optimize the accuracy of values in the `information_schema.columns_in_table` table [#7463](https://github.com/pingcap/tidb/pull/7463) + * Fix the compatibility issue when writing or updating the string type of data using the MariaDB client [#7573](https://github.com/pingcap/tidb/pull/7573) + * Fix the compatibility issue of aliases of the returned value [#7600](https://github.com/pingcap/tidb/pull/7600) + * Fix the issue that the `NUMERIC_SCALE` value of the float type is incorrect in the `information_schema.COLUMNS` table [#7602](https://github.com/pingcap/tidb/pull/7602) + * Fix the issue that Parser reports an error when the single line comment is empty [#7612](https://github.com/pingcap/tidb/pull/7612) +* Expressions + * Check the value of `max_allowed_packet` in the `insert` function [#7528](https://github.com/pingcap/tidb/pull/7528) + * Support the built-in function `json_contains` [#7443](https://github.com/pingcap/tidb/pull/7443) + * Support the built-in function `json_contains_path` [#7596](https://github.com/pingcap/tidb/pull/7596) + * Support the built-in function `encode/decode` [#7622](https://github.com/pingcap/tidb/pull/7622) + * Fix the issue that some time related functions are not compatible with the MySQL behaviors in some cases [#7636](https://github.com/pingcap/tidb/pull/7636) + * Fix the compatibility issue of parsing the time type of data in string [#7654](https://github.com/pingcap/tidb/pull/7654) + * Fix the issue that the time zone is not considered when computing the default value of the `DateTime` data [#7655](https://github.com/pingcap/tidb/pull/7655) +* DML + * Set correct `last_insert_id` in the `InsertOnDuplicateUpdate` statement [#7534](https://github.com/pingcap/tidb/pull/7534) + * Reduce the cases of updating the `auto_increment_id` counter [#7515](https://github.com/pingcap/tidb/pull/7515) + * Optimize the error message of `Duplicate Key` [#7495](https://github.com/pingcap/tidb/pull/7495) + * Fix the `insert...select...on duplicate key update` issue 
[#7406](https://github.com/pingcap/tidb/pull/7406) + * Support the `LOAD DATA IGNORE LINES` statement [#7576](https://github.com/pingcap/tidb/pull/7576) +* DDL + * Add the DDL job type and the current schema version information in the monitor [#7472](https://github.com/pingcap/tidb/pull/7472) + * Complete the design of the `Admin Restore Table` feature [#7383](https://github.com/pingcap/tidb/pull/7383) + * Fix the issue that the default value of the `Bit` type exceeds 128 [#7249](https://github.com/pingcap/tidb/pull/7249) + * Fix the issue that the default value of the `Bit` type cannot be `NULL` [#7604](https://github.com/pingcap/tidb/pull/7604) + * Reduce the interval of checking `CREATE TABLE/DATABASE` in the DDL queue [#7608](https://github.com/pingcap/tidb/pull/7608) + * Use the `ddl/owner/resign` HTTP interface to release the DDL owner and start electing a new owner [#7649](https://github.com/pingcap/tidb/pull/7649) +* TiKV Go Client + * Support the `Seek` operation that obtains only the `Key` [#7419](https://github.com/pingcap/tidb/pull/7419) +* [Table Partition](https://github.com/pingcap/tidb/projects/6) (Experimental) + * Fix the issue that the `Bigint` type cannot be used as the partition key [#7520](https://github.com/pingcap/tidb/pull/7520) + * Support the rollback operation when an error occurs while adding an index to a partitioned table [#7437](https://github.com/pingcap/tidb/pull/7437) + +## PD + +* Features + * Support the `GetAllStores` interface [#1228](https://github.com/pingcap/pd/pull/1228) + * Add the statistics of scheduling estimation in Simulator [#1218](https://github.com/pingcap/pd/pull/1218) +* Improvements + * Optimize the handling process of down stores to make up replicas as soon as possible [#1222](https://github.com/pingcap/pd/pull/1222) + * Optimize the start of Coordinator to reduce the unnecessary scheduling caused by restarting PD [#1225](https://github.com/pingcap/pd/pull/1225) + * Optimize the memory usage to reduce the overhead caused by heartbeats [#1195](https://github.com/pingcap/pd/pull/1195) + * Optimize error handling and improve the log information [#1227](https://github.com/pingcap/pd/pull/1227) + * Support querying the Region information of a specific store in pd-ctl [#1231](https://github.com/pingcap/pd/pull/1231) + * Support querying the topN Region information based on version comparison in pd-ctl [#1233](https://github.com/pingcap/pd/pull/1233) + * Support more accurate TSO decoding in pd-ctl [#1242](https://github.com/pingcap/pd/pull/1242) +* Bug fix + * Fix the issue that pd-ctl exits abnormally when the `hot store` command is used [#1244](https://github.com/pingcap/pd/pull/1244) + +## TiKV + +* Performance + * Support splitting Regions based on statistics estimation to reduce the I/O cost [#3511](https://github.com/tikv/tikv/pull/3511) + * Reduce cloning in the transaction scheduler [#3530](https://github.com/tikv/tikv/pull/3530) +* Improvements + * Add the pushdown support for a large number of built-in functions + * Add the `leader-transfer-max-log-lag` configuration to fix the failure issue of leader scheduling in specific scenarios [#3507](https://github.com/tikv/tikv/pull/3507) + * Add the `max-open-engines` configuration to limit the number of engines opened by `tikv-importer` simultaneously [#3496](https://github.com/tikv/tikv/pull/3496) + * Limit the cleanup speed of garbage data to reduce the impact on `snapshot apply` [#3547](https://github.com/tikv/tikv/pull/3547) + * Broadcast the commit message for crucial Raft messages
to avoid unnecessary delay [#3592](https://github.com/tikv/tikv/pull/3592) +* Bug fixes + * Fix the leader election issue caused by discarding the `PreVote` message of the newly split Region [#3557](https://github.com/tikv/tikv/pull/3557) + * Fix follower related statistics after merging Regions [#3573](https://github.com/tikv/tikv/pull/3573) + * Fix the issue that the local reader uses obsolete Region information [#3565](https://github.com/tikv/tikv/pull/3565) \ No newline at end of file diff --git a/releases/21rc3.md b/releases/21rc3.md new file mode 100644 index 0000000000000..8c2f7929b7975 --- /dev/null +++ b/releases/21rc3.md @@ -0,0 +1,63 @@ +--- +title: TiDB 2.1 RC3 Release Notes +category: Releases +--- + +# TiDB 2.1 RC3 Release Notes + +On September 29, 2018, TiDB 2.1 RC3 is released. Compared with TiDB 2.1 RC2, this release has greatly improved stability, compatibility, the SQL optimizer, and the execution engine. + +## TiDB + ++ SQL Optimizer + - Fix the incorrect result issue when a statement contains embedded `LEFT OUTER JOIN` [#7689](https://github.com/pingcap/tidb/pull/7689) + - Enhance the optimization rule of predicate pushdown on the `JOIN` statement [#7645](https://github.com/pingcap/tidb/pull/7645) + - Fix the optimization rule of predicate pushdown for the `UnionScan` operator [#7695](https://github.com/pingcap/tidb/pull/7695) + - Fix the issue that the unique key property of the `Union` operator is not correctly set [#7680](https://github.com/pingcap/tidb/pull/7680) + - Enhance the optimization rule of constant folding [#7696](https://github.com/pingcap/tidb/pull/7696) + - Optimize the data source in which the filter is null after propagation to table dual [#7756](https://github.com/pingcap/tidb/pull/7756) ++ SQL Execution Engine + - Optimize the performance of read requests in a transaction [#7717](https://github.com/pingcap/tidb/pull/7717) + - Optimize the cost of allocating Chunk memory in some executors [#7540](https://github.com/pingcap/tidb/pull/7540) + - Fix the "index out of range" panic caused by the columns where point queries return all NULL values [#7790](https://github.com/pingcap/tidb/pull/7790) ++ Server + - Fix the issue that the memory quota in the configuration file does not take effect [#7729](https://github.com/pingcap/tidb/pull/7729) + - Add the `tidb_force_priority` system variable to set the execution priority for each statement [#7694](https://github.com/pingcap/tidb/pull/7694) + - Support using the `admin show slow` statement to obtain the slow query log [#7785](https://github.com/pingcap/tidb/pull/7785) ++ Compatibility + - Fix the issue that the result of `charset/collation` is incorrect in `information_schema.schemata` [#7751](https://github.com/pingcap/tidb/pull/7751) + - Fix the issue that the value of the `hostname` system variable is empty [#7750](https://github.com/pingcap/tidb/pull/7750) ++ Expressions + - Support the `init_vector` argument in the `AES_ENCRYPT`/`AES_DECRYPT` built-in function [#7425](https://github.com/pingcap/tidb/pull/7425) + - Fix the issue that the result of `Format` is incorrect in some expressions [#7770](https://github.com/pingcap/tidb/pull/7770) + - Support the `JSON_LENGTH` built-in function [#7739](https://github.com/pingcap/tidb/pull/7739) + - Fix the incorrect result issue when casting the unsigned integer type to the decimal type [#7792](https://github.com/pingcap/tidb/pull/7792) ++ DML + - Fix the issue that the result of the `INSERT … ON DUPLICATE KEY UPDATE` statement is incorrect while updating the unique
key [#7675](https://github.com/pingcap/tidb/pull/7675) ++ DDL + - Fix the issue that the index value is not converted between time zones when you create a new index on a new column of the timestamp type [#7724](https://github.com/pingcap/tidb/pull/7724) + - Support appending new values for the enum type [#7767](https://github.com/pingcap/tidb/pull/7767) + - Support creating an etcd session quickly, which improves the cluster availability after network isolation [#7774](https://github.com/pingcap/tidb/pull/7774) + +## PD + ++ New feature + - Add the API to get the Region list by size in reverse order [#1254](https://github.com/pingcap/pd/pull/1254) ++ Improvement + - Return more detailed information in the Region API [#1252](https://github.com/pingcap/pd/pull/1252) ++ Bug fix + - Fix the issue that `adjacent-region-scheduler` might lead to a crash after PD switches the leader [#1250](https://github.com/pingcap/pd/pull/1250) + +## TiKV + ++ Performance + - Optimize the concurrency for coprocessor requests [#3515](https://github.com/tikv/tikv/pull/3515) ++ New features + - Add the support for Log functions [#3603](https://github.com/tikv/tikv/pull/3603) + - Add the support for the `sha1` function [#3612](https://github.com/tikv/tikv/pull/3612) + - Add the support for the `truncate_int` function [#3532](https://github.com/tikv/tikv/pull/3532) + - Add the support for the `year` function [#3622](https://github.com/tikv/tikv/pull/3622) + - Add the support for the `truncate_real` function [#3633](https://github.com/tikv/tikv/pull/3633) ++ Bug fixes + - Fix the reporting error behavior related to time functions [#3487](https://github.com/tikv/tikv/pull/3487), [#3615](https://github.com/tikv/tikv/pull/3615) + - Fix the issue that the time parsed from string is inconsistent with that in TiDB [#3589](https://github.com/tikv/tikv/pull/3589) \ No newline at end of file diff --git a/releases/rn.md b/releases/rn.md index 111288a9ca953..349e9a03f2afd 100644 --- a/releases/rn.md +++ b/releases/rn.md @@ -4,7 +4,20 @@ category: release --- # TiDB Release Notes - + + - [2.0.8](208.md) + - [2.1 RC3](21rc3.md) + - [2.1 RC2](21rc2.md) + - [2.0.7](207.md) + - [2.1 RC1](21rc1.md) + - [2.0.6](206.md) + - [2.0.5](205.md) + - [2.1 Beta](21beta.md) + - [2.0.4](204.md) + - [2.0.3](203.md) + - [2.0.2](202.md) + - [2.0.1](201.md) + - [2.0](2.0ga.md) - [2.0 RC5](2rc5.md) - [2.0 RC4](2rc4.md) - [2.0 RC3](2rc3.md) diff --git a/scripts/check_requirement.sh b/scripts/check_requirement.sh index cae7df70de0ee..5e159e0c2aab1 100755 --- a/scripts/check_requirement.sh +++ b/scripts/check_requirement.sh @@ -25,13 +25,13 @@ function install_go { echo "Intall go ..." 
case "$OSTYPE" in linux*) - curl -L https://storage.googleapis.com/golang/go1.9.2.linux-amd64.tar.gz -o golang.tar.gz + curl -L https://storage.googleapis.com/golang/go1.10.2.linux-amd64.tar.gz -o golang.tar.gz ${SUDO} tar -C /usr/local -xzf golang.tar.gz rm golang.tar.gz ;; darwin*) - curl -L https://storage.googleapis.com/golang/go1.9.2.darwin-amd64.tar.gz -o golang.tar.gz + curl -L https://storage.googleapis.com/golang/go1.10.2.darwin-amd64.tar.gz -o golang.tar.gz ${SUDO} tar -C /usr/local -xzf golang.tar.gz rm golang.tar.gz ;; @@ -92,10 +92,10 @@ fi # Check go if which go &>/dev/null; then # requires go >= 1.8 - GO_VER_1=`go version | awk 'match($0, /([0-9])+(\.[0-9])+/) { ver = substr($0, RSTART, RLENGTH); split(ver, n, "."); print n[1];}'` - GO_VER_2=`go version | awk 'match($0, /([0-9])+(\.[0-9])+/) { ver = substr($0, RSTART, RLENGTH); split(ver, n, "."); print n[2];}'` - if [[ (($GO_VER_1 -eq 1 && $GO_VER_2 -lt 8)) || (($GO_VER_1 -lt 1)) ]]; then - echo "Please upgrade Go to 1.8 or later." + GO_VER_1=`go version | awk 'match($0, /([0-9])+(\.[0-9]+)+/) { ver = substr($0, RSTART, RLENGTH); split(ver, n, "."); print n[1];}'` + GO_VER_2=`go version | awk 'match($0, /([0-9])+(\.[0-9]+)+/) { ver = substr($0, RSTART, RLENGTH); split(ver, n, "."); print n[2];}'` + if [[ (($GO_VER_1 -eq 1 && $GO_VER_2 -lt 10)) || (($GO_VER_1 -lt 1)) ]]; then + echo "Please upgrade Go to 1.10 or later." exit 1 fi else @@ -115,4 +115,4 @@ else install_gpp fi -echo OK \ No newline at end of file +echo OK diff --git a/scripts/generate_pdf.sh b/scripts/generate_pdf.sh index 697068d9f6ccd..6e4caa8a092b2 100755 --- a/scripts/generate_pdf.sh +++ b/scripts/generate_pdf.sh @@ -13,7 +13,7 @@ MONOFONT="WenQuanYi Micro Hei Mono" _version_tag="$(date '+%Y%m%d')" # default version: `pandoc --latex-engine=xelatex doc.md -s -o output2.pdf` -# used to debug template settting error +# used to debug template setting error pandoc -N --toc --smart --latex-engine=xelatex \ --template=templates/template.tex \ @@ -21,7 +21,7 @@ pandoc -N --toc --smart --latex-engine=xelatex \ --listings \ -V title="TiDB Documentation" \ -V author="PingCAP Inc." \ - -V date="v1.0.0\$\sim\$${_version_tag}" \ + -V date="${_version_tag}" \ -V CJKmainfont="${MAINFONT}" \ -V fontsize=12pt \ -V geometry:margin=1in \ diff --git a/sql/admin.md b/sql/admin.md index 5fd8a01bf627a..012cd8ef31eea 100644 --- a/sql/admin.md +++ b/sql/admin.md @@ -1,5 +1,6 @@ --- title: Database Administration Statements +summary: Use administration statements to manage the TiDB database. category: user guide --- @@ -9,7 +10,7 @@ TiDB manages the database using a number of statements, including granting privi ## Privilege management -See [Privilege Management](privilege.md). +See [Privilege Management](../sql/privilege.md). ## `SET` statement @@ -30,7 +31,7 @@ variable_assignment: system_var_name = expr ``` -You can use the above syntax to assign values to variables in TiDB, which include system variables and user-defined variables. All user-defined variables are session variables. The system variables set using `@@global.` or `GLOBAL` are global variables, otherwise session variables. For more information, see [The System Variables](variable.md). +You can use the above syntax to assign values to variables in TiDB, which include system variables and user-defined variables. All user-defined variables are session variables. The system variables set using `@@global.` or `GLOBAL` are global variables, otherwise session variables. 
For more information, see [The System Variables](../sql/variable.md). ### `SET CHARACTER` statement and `SET NAMES` @@ -55,7 +56,7 @@ password_option: { } ``` -This statement is used to set user passwords. For more information, see [Privilege Management](privilege.md). +This statement is used to set user passwords. For more information, see [Privilege Management](../sql/privilege.md). ### Set the isolation level @@ -63,7 +64,7 @@ This statement is used to set user passwords. For more information, see [Privile SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED; ``` -This statement is used to set the transaction isolation level. For more information, see [Transaction Isolation Level](transaction.md#transaction-isolation-level). +This statement is used to set the transaction isolation level. For more information, see [Transaction Isolation Level](../sql/transaction.md#transaction-isolation-level). ## `SHOW` statement @@ -109,7 +110,7 @@ like_or_where: > **Note**: > -> - To view statistics using the `SHOW` statement, see [View Statistics](statistics.md#view-statistics). +> - To view statistics using the `SHOW` statement, see [View Statistics](../sql/statistics.md#view-statistics). > - For more information about the `SHOW` statement, see [SHOW Syntax in MySQL](https://dev.mysql.com/doc/refman/5.7/en/show.html). ## `ADMIN` statement @@ -119,14 +120,14 @@ This statement is a TiDB extension syntax, used to view the status of TiDB. ```sql ADMIN SHOW DDL ADMIN SHOW DDL JOBS -ADMIN SHOW DDL JOB QUERIES 'job_id' [, 'job_id'] ... -ADMIN CANCEL DDL JOBS 'job_id' [, 'job_id'] ... +ADMIN SHOW DDL JOB QUERIES job_id [, job_id] ... +ADMIN CANCEL DDL JOBS job_id [, job_id] ... ``` - `ADMIN SHOW DDL`: To view the currently running DDL jobs. - `ADMIN SHOW DDL JOBS`: To view all the results in the current DDL job queue (including tasks that are running and waiting to be run) and the last ten results in the completed DDL job queue. -- `ADMIN SHOW DDL JOB QUERIES 'job_id' [, 'job_id'] ...`: To view the original SQL statement of the DDL task corresponding to the `job_id`; the `job_id` only searches the running DDL job and the last ten results in the DDL history job queue -- `ADMIN CANCEL DDL JOBS 'job_id' [, 'job_id'] ...`: To cancel the currently running DDL jobs and return whether the corresponding jobs are successfully cancelled. If the operation fails to cancel the jobs, specific reasons are displayed. +- `ADMIN SHOW DDL JOB QUERIES job_id [, job_id] ...`: To view the original SQL statement of the DDL task corresponding to the `job_id`; the `job_id` only searches the running DDL job and the last ten results in the DDL history job queue +- `ADMIN CANCEL DDL JOBS job_id [, job_id] ...`: To cancel the currently running DDL jobs and return whether the corresponding jobs are successfully cancelled. If the operation fails to cancel the jobs, specific reasons are displayed. > **Note**: > diff --git a/sql/aggregate-group-by-functions.md b/sql/aggregate-group-by-functions.md index 326f36efcddc1..fd905ea73a154 100644 --- a/sql/aggregate-group-by-functions.md +++ b/sql/aggregate-group-by-functions.md @@ -1,10 +1,13 @@ --- title: Aggregate (GROUP BY) Functions +summary: Learn about the supported aggregate functions in TiDB. category: user guide --- # Aggregate (GROUP BY) Functions +This document describes details about the supported aggregate functions in TiDB. + ## Aggregate (GROUP BY) function descriptions This section describes the supported MySQL group (aggregate) functions in TiDB. 
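As a brief illustration only (the `orders` table and its columns here are hypothetical and not part of this document), a typical aggregate query groups rows and summarizes each group:

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE orders (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    customer VARCHAR(64) NOT NULL,
    amount DECIMAL(10, 2) NOT NULL
);

INSERT INTO orders (customer, amount) VALUES
    ('alice', 10.00), ('alice', 25.50), ('bob', 7.99);

-- One row per customer: number of orders, total and average amount.
SELECT customer,
       COUNT(*)    AS order_count,
       SUM(amount) AS total_amount,
       AVG(amount) AS avg_amount
FROM orders
GROUP BY customer;
```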
@@ -39,7 +42,7 @@ insert into t values(1, 2, 3), (2, 2, 3), (3, 2, 3); select a, b, sum(c) from t group by a; ``` -The preceding query is legal in TiDB. TiDB does not support SQL mode `ONLY_FULL_GROUP_BY` currently. We'll do it in the future. For more inmormation, see [#4248](https://github.com/pingcap/tidb/issues/4248). +The preceding query is legal in TiDB. TiDB does not support SQL mode `ONLY_FULL_GROUP_BY` currently. We'll do it in the future. For more information, see [#4248](https://github.com/pingcap/tidb/issues/4248). Suppose that we execute the following query, expecting the results to be ordered by "c": ```sql @@ -52,12 +55,14 @@ select distinct a, b from t order by c; To order the result, duplicates must be eliminated first. But to do so, which row should we keep? This choice influences the retained value of "c", which in turn influences ordering and makes it arbitrary as well. In MySQL, a query that has `DISTINCT` and `ORDER BY` is rejected as invalid if any `ORDER BY` expression does not satisfy at least one of these conditions: + - The expression is equal to one in the `SELECT` list - All columns referenced by the expression and belonging to the query's selected tables are elements of the `SELECT` list But in TiDB, the above query is legal, for more information see [#4254](https://github.com/pingcap/tidb/issues/4254). Another TiDB extension to standard SQL permits references in the `HAVING` clause to aliased expressions in the `SELECT` list. For example, the following query returns "name" values that occur only once in table "orders": + ```sql select name, count(name) from orders group by name @@ -65,6 +70,7 @@ having count(name) = 1; ``` The TiDB extension permits the use of an alias in the `HAVING` clause for the aggregated column: + ```sql select name, count(name) as c from orders group by name @@ -72,6 +78,7 @@ having c = 1; ``` Standard SQL permits only column expressions in `GROUP BY` clauses, so a statement such as this is invalid because "FLOOR(value/100)" is a noncolumn expression: + ```sql select id, floor(value/100) from tbl_name @@ -81,6 +88,7 @@ group by id, floor(value/100); TiDB extends standard SQL to permit noncolumn expressions in `GROUP BY` clauses and considers the preceding statement valid. Standard SQL also does not permit aliases in `GROUP BY` clauses. TiDB extends standard SQL to permit aliases, so another way to write the query is as follows: + ```sql select id, floor(value/100) as val from tbl_name diff --git a/sql/bit-functions-and-operators.md b/sql/bit-functions-and-operators.md index 6f6fe044c03df..ae91676fe0568 100644 --- a/sql/bit-functions-and-operators.md +++ b/sql/bit-functions-and-operators.md @@ -1,5 +1,6 @@ --- title: Bit Functions and Operators +summary: Learn about the bit functions and operators. category: user guide --- diff --git a/sql/cast-functions-and-operators.md b/sql/cast-functions-and-operators.md index 67d387314df85..092da0003c587 100644 --- a/sql/cast-functions-and-operators.md +++ b/sql/cast-functions-and-operators.md @@ -1,5 +1,6 @@ --- title: Cast Functions and Operators +summary: Learn about the cast functions and operators. category: user guide --- diff --git a/sql/character-set-configuration.md b/sql/character-set-configuration.md index 8313150e7c330..46b86257d5232 100644 --- a/sql/character-set-configuration.md +++ b/sql/character-set-configuration.md @@ -1,10 +1,11 @@ --- title: Character Set Configuration +summary: Learn about the character set configuration. 
category: user guide --- # Character Set Configuration -Currently, TiDB does not support configuring the character set. The default character set is utf8. +Currently, TiDB only supports the `utf8` character set, which is the equivalent to `utf8mb4` in MySQL. Since MySQL 5.7 defaults to `latin1`, this difference is documented under [default differences](../sql/mysql-compatibility.md#default-differences) between TiDB and MySQL. -For more information, see [Character Set Configuration in MySQL](https://dev.mysql.com/doc/refman/5.7/en/charset-configuration.html). \ No newline at end of file +For more information, see [Character Set Configuration in MySQL](https://dev.mysql.com/doc/refman/5.7/en/charset-configuration.html). diff --git a/sql/character-set-support.md b/sql/character-set-support.md index 5b663d95327d7..c557f5855e076 100644 --- a/sql/character-set-support.md +++ b/sql/character-set-support.md @@ -1,5 +1,6 @@ --- title: Character Set Support +summary: Learn about the supported character sets in TiDB. category: user guide --- @@ -72,12 +73,12 @@ The collation names in TiDB follow these conventions: | Suffix | Meaning | |:-------|:-------------------| | \_ai | Accent insensitive | - | \_as | Accent insensitive | + | \_as | Accent sensitive | | \_ci | Case insensitive | | \_cs | Case sensitive | | \_bin | Binary | -> **Note**: For now, TiDB supports on some of the collations in the above table. +> **Note**: Currently, TiDB only supports some of the collations in the above table. ## Database character set and collation diff --git a/sql/comment-syntax.md b/sql/comment-syntax.md index aaea4f4b091a6..4b3852415f429 100644 --- a/sql/comment-syntax.md +++ b/sql/comment-syntax.md @@ -1,5 +1,6 @@ --- title: Comment Syntax +summary: Learn about the three comment styles in TiDB. category: user guide --- @@ -13,7 +14,7 @@ TiDB supports three comment styles: Example: -``` +```sql mysql> SELECT 1+1; # This comment continues to the end of line +------+ | 1+1 | @@ -72,13 +73,13 @@ In this comment style, TiDB runs the statements in the comment. The syntax is us For example: -``` +```sql SELECT /*! STRAIGHT_JOIN */ col1 FROM table1,table2 WHERE ... ``` In TiDB, you can also use another version: -``` +```sql SELECT STRAIGHT_JOIN col1 FROM table1,table2 WHERE ... ``` @@ -96,6 +97,6 @@ Since Hint is involved in comments like `/*+ xxx */`, the MySQL client clears th mysql -h 127.0.0.1 -P 4000 -uroot --comments ``` -For details about the optimizer hints that TiDB supports, see [Optimizer hint](tidb-specific.md#optimizer-hint). +For details about the optimizer hints that TiDB supports, see [Optimizer hint](../sql/tidb-specific.md#optimizer-hint). For more information, see [Comment Syntax](https://dev.mysql.com/doc/refman/5.7/en/comments.html). diff --git a/sql/connection-and-APIs.md b/sql/connection-and-APIs.md index d7bbaaea4e367..2e147a18b3dd6 100644 --- a/sql/connection-and-APIs.md +++ b/sql/connection-and-APIs.md @@ -1,5 +1,6 @@ --- -title: Connectors and APIs +title: Connectors and APIs +summary: Learn about the connectors and APIs. category: user guide --- diff --git a/sql/control-flow-functions.md b/sql/control-flow-functions.md index d913dd3bbb95b..9a27ece2977e8 100644 --- a/sql/control-flow-functions.md +++ b/sql/control-flow-functions.md @@ -1,5 +1,6 @@ --- title: Control Flow Functions +summary: Learn about the Control Flow functions. 
category: user guide --- diff --git a/sql/datatype.md b/sql/datatype.md index 5cd05aed5b03b..3f00c1ff7d038 100644 --- a/sql/datatype.md +++ b/sql/datatype.md @@ -1,12 +1,11 @@ --- title: TiDB Data Type +summary: Learn about the data types supported in TiDB. category: user guide --- # TiDB Data Type -## Overview - TiDB supports all the data types in MySQL except the Spatial type, including numeric type, string type, date & time type, and JSON type. The definition of the data type is: `T(M[, D])`. In this format: @@ -59,13 +58,13 @@ INTEGER[(M)] [UNSIGNED] [ZEROFILL] BIGINT[(M)] [UNSIGNED] [ZEROFILL] > BIGINT. The signed range is: [-9223372036854775808, 9223372036854775807], and the unsigned range is [0, 18446744073709551615]. - ``` + The meaning of the fields: | Syntax Element | Description | | -------- | ------------------------------- | -| M | the length of the type. Optional. | +| M | the display width of the type. Optional. | | UNSIGNED | UNSIGNED. If omitted, it is SIGNED. | | ZEROFILL | If you specify ZEROFILL for a numeric column, TiDB automatically adds the UNSIGNED attribute to the column. | @@ -101,7 +100,6 @@ DOUBLE PRECISION [(M,D)] [UNSIGNED] [ZEROFILL], REAL[(M,D)] [UNSIGNED] [ZEROFILL FLOAT(p) [UNSIGNED] [ZEROFILL] > A floating-point number. p represents the precision in bits, but TiDB uses this value only to determine whether to use FLOAT or DOUBLE for the resulting data type. If p is from 0 to 24, the data type becomes FLOAT with no M or D values. If p is from 25 to 53, the data type becomes DOUBLE with no M or D values. The range of the resulting column is the same as for the single-precision FLOAT or double-precision DOUBLE data types described earlier in this section. - ``` The meaning of the fields: @@ -176,8 +174,8 @@ TIME[(fsp)] > A time. The range is '-838:59:59.000000' to '838:59:59.000000'. TiDB displays TIME values in 'HH:MM:SS[.fraction]' format. An optional fsp value in the range from 0 to 6 may be given to specify fractional seconds precision. If omitted, the default precision is 0. -YEAR[(2|4)] -> A year in two-digit or four-digit format. The default is the four-digit format. In four-digit format, values display as 1901 to 2155, and 0000. In two-digit format, values display as 70 to 69, representing years from 1970 to 2069. +YEAR[(4)] +> A year in four-digit format. Values display as 1901 to 2155, and 0000. ``` @@ -255,7 +253,7 @@ INSERT INTO city VALUES (1, '{"name": "Beijing", "population": 100}'); SELECT id FROM city WHERE population >= 100; ``` -For more information, see [JSON Functions and Generated Column](json-functions-generated-column.md). +For more information, see [JSON Functions and Generated Column](../sql/json-functions-generated-column.md). ## The ENUM data type @@ -309,7 +307,7 @@ In TiDB, the values of the SET type is internally converted to Int64. The existe In this case, for an element of `('a', 'c')`, it is 0101 in binary. -For more information, see [the SET type in MySQL](https://dev.mysql.com/doc/refman/5.7/en/set.html)。 +For more information, see [the SET type in MySQL](https://dev.mysql.com/doc/refman/5.7/en/set.html). ## Data type default values @@ -331,4 +329,4 @@ Implicit defaults are defined as follows: - For numeric types, the default is 0. If declared with the AUTO_INCREMENT attribute, the default is the next value in the sequence. - For date and time types other than TIMESTAMP, the default is the appropriate “zero” value for the type. For TIMESTAMP, the default value is the current date and time. 
-- For string types other than ENUM, the default value is the empty string. For ENUM, the default is the first enumeration value. \ No newline at end of file +- For string types other than ENUM, the default value is the empty string. For ENUM, the default is the first enumeration value. diff --git a/sql/date-and-time-functions.md b/sql/date-and-time-functions.md index 7bb87cd5a5ef7..5e3685dc43eb7 100644 --- a/sql/date-and-time-functions.md +++ b/sql/date-and-time-functions.md @@ -1,5 +1,6 @@ --- title: Date and Time Functions +summary: Learn how to use the data and time functions. category: user guide --- diff --git a/sql/ddl.md b/sql/ddl.md index f8337e1a4a3ee..a3c86e7e4a210 100644 --- a/sql/ddl.md +++ b/sql/ddl.md @@ -1,5 +1,6 @@ --- title: Data Definition Statements +summary: Learn how to use DDL (Data Definition Language) in TiDB. category: user guide --- @@ -155,7 +156,7 @@ The `CREATE TABLE` statement is used to create a table. Currently, it does not s - When you create an existing table and if you specify `IF NOT EXIST`, it does not report an error. Otherwise, it reports an error. - Use `LIKE` to create an empty table based on the definition of another table including its column and index properties. - The `FULLTEXT` and `FOREIGN KEY` in `create_definition` are currently only supported in syntax. -- For the `data_type`, see [Data Types](datatype.md). +- For the `data_type`, see [Data Types](../sql/datatype.md). - The `[ASC | DESC]` in `index_col_name` is currently only supported in syntax. - The `index_type` is currently only supported in syntax. - The `KEY_BLOCK_SIZE` in `index_option` is currently only supported in syntax. @@ -285,7 +286,7 @@ table_option: The `ALTER TABLE` statement is used to update the structure of an existing table, such as updating the table or table properties, adding or deleting columns, creating or deleting indexes, updating columns or column properties. The descriptions of several field types are as follows: - For `index_col_name`, `index_type`, and `index_option`, see [CREATE INDEX Syntax](#create-index-syntax). -- Currently, the `table_option` is only supported in syntax. +- Currently, the `table_option` supports `AUTO_INCREMENT` and `COMMENT`, while the others are only supported in syntax. The support for specific operation types is as follows: @@ -294,7 +295,7 @@ The support for specific operation types is as follows: - `DROP COLUMN`: currently does not support the deletion of columns that are primary key columns or index columns - `ADD COLUMN`: currently, does not support setting the newly added column as the primary key or unique index at the same time, and does not support setting the column property to `AUTO_INCREMENT` - `CHANGE/MODIFY COLUMN`: currently supports some of the syntaxes, and the details are as follows: - - In updating data types, the `CHANGE/MODIFY COLUMN` only supports updates between integer types, updates between string types, and updates between Blob types. You can only extend the length of the original type. Besides, the column properties of `unsigned`/`charset`/`collate` cannot be changed. The specific supported types are classified as follows: + - In updating data types, the `CHANGE/MODIFY COLUMN` only supports updates between integer types, updates between string types, and updates between Blob types. You can only extend the length of the original type. The column properties of `unsigned`/`charset`/`collate` cannot be changed. 
The specific supported types are classified as follows: - Integer types: `TinyInt`, `SmallInt`, `MediumInt`, `Int`, `BigInt` - String types: `Char`, `Varchar`, `Text`, `TinyText`, `MediumText`, `LongText` - Blob types: `Blob`, `TinyBlob`, `MediumBlob`, `LongBlob` @@ -327,10 +328,10 @@ The `CREATE INDEX` statement is used to create the index for an existing table. ### Difference from MySQL - The `CREATE INDEX` supports the `UNIQUE` index and does not support `FULLTEXT` and `SPATIAL` indexes. -- The `index_col_name` supports the length option with a maximum length limit of 3072 bytes. The length limit does not change depending on the storage engine, and character set used when building the table. This is because TiDB does not use storage engines like InnoDB and MyISAM, and only provides syntax compatibility with MySQL for the storage engine options when creating tables. Similarly, TiDB uses the utf8mb4 character set, and only provides syntax compatibility with MySQL for the character set options when creating tables. For more information, see [Compatibility with MySQL](mysql-compatibility.md). +- The `index_col_name` supports the length option with a maximum length limit of 3072 bytes. The length limit does not change depending on the storage engine, and character set used when building the table. This is because TiDB does not use storage engines like InnoDB and MyISAM, and only provides syntax compatibility with MySQL for the storage engine options when creating tables. Similarly, TiDB uses the utf8mb4 character set, and only provides syntax compatibility with MySQL for the character set options when creating tables. For more information, see [Compatibility with MySQL](../sql/mysql-compatibility.md). - The `index_col_name` supports the index sorting options of `ASC` and `DESC`. The behavior of sorting options is similar to MySQL, and only syntax parsing is supported. All the internal indexes are stored in ascending order. For more information, see [CREATE INDEX Syntax](https://dev.mysql.com/doc/refman/5.7/en/create-index.html). - The `index_option` supports `KEY_BLOCK_SIZE`, `index_type` and `COMMENT`. The `COMMENT` supports a maximum of 1024 characters and does not support the `WITH PARSER` option. -- The `index_type` supports `BTREE` and `HASH` only in MySQL syntax, which means the index type is independent of the storage engine option in the creating table statement. For example, in MySQL, when you use `CREATE INDEX` on a table using InnoDB, it only supports the `BTREE` index, while TiDB supports both `BTREE` and `HASH` indexes. +- The `index_type` supports `BTREE` and `HASH` only in MySQL syntax, which means the index type is independent of the storage engine option in the creating table statement. For example, in MySQL, when you use `CREATE INDEX` on a table using InnoDB, it only supports the `BTREE` index, while TiDB supports both `BTREE` and `HASH` indexes. - TiDB supports `algorithm_option` and `lock_option` only in MySQL syntax. - TiDB supports at most 512 columns in a single table. The corresponding number limit in InnoDB is 1017, and the hard limit in MySQL is 4096. For more details, see [Limits on Table Column Count and Row Size](https://dev.mysql.com/doc/refman/5.7/en/column-count-limit.html). @@ -344,4 +345,4 @@ The `DROP INDEX` statement is used to delete a table index. Currently, it does n ## ADMIN statement -You can use the `ADMIN` statement to view the information related to DDL job. For details, see [here](admin.md#admin-statement). 
\ No newline at end of file +You can use the `ADMIN` statement to view the information related to DDL job. For details, see [here](../sql/admin.md#admin-statement). diff --git a/sql/dml.md b/sql/dml.md index d50335e87672d..d7ff8a51243b5 100644 --- a/sql/dml.md +++ b/sql/dml.md @@ -1,5 +1,6 @@ --- title: TiDB Data Manipulation Language +summary: Use DML (Data Manipulation Language) to select, insert, delete and update the data. category: user guide --- @@ -40,13 +41,13 @@ SELECT |`SQL_CACHE`, `SQL_NO_CACHE`, `SQL_CALC_FOUND_ROWS` | To guarantee compatibility with MySQL, TiDB parses these three modifiers, but will ignore them.| | `STRAIGHT_JOIN` | `STRAIGHT_JOIN` forces the optimizer to execute a Join query in the order of the tables used in the `FROM` clause. You can use this syntax to speed up queries execution when the Join order chosen by the optimizer is not good. | |`select_expr` | Each `select_expr` indicates a column to retrieve. including the column names and expressions. `\*` represents all the columns.| -\|`FROM table_references` | The `FROM table_references` clause indicates the table (such as `(select * from t;)`) , or tables(such as `select * from t1 join t2;)') or even 0 tables (such as `select 1+1 from dual;` (which is equivalent to `select 1+1;')) from which to retrieve rows.| +\|`FROM table_references` | The `FROM table_references` clause indicates the table (such as `(select * from t;)`), or tables (such as `select * from t1 join t2;)`) or even 0 tables (such as `select 1+1 from dual;` (which is equivalent to `select 1+1;')) from which to retrieve rows.| |`WHERE where_condition` | The `WHERE` clause, if given, indicates the condition or conditions that rows must satisfy to be selected. The result contains only the data that meets the condition(s).| |`GROUP BY` | The `GROUP BY` statement is used to group the result-set.| -|`HAVING where_condition` |The `HAVING` clause and the `WHERE` clause are both used to filter the results. The `HAVING` clause filters the results of `GROUP BY`, while the `WHERE` clause filter the results before aggregation。| +|`HAVING where_condition` |The `HAVING` clause and the `WHERE` clause are both used to filter the results. The `HAVING` clause filters the results of `GROUP BY`, while the `WHERE` clause filter the results before aggregation | |`ORDER BY` | The `ORDER BY` clause is used to sort the data in ascending or descending order, based on columns, expressions or items in the `select_expr` list.| |`LIMIT` | The `LIMIT` clause can be used to constrain the number of rows. `LIMIT` takes one or two numeric arguments. With one argument, the argument specifies the maximum number of rows to return, the first row to return is the first row of the table by default; with two arguments, the first argument specifies the offset of the first row to return, and the second specifies the maximum number of rows to return.| -|`FOR UPDATE` | All the data in the result sets are read-locked, in order to detect the concurrent updates. TiDB uses the [Optimistic Transaction Model](mysql-compatibility.md#transaction). The transaction conflicts are detected in the commit phase instead of statement execution phase. while executing the `SELECT FOR UPDATE` statement, if there are other transactions trying to update relavant data, the `SELECT FOR UPDATE` transaction will fail.| +|`FOR UPDATE` | All the data in the result sets are read-locked, in order to detect the concurrent updates. TiDB uses the [Optimistic Transaction Model](../sql/mysql-compatibility.md#transaction). 
The transaction conflicts are detected in the commit phase instead of statement execution phase. while executing the `SELECT FOR UPDATE` statement, if there are other transactions trying to update relevant data, the `SELECT FOR UPDATE` transaction will fail.| |`LOCK IN SHARE MODE` | To guarantee compatibility, TiDB parses these three modifiers, but will ignore them.| ## INSERT @@ -130,6 +131,7 @@ You can use the following ways to specify the data set: - Select Statement The data set to be inserted is obtained using a `SELECT` statement. The column to be inserted into is obtained from the Schema in the `SELECT` statement. + ```sql CREATE TABLE tbl_name1 ( a int, @@ -138,6 +140,7 @@ You can use the following ways to specify the data set: ); INSERT INTO tbl_name SELECT * from tbl_name1; ``` + In the example above, the data is selected from `tal_name1`, and then inserted into `tbl_name`. ## DELETE @@ -148,7 +151,7 @@ You can use the following ways to specify the data set: The `Single_Table DELETE` syntax deletes rows from a single table. -### DELETE syntax +### DELETE syntax ```sql DELETE [LOW_PRIORITY] [QUICK] [IGNORE] FROM tbl_name @@ -213,7 +216,7 @@ assignment_list: For the single-table syntax, the `UPDATE` statement updates columns of existing rows in the named table with new values. The `SET assignment_list` clause indicates which columns to modify and the values they should be given. The `WHERE/Orderby/Limit` clause, if given, specifies the conditions that identify which rows to update. -### Multi-table UPDATE +### Multi-table UPDATE ```sql UPDATE [LOW_PRIORITY] [IGNORE] table_references diff --git a/sql/encrypted-connections.md b/sql/encrypted-connections.md index 7ee38af1e5b96..5c0e93ff97163 100644 --- a/sql/encrypted-connections.md +++ b/sql/encrypted-connections.md @@ -1,5 +1,6 @@ --- title: Use Encrypted Connections +summary: Use the encrypted connection to ensure data security. category: user guide --- @@ -26,9 +27,9 @@ In short, to use encrypted connections, both of the following conditions must be See the following desrciptions about the related parameters to enable encrypted connections: -- [`ssl-cert`](server-command-option.md#ssl-cert): specifies the file path of the SSL certificate -- [`ssl-key`](server-command-option.md#ssl-key): specifies the private key that matches the certificate -- [`ssl-ca`](server-command-option.md#ssl-ca): (optional) specifies the file path of the trusted CA certificate +- [`ssl-cert`](../sql/server-command-option.md#ssl-cert): specifies the file path of the SSL certificate +- [`ssl-key`](../sql/server-command-option.md#ssl-key): specifies the private key that matches the certificate +- [`ssl-ca`](../sql/server-command-option.md#ssl-ca): (optional) specifies the file path of the trusted CA certificate To enable encrypted connections in the TiDB server, you must specify both of the `ssl-cert` and `ssl-key` parameters in the configuration file when you start the TiDB server. You can also specify the `ssl-ca` parameter for client authentication (see [Enable authentication](#enable-authentication)). @@ -81,7 +82,7 @@ For more information, see [Client-Side Configuration for Encrypted Connections]( If the `ssl-ca` parameter is not specified in the TiDB server or MySQL client, the client or the server does not perform authentication by default and cannot prevent man-in-the-middle attack. For example, the client might "securely" connect to a disguised client. You can configure the `ssl-ca` parameter for authentication in the server and client. 
Generally, you only need to authenticate the server, but you can also authenticate the client to further enhance the security. + To authenticate the TiDB server from the MySQL client: - 1. Specify the `ssl-cert` and` ssl-key` parameters in the TiDB server. + 1. Specify the `ssl-cert` and `ssl-key` parameters in the TiDB server. 2. Specify the `--ssl-ca` parameter in the MySQL client. 3. Specify the `--ssl-mode` to `VERIFY_IDENTITY` in the MySQL client. 4. Make sure that the certificate (`ssl-cert`) configured by the TiDB server is signed by the CA specified by the client `--ssl-ca` parameter, otherwise the authentication fails. @@ -110,7 +111,7 @@ mysql> SHOW STATUS LIKE "%Ssl%"; ...... ``` -Besides, for the official MySQL client, you can also use the `STATUS` or `\s` statement to view the connection status: +For the official MySQL client, you can also use the `STATUS` or `\s` statement to view the connection status: ``` mysql> \s diff --git a/sql/encryption-and-compression-functions.md b/sql/encryption-and-compression-functions.md index fc76113c31c3f..5dca717edf62b 100644 --- a/sql/encryption-and-compression-functions.md +++ b/sql/encryption-and-compression-functions.md @@ -1,5 +1,6 @@ --- title: Encryption and Compression Functions +summary: Learn about the encryption and compression functions. category: user guide --- @@ -8,7 +9,7 @@ category: user guide | Name | Description | |:------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------| | [`MD5()`](https://dev.mysql.com/doc/refman/5.7/en/encryption-functions.html#function_md5) | Calculate MD5 checksum | -| [`PASSWORD()`](https://dev.mysql.com/doc/refman/5.7/en/encryption-functions.html#function_password) (deprecated 5.7.6) | Calculate and return a password string | +| [`PASSWORD()`](https://dev.mysql.com/doc/refman/5.7/en/encryption-functions.html#function_password) | Calculate and return a password string | | [`RANDOM_BYTES()`](https://dev.mysql.com/doc/refman/5.7/en/encryption-functions.html#function_random-bytes) | Return a random byte vector | | [`SHA1(), SHA()`](https://dev.mysql.com/doc/refman/5.7/en/encryption-functions.html#function_sha1) | Calculate an SHA-1 160-bit checksum | | [`SHA2()`](https://dev.mysql.com/doc/refman/5.7/en/encryption-functions.html#function_sha2) | Calculate an SHA-2 checksum | diff --git a/sql/error.md b/sql/error.md index dd756016042cb..4fe4c56a53548 100644 --- a/sql/error.md +++ b/sql/error.md @@ -1,5 +1,6 @@ --- title: Error Codes and Troubleshooting +summary: Learn about the error codes and solutions in TiDB. category: user guide --- @@ -13,6 +14,9 @@ TiDB is compatible with the error codes in MySQL, and in most cases returns the | Error code | Description | Solution | | ---- | ------- | --------- | +| 8001 | The memory used by the request exceeds the threshold limit for the TiDB memory usage. | Increase the value of the system variable with the `tidb_mem_quota` prefix. | +| 8002 | To guarantee consistency, a transaction with the `SELECT FOR UPDATE` statement cannot be retried when it encounters a commit conflict. TiDB rolls back the transaction and returns this error. | Retry the failed transaction. | +| 8003 | If the data in a row is not consistent with the index when executing the `ADMIN CHECK TABLE` command, TiDB returns this error. | | 9001 | The PD request timed out. 
| Check the state/monitor/log of the PD server and the network between the TiDB server and the PD server. | | 9002 | The TiKV request timed out. | Check the state/monitor/log of the TiKV server and the network between the TiDB server and the TiKV server. | | 9003 | The TiKV server is busy and this usually occurs when the workload is too high. | Check the state/monitor/log of the TiKV server. | @@ -23,4 +27,4 @@ TiDB is compatible with the error codes in MySQL, and in most cases returns the ## Troubleshooting -See the [troubleshooting](../trouble-shooting.md) and [FAQ](../FAQ.md) documents. \ No newline at end of file +See the [troubleshooting](../trouble-shooting.md) and [FAQ](../FAQ.md) documents. diff --git a/sql/expression-syntax.md b/sql/expression-syntax.md index b8764683309e9..4d58c2d49f096 100644 --- a/sql/expression-syntax.md +++ b/sql/expression-syntax.md @@ -1,5 +1,6 @@ --- title: Expression Syntax +summary: Learn about the expression syntax in TiDB. category: user guide --- diff --git a/sql/functions-and-operators-reference.md b/sql/functions-and-operators-reference.md index 1851b6b2b9c5c..c5cc0f6910e5e 100644 --- a/sql/functions-and-operators-reference.md +++ b/sql/functions-and-operators-reference.md @@ -1,5 +1,6 @@ --- title: Function and Operator Reference +summary: Learn how to use the functions and operators. category: user guide --- diff --git a/sql/generated-columns.md b/sql/generated-columns.md new file mode 100644 index 0000000000000..0c4481189ccf1 --- /dev/null +++ b/sql/generated-columns.md @@ -0,0 +1,69 @@ +--- +title: Generated Columns +summary: Learn how to use generated columns +category: user guide +--- + +# Generated Columns + +TiDB supports generated columns as part of MySQL 5.7 compatibility. One of the primary use cases for generated columns is to extract data out of a JSON data type and enable it to be indexed. + +## Index JSON using generated column + +In both MySQL 5.7 and TiDB, columns of type JSON can not be indexed directly. i.e. The following table structure is **not supported**: + +```sql +CREATE TABLE person ( + id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, + name VARCHAR(255) NOT NULL, + address_info JSON, + KEY (address_info) +); +``` + +In order to index a JSON column, you must first extract it as a generated column. Using the `city` generated column as an example, you are then able to add an index: + +```sql +CREATE TABLE person ( + id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, + name VARCHAR(255) NOT NULL, + address_info JSON, + city VARCHAR(64) AS (JSON_UNQUOTE(JSON_EXTRACT(address_info, '$.city'))) VIRTUAL, + KEY (city) +); +``` + +In this table, the `city` column is a **generated column**. As the name implies, the column is generated from other columns in the table, and cannot be assigned a value when inserted or updated. The column is also _virtual_ in that it does not require any storage or memory, and is generated on demand. The index on `city` however is _stored_ and uses the same structure as other indexes of the type `varchar(64)`. + +You can use the index on the generated column in order to speed up the following statement: + +```sql +SELECT name, id FROM person WHERE city = 'Beijing'; +``` + +If no data exists at path `$.city`, `JSON_EXTRACT` returns `NULL`. 
If you want to enforce a constraint that `city` must be `NOT NULL`, you can define the virtual column as follows: + +```sql +CREATE TABLE person ( + id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, + name VARCHAR(255) NOT NULL, + address_info JSON, + city VARCHAR(64) AS (JSON_UNQUOTE(JSON_EXTRACT(address_info, '$.city'))) VIRTUAL NOT NULL, + KEY (city) +); +``` + +Both `INSERT` and `UPDATE` statements check virtual column definitions. Rows that do not pass validation return errors: + +```sql +mysql> INSERT INTO person (name, address_info) VALUES ('Morgan', JSON_OBJECT('Country', 'Canada')); +ERROR 1048 (23000): Column 'city' cannot be null +``` + +## Limitations + +The current limitations of JSON and generated columns are as follows: + +- You cannot add the generated column in the storage type of `STORED` through `ALTER TABLE`. +- You cannot create an index on the generated column through `ALTER TABLE`. +- Not all [JSON functions](../sql/json-functions.md) are supported. diff --git a/sql/information-functions.md b/sql/information-functions.md index e40825244a53b..308c1aed8ba91 100644 --- a/sql/information-functions.md +++ b/sql/information-functions.md @@ -1,5 +1,6 @@ --- title: Information Functions +summary: Learn about the information functions. category: user guide --- @@ -21,4 +22,4 @@ In TiDB, the usage of information functions is similar to MySQL. For more inform | [`SYSTEM_USER()`](https://dev.mysql.com/doc/refman/5.7/en/information-functions.html#function_system-user) | Synonym for `USER()` | | [`USER()`](https://dev.mysql.com/doc/refman/5.7/en/information-functions.html#function_user) | Return the user name and host name provided by the client | | [`VERSION()`](https://dev.mysql.com/doc/refman/5.7/en/information-functions.html#function_version) | Return a string that indicates the MySQL server version | -| `TIDB_VERSION` | Return a string that indicates the TiDB server version | +| `TIDB_VERSION()` | Return a string that indicates the TiDB server version | diff --git a/sql/json-functions-generated-column.md b/sql/json-functions-generated-column.md deleted file mode 100644 index 9eaaac5f45267..0000000000000 --- a/sql/json-functions-generated-column.md +++ /dev/null @@ -1,117 +0,0 @@ ---- -title: JSON Functions and Generated Column -category: user guide ---- - -# JSON Functions and Generated Column - -## About - -To be compatible with MySQL 5.7 or later and better support the document store, TiDB supports JSON in the latest version. In TiDB, a document is a set of Key-Value pairs, encoded as a JSON object. You can use the JSON datatype in a TiDB table and create indexes for the JSON document fields using generated columns. In this way, you can flexibly deal with the business scenarios with uncertain schema and are no longer limited by the read performance and the lack of support for transactions in traditional document databases. - -## JSON functions - -The support for JSON in TiDB mainly refers to the user interface of MySQL 5.7. For example, you can create a table that includes a JSON field to store complex information: - -```sql -CREATE TABLE person ( - id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, - name VARCHAR(255) NOT NULL, - address_info JSON -); -``` - -When you insert data into a table, you can deal with those data with uncertain schema like this: - -```sql -INSERT INTO person (name, address_info) VALUES ("John", '{"city": "Beijing"}'); -``` - -You can insert JSON data into the table by inserting a legal JSON string into the column corresponding to the JSON field. 
TiDB will then parse the text and save it in a more compact and easy-to-access binary form. - -You can also convert other data type into JSON using CAST: - -```sql -INSERT INTO person (name, address_info) VALUES ("John", CAST('{"city": "Beijing"}' AS JSON)); -INSERT INTO person (name, address_info) VALUES ("John", CAST('123' AS JSON)); -INSERT INTO person (name, address_info) VALUES ("John", CAST(123 AS JSON)); -``` - -Now, if you want to query all the users living in Beijing from the table, you can simply use the following SQL statement: - -```sql -SELECT id, name FROM person WHERE JSON_EXTRACT(address_info, '$.city') = 'Beijing'; -``` - -TiDB supports the `JSON_EXTRACT` function which is exactly the same as in MySQL. The function is to extract the `city` field from the `address_info` document. The second argument is a "path expression" and is used to specify which field to extract. See the following few examples to help you understand the "path expression": - -```sql -SET @person = '{"name":"John","friends":[{"name":"Forest","age":16},{"name":"Zhang San","gender":"male"}]}'; - -SELECT JSON_EXTRACT(@person, '$.name'); -- gets "John" -SELECT JSON_EXTRACT(@person, '$.friends[0].age'); -- gets 16 -SELECT JSON_EXTRACT(@person, '$.friends[1].gender'); -- gets "male" -SELECT JSON_EXTRACT(@person, '$.friends[2].name'); -- gets NULL -``` - -In addition to inserting and querying data, TiDB also supports editing JSON. In general, TiDB currently supports the following JSON functions in MySQL 5.7: - -- [JSON_EXTRACT](https://dev.mysql.com/doc/refman/5.7/en/json-search-functions.html#function_json-extract) -- [JSON_ARRAY](https://dev.mysql.com/doc/refman/5.7/en/json-creation-functions.html#function_json-array) -- [JSON_OBJECT](https://dev.mysql.com/doc/refman/5.7/en/json-creation-functions.html#function_json-object) -- [JSON_SET](https://dev.mysql.com/doc/refman/5.7/en/json-modification-functions.html#function_json-set) -- [JSON_REPLACE](https://dev.mysql.com/doc/refman/5.7/en/json-modification-functions.html#function_json-replace) -- [JSON_INSERT](https://dev.mysql.com/doc/refman/5.7/en/json-modification-functions.html#function_json-insert) -- [JSON_REMOVE](https://dev.mysql.com/doc/refman/5.7/en/json-modification-functions.html#function_json-remove) -- [JSON_TYPE](https://dev.mysql.com/doc/refman/5.7/en/json-attribute-functions.html#function_json-type) -- [JSON_UNQUOTE](https://dev.mysql.com/doc/refman/5.7/en/json-modification-functions.html#function_json-unquote) - -You can get the general use of these functions directly from the function name. These functions in TiDB behave the same as in MySQL 5.7. For more information, see the [JSON Functions document of MySQL 5.7](https://dev.mysql.com/doc/refman/5.7/en/json-functions.html). If you are a user of MySQL 5.7, you can migrate to TiDB seamlessly. - -Currently TiDB does not support all the JSON functions in MySQL 5.7. This is because our preliminary goal is to provide complete support for **MySQL X Plugin**, which covers the majority of JSON functions used to insert, select, update and delete data. More functions will be supported if necessary. - -## Index JSON using generated column - -The full table scan is executed when you query a JSON field. When you run the `EXPLAIN` statement in TiDB, the results show that it is full table scan. Then, can you index the JSON field? 
- -First, this type of index is wrong: - -```sql -CREATE TABLE person ( - id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, - name VARCHAR(255) NOT NULL, - address_info JSON, - KEY (address_info) -); -``` - -This is not because of technical impossibility but because the direct comparison of JSON itself is meaningless. Although we can agree on some comparison rules, such as `ARRAY` is bigger than all `OBJECT`, it is useless. Therefore, as what is done in MySQL 5.7, TiDB prohibits the direct creation of index on JSON field, but you can index the fields in the JSON document in the form of generated column: - -```sql -CREATE TABLE person ( - id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, - name VARCHAR(255) NOT NULL, - address_info JSON, - city VARCHAR(64) AS (JSON_UNQUOTE(JSON_EXTRACT(address_info, '$.city'))) VIRTUAL, - KEY (city) -); -``` - -In this table, the `city` column is a **generated column**. As the name implies, the column is generated by other columns in the table, and cannot be assigned a value when inserted or updated. For generating a column, you can specify it as `VIRTUAL` to prevent it from being explicitly saved in the record, but by other columns when needed. This is particularly useful when the column is wide and you need to save storage space. With this generated column, you can create an index on it, and it looks the same with other regular columns. In query, you can run the following statements: - -```sql -SELECT name, id FROM person WHERE city = 'Beijing'; -``` - -In this way, you can create an index. - -> **Note**: In the JSON document, if the field in the specified path does not exist, the result of `JSON_EXTRACT` will be `NULL`. The value of the generated column with index is also `NULL`. If this is not what you want to see, you can add a `NOT NULL` constraint on the generated column. In this way, when the value of the `city` field is `NULL` after you insert data, it can be detected. - -## Limitations - -The current limitations of JSON and generated column are as follows: - -- You cannot add the generated column in the storage type of `STORED` through `ALTER TABLE`. -- You cannot create an index on the generated column through `ALTER TABLE`. - -The above functions and some other JSON functions are under development. diff --git a/sql/json-functions.md b/sql/json-functions.md index c0ef4fbd73a33..b9ae307a4bab6 100644 --- a/sql/json-functions.md +++ b/sql/json-functions.md @@ -1,26 +1,69 @@ --- title: JSON Functions +summary: Learn about JSON functions. 
category: user guide --- # JSON Functions -| Function Name and Syntactic Sugar | Description | -| ---------- | ------------------ | -| [JSON_EXTRACT(json_doc, path[, path] ...)][json_extract]| Return data from a JSON document, selected from the parts of the document matched by the `path` arguments | -| [JSON_UNQUOTE(json_val)][json_unquote] | Unquote JSON value and return the result as a `utf8mb4` string | -| [JSON_TYPE(json_val)][json_type] | Return a `utf8mb4` string indicating the type of a JSON value | -| [JSON_SET(json_doc, path, val[, path, val] ...)][json_set] | Insert or update data in a JSON document and return the result | -| [JSON_INSERT(json_doc, path, val[, path, val] ...)][json_insert] | Insert data into a JSON document and return the result | -| [JSON_REPLACE(json_doc, path, val[, path, val] ...)][json_replace] | Replace existing values in a JSON document and return the result | -| [JSON_REMOVE(json_doc, path[, path] ...)][json_remove] | Remove data from a JSON document and return the result | -| [JSON_MERGE(json_doc, json_doc[, json_doc] ...)][json_merge] | Merge two or more JSON documents and return the merged result | -| [JSON_OBJECT(key, val[, key, val] ...)][json_object] | Evaluate a (possibly empty) list of key-value pairs and return a JSON object containing those pairs | -| [JSON_ARRAY([val[, val] ...])][json_array] | Evaluate a (possibly empty) list of values and return a JSON array containing those values | -| -> | Return value from JSON column after evaluating path; the syntactic sugar of `JSON_EXTRACT(doc, path_literal)` | -| ->> | Return value from JSON column after evaluating path and unquoting the result; the syntactic sugar of `JSON_UNQUOTE(JSONJSON_EXTRACT(doc, path_literal))` | +TiDB supports most of the JSON functions that shipped with the GA release of MySQL 5.7. Additional JSON functions were added to MySQL 5.7 after its release, and not all are available in TiDB (see [unsupported functions](#unsupported-functions)). 
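+
+For example, the supported functions can be combined to create, query, and modify JSON documents. The following is a minimal sketch that assumes a table `person` with a JSON column `address_info`; the table and column names are used only for illustration:
+
+```sql
+CREATE TABLE person (
+    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
+    name VARCHAR(255) NOT NULL,
+    address_info JSON
+);
+
+-- Create a JSON document and insert it
+INSERT INTO person (name, address_info) VALUES ('John', JSON_OBJECT('city', 'Beijing', 'zipcode', '100000'));
+
+-- Query a field inside the JSON document
+SELECT id, name FROM person WHERE JSON_EXTRACT(address_info, '$.city') = 'Beijing';
+
+-- Modify a field and check the type of the stored value
+UPDATE person SET address_info = JSON_SET(address_info, '$.city', 'Shanghai');
+SELECT JSON_TYPE(address_info) FROM person;
+```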
+
+## Functions that create JSON values
+
+| Function Name and Syntactic Sugar | Description |
+| --------------------------------- | ----------- |
+| [JSON_ARRAY([val[, val] ...])][json_array] | Evaluates a (possibly empty) list of values and returns a JSON array containing those values |
+| [JSON_OBJECT(key, val[, key, val] ...)][json_object] | Evaluates a (possibly empty) list of key-value pairs and returns a JSON object containing those pairs |
+
+## Functions that search JSON values
+
+| Function Name and Syntactic Sugar | Description |
+| --------------------------------- | ----------- |
+| [JSON_CONTAINS(target, candidate[, path])][json_contains] | Indicates by returning 1 or 0 whether a given candidate JSON document is contained within a target JSON document |
+| [JSON_CONTAINS_PATH(json_doc, one_or_all, path[, path] ...)][json_contains_path] | Returns 0 or 1 to indicate whether a JSON document contains data at a given path or paths |
+| [JSON_EXTRACT(json_doc, path[, path] ...)][json_extract]| Returns data from a JSON document, selected from the parts of the document matched by the `path` arguments |
+| [->][json_short_extract] | Returns the value from a JSON column after evaluating the path; the syntactic sugar of `JSON_EXTRACT(doc, path_literal)` |
+| [->>][json_short_extract_unquote] | Returns the value from a JSON column after evaluating the path and unquoting the result; the syntactic sugar of `JSON_UNQUOTE(JSON_EXTRACT(doc, path_literal))` |
+
+## Functions that modify JSON values
+
+| Function Name and Syntactic Sugar | Description |
+| --------------------------------- | ----------- |
+| [JSON_INSERT(json_doc, path, val[, path, val] ...)][json_insert] | Inserts data into a JSON document and returns the result |
+| [JSON_MERGE(json_doc, json_doc[, json_doc] ...)][json_merge] | Merges two or more JSON documents and returns the merged result |
+| [JSON_REMOVE(json_doc, path[, path] ...)][json_remove] | Removes data from a JSON document and returns the result |
+| [JSON_REPLACE(json_doc, path, val[, path, val] ...)][json_replace] | Replaces existing values in a JSON document and returns the result |
+| [JSON_SET(json_doc, path, val[, path, val] ...)][json_set] | Inserts or updates data in a JSON document and returns the result |
+| [JSON_UNQUOTE(json_val)][json_unquote] | Unquotes a JSON value and returns the result as a string |
+
+## Functions that return JSON value attributes
+
+| Function Name and Syntactic Sugar | Description |
+| --------------------------------- | ----------- |
+| [JSON_LENGTH(json_doc[, path])][json_length] | Returns the length of a JSON document, or, if a path argument is given, the length of the value within the path |
+| [JSON_TYPE(json_val)][json_type] | Returns a string indicating the type of a JSON value |
+
+## Unsupported functions
+
+The following JSON functions are unsupported in TiDB.
You can track the progress in adding them in [TiDB #7546](https://github.com/pingcap/tidb/issues/7546): + +* `JSON_APPEND` and its alias `JSON_ARRAY_APPEND` +* `JSON_ARRAY_INSERT` +* `JSON_DEPTH` +* `JSON_KEYS` +* `JSON_MERGE_PATCH` +* `JSON_MERGE_PRESERVE`, use the alias `JSON_MERGE` instead +* `JSON_PRETTY` +* `JSON_QUOTE` +* `JSON_SEARCH` +* `JSON_STORAGE_SIZE` +* `JSON_VALID` +* `JSON_ARRAYAGG` +* `JSON_OBJECTAGG` [json_extract]: https://dev.mysql.com/doc/refman/5.7/en/json-search-functions.html#function_json-extract +[json_short_extract]: https://dev.mysql.com/doc/refman/5.7/en/json-search-functions.html#operator_json-column-path +[json_short_extract_unquote]: https://dev.mysql.com/doc/refman/5.7/en/json-search-functions.html#operator_json-inline-path [json_unquote]: https://dev.mysql.com/doc/refman/5.7/en/json-modification-functions.html#function_json-unquote [json_type]: https://dev.mysql.com/doc/refman/5.7/en/json-attribute-functions.html#function_json-type [json_set]: https://dev.mysql.com/doc/refman/5.7/en/json-modification-functions.html#function_json-set @@ -30,3 +73,10 @@ category: user guide [json_merge]: https://dev.mysql.com/doc/refman/5.7/en/json-modification-functions.html#function_json-merge [json_object]: https://dev.mysql.com/doc/refman/5.7/en/json-creation-functions.html#function_json-object [json_array]: https://dev.mysql.com/doc/refman/5.7/en/json-creation-functions.html#function_json-array +[json_keys]: https://dev.mysql.com/doc/refman/5.7/en/json-search-functions.html#function_json-keys +[json_length]: https://dev.mysql.com/doc/refman/5.7/en/json-attribute-functions.html#function_json-length +[json_valid]: https://dev.mysql.com/doc/refman/5.7/en/json-attribute-functions.html#function_json-valid +[json_quote]: https://dev.mysql.com/doc/refman/5.7/en/json-creation-functions.html#function_json-quote +[json_contains]: https://dev.mysql.com/doc/refman/5.7/en/json-search-functions.html#function_json-contains +[json_contains_path]: https://dev.mysql.com/doc/refman/5.7/en/json-search-functions.html#function_json-contains-path +[json_arrayagg]: https://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_json-arrayagg diff --git a/sql/keywords-and-reserved-words.md b/sql/keywords-and-reserved-words.md index e7f522b80a0d5..c24058ae5e1ba 100644 --- a/sql/keywords-and-reserved-words.md +++ b/sql/keywords-and-reserved-words.md @@ -1,5 +1,6 @@ --- title: Keywords and Reserved Words +summary: Learn about the keywords and reserved words in TiDB. category: user guide --- @@ -7,7 +8,7 @@ category: user guide Keywords are words that have significance in SQL. Certain keywords, such as `SELECT`, `UPDATE`, or `DELETE`, are reserved and require special treatment for use as identifiers such as table and column names. 
For example, as table names, the reserved words must be quoted with backquotes: -``` +```sql mysql> CREATE TABLE select (a INT); ERROR 1105 (HY000): line 0 column 19 near " (a INT)" (total length 27) mysql> CREATE TABLE `select` (a INT); @@ -16,14 +17,14 @@ Query OK, 0 rows affected (0.09 sec) The `BEGIN` and `END` are keywords but not reserved words, so you do not need to quote them with backquotes: -``` +```sql mysql> CREATE TABLE `select` (BEGIN int, END int); Query OK, 0 rows affected (0.09 sec) ``` Exception: A word that follows a period `.` qualifier does not need to be quoted with backquotes either: -``` +```sql mysql> CREATE TABLE test.select (BEGIN int, END int); Query OK, 0 rows affected (0.08 sec) ``` diff --git a/sql/literal-values.md b/sql/literal-values.md index 1914dbfff3a3f..248b874a78591 100644 --- a/sql/literal-values.md +++ b/sql/literal-values.md @@ -1,10 +1,13 @@ --- title: Literal Values +summary: Learn how to use various literal values. category: user guide --- # Literal Values +This document describes String literals, Numeric literals, NULL values, Hexadecimal literals, Date and time literals, Boolean literals, and Bit-value literals. + ## String literals A string is a sequence of bytes or characters, enclosed within either single quote `'` or double quote `"` characters. For example: @@ -115,7 +118,7 @@ X'1z' (z is not a hexadecimal legal digit) Hexadecimal literals written using `X'val'` notation must contain an even number of digits. To avoid the syntax error, pad the value with a leading zero: -``` +```sql mysql> select X'aff'; ERROR 1105 (HY000): line 0 column 13 near ""hex literal: invalid hexadecimal format, must even numbers, but 3 (total length 13) mysql> select X'0aff'; @@ -132,7 +135,7 @@ By default, a hexadecimal literal is a binary string. To convert a string or a number to a string in hexadecimal format, use the `HEX()` function: -``` +```sql mysql> SELECT HEX('TiDB'); +-------------+ | HEX('TiDB') | @@ -190,7 +193,7 @@ For more information, see [Date and Time Literals in MySQL](https://dev.mysql.co The constants `TRUE` and `FALSE` evaluate to 1 and 0 respectively, which are not case sensitive. -``` +```sql mysql> SELECT TRUE, true, tRuE, FALSE, FaLsE, false; +------+------+------+-------+-------+-------+ | TRUE | true | tRuE | FALSE | FaLsE | false | diff --git a/sql/miscellaneous-functions.md b/sql/miscellaneous-functions.md index 99b82ba2d21e3..db3150f02f639 100644 --- a/sql/miscellaneous-functions.md +++ b/sql/miscellaneous-functions.md @@ -1,5 +1,6 @@ --- title: Miscellaneous Functions +summary: Learn about miscellaneous functions in TiDB. category: user guide --- diff --git a/sql/mysql-compatibility.md b/sql/mysql-compatibility.md index 625c264c30ec4..f41ff8455b2e9 100644 --- a/sql/mysql-compatibility.md +++ b/sql/mysql-compatibility.md @@ -1,11 +1,12 @@ --- title: Compatibility with MySQL +summary: Learn about the compatibility of TiDB with MySQL, and the unsupported and different features. category: user guide --- # Compatibility with MySQL -TiDB supports the majority of the MySQL grammar, including cross-row transactions, JOIN, subquery, and so on. You can connect to TiDB directly using your own MySQL client. If your existing business is developed based on MySQL, you can replace MySQL with TiDB to power your application without changing a single line of code in most cases. +TiDB supports the majority of the MySQL 5.7 syntax, including cross-row transactions, JOIN, subquery, and so on. 
You can connect to TiDB directly using your own MySQL client. If your existing business is developed based on MySQL, you can replace MySQL with TiDB to power your application without changing a single line of code in most cases. TiDB is compatible with most of the MySQL database management & administration tools such as `PHPMyAdmin`, `Navicat`, `MySQL Workbench`, and so on. It also supports the database backup tools, such as `mysqldump` and `mydumper/myloader`. @@ -13,16 +14,21 @@ However, in TiDB, the following MySQL features are not supported for the time be ## Unsupported features -+ Stored Procedures -+ View -+ Trigger -+ The user-defined functions -+ The `FOREIGN KEY` constraints -+ The `FULLTEXT` indexes -+ The `Spatial` indexes -+ The Non-UTF-8 characters ++ Stored procedures and functions ++ Views ++ Triggers ++ Events ++ User-defined functions ++ `FOREIGN KEY` constraints ++ `FULLTEXT` indexes ++ `SPATIAL` indexes ++ Character sets other than `utf8` + Add primary key + Drop primary key ++ SYS schema ++ Optimizer trace ++ XML Functions ++ X-Protocol ## Features that are different from MySQL @@ -32,18 +38,24 @@ The auto-increment ID feature in TiDB is only guaranteed to be automatically inc > **Warning**: > -> If you use the auto-increment ID in a cluster with multiple TiDB servers, do not mix the default value and the custom value, because it reports an error in the following situation: +> If you use the auto-increment ID in a cluster with multiple tidb-server instances, do not mix the default value and the custom value, otherwise an error occurs in the following situation: > -> In a cluster of two TiDB servers, namely TiDB A and TiDB B, TiDB A caches [1,5000] auto-increment ID, while TiDB B caches [5001,10000] auto-increment ID. Use the following statement to create a table with auto-increment ID: +> Assume that you have a table with the auto-increment ID: > -> ``` -> create table t(id int unique key auto_increment, c int); -> ``` +> `create table t(id int unique key auto_increment, c int);` +> +> The principle of the auto-increment ID in TiDB is that each tidb-server instance caches a section of ID values (currently 30000 IDs are cached) for allocation and fetches the next section after this section is used up. > -> The statement is executed as follows: +> Assume that the cluster contains two tidb-server instances, namely Instance A and Instance B. Instance A caches the auto-increment ID of [1, 30000], while Instance B caches the auto-increment ID of [30001, 60000]. +> +> The operations are executed as follows: > -> 1. The client inserts a statement to TiDB B which sets the `id` to be 1 and the statement is executed successfully. -> 2. The client inserts a record to TiDB A which sets the `id` set to the default value 1. In this case, it returns `Duplicated Error`. +> 1. The client issues the `insert into t values (1, 1)` statement to Instance B which sets the `id` to 1 and the statement is executed successfully. +> 2. The client issues the `insert into t (c) (1)` statement to Instance A. This statement does not specify the value of `id`, so Instance A allocates the value. Currently, Instances A caches the auto-increment ID of [1, 30000], so it allocates the `id` value to 1 and adds 1 to the local counter. However, at this time the data with the `id` of 1 already exists in the cluster, therefore it reports `Duplicated Error`. + +### Performance schema + +Performance schema tables return empty results in TiDB. 
TiDB uses a combination of [Prometheus and Grafana](https://pingcap.com/docs/op-guide/monitor/#use-prometheus-and-grafana) for performance metrics instead. ### Built-in functions @@ -78,12 +90,20 @@ TiDB implements the asynchronous schema changes algorithm in F1. The Data Manipu + Rename Table + Create Table Like -### Transaction +### Transaction model TiDB implements an optimistic transaction model. Unlike MySQL, which uses row-level locking to avoid write conflict, in TiDB, the write conflict is checked only in the `commit` process during the execution of the statements like `Update`, `Insert`, `Delete`, and so on. **Note:** On the business side, remember to check the returned results of `commit` because even there is no error in the execution, there might be errors in the `commit` process. +### Large transactions + +Due to the distributed, 2-phase commit requirement of TiDB, large transactions that modify data can be particularly problematic. TiDB intentionally sets some limits on transaction sizes to reduce this impact: + +* Each Key-Value entry is no more than 6MB +* The total number of Key-Value entries is no more than 300,000 +* The total size of Key-Value entries is no more than 100MB + ### Load data + Syntax: @@ -92,6 +112,7 @@ TiDB implements an optimistic transaction model. Unlike MySQL, which uses row-le LOAD DATA LOCAL INFILE 'file_name' INTO TABLE table_name {FIELDS | COLUMNS} TERMINATED BY 'string' ENCLOSED BY 'char' ESCAPED BY 'char' LINES STARTING BY 'string' TERMINATED BY 'string' + IGNORE n LINES (col_name ...); ``` @@ -99,4 +120,43 @@ TiDB implements an optimistic transaction model. Unlike MySQL, which uses row-le + Transaction - When TiDB is in the execution of loading data, by default, a record with 20,000 rows of data is seen as a transaction for persistent storage. If a load data operation inserts more than 20,000 rows, it will be divided into multiple transactions to commit. If an error occurs in one transaction, this transaction in process will not be committed. However, transactions before that are committed successfully. In this case, a part of the load data operation is successfully inserted, and the rest of the data insertion fails. But MySQL treats a load data operation as a transaction, one error leads to the failure of the entire load data operation. \ No newline at end of file + When TiDB is in the execution of loading data, by default, a record with 20,000 rows of data is seen as a transaction for persistent storage. If a load data operation inserts more than 20,000 rows, it will be divided into multiple transactions to commit. If an error occurs in one transaction, this transaction in process will not be committed. However, transactions before that are committed successfully. In this case, a part of the load data operation is successfully inserted, and the rest of the data insertion fails. But MySQL treats a load data operation as a transaction, one error leads to the failure of the entire load data operation. + +### Storage engines + +For compatibility reasons, TiDB supports the syntax to create tables with alternative storage engines. Metadata commands describe tables as being of engine InnoDB: + +```sql +mysql> CREATE TABLE t1 (a INT) ENGINE=MyISAM; +Query OK, 0 rows affected (0.14 sec) + +mysql> SHOW CREATE TABLE t1\G +*************************** 1. 
row *************************** + Table: t1 +Create Table: CREATE TABLE `t1` ( + `a` int(11) DEFAULT NULL +) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin +1 row in set (0.00 sec) +``` + +Architecturally, TiDB does support a similar storage engine abstraction to MySQL, and user tables are created in the engine specified by the [`--store`](server-command-option.md#--store) option used when you start tidb-server (typically `tikv`). + +### EXPLAIN + +The output of the query execution plan returned from the `EXPLAIN` command differs from MySQL. For more information, see [Understand the Query Execution Plan](../sql/understanding-the-query-execution-plan.md). + +### Default differences + +- Default character set: + - The default value in TiDB is `utf8` which is equivalent to `utf8mb4` in MySQL. + - The default value in MySQL 5.7 is `latin1`, but changes to `utf8mb4` in MySQL 8.0. +- Default collation: `latin1_swedish_ci` in MySQL 5.7, while `binary` in TiDB. +- Default SQL mode: + - The default value in TiDB is `STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION`. + - The default value in MySQL 5.7 is `ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION`. +- Default value of `lower_case_table_names`: + - The default value in TiDB is 2 and currently TiDB only supports 2. + - The default value in MySQL: + - On Linux: 0 + - On Windows: 1 + - On macOS: 2 diff --git a/sql/numeric-functions-and-operators.md b/sql/numeric-functions-and-operators.md index 5520112695e23..0ba58269145d7 100644 --- a/sql/numeric-functions-and-operators.md +++ b/sql/numeric-functions-and-operators.md @@ -1,10 +1,13 @@ --- title: Numeric Functions and Operators +summary: Learn about the numeric functions and operators. category: user guide --- # Numeric Functions and Operators +This document describes the arithmetic operators and mathematical functions. + ## Arithmetic operators | Name | Description | diff --git a/sql/operators.md b/sql/operators.md index 1c1f8d5ee7948..fd48f5373b88a 100644 --- a/sql/operators.md +++ b/sql/operators.md @@ -1,5 +1,13 @@ +--- +title: Operators +summary: Learn about the operators precedence, comparison functions and operators, logical operators, and assignment operators. +category: user guide +--- + # Operators +This document describes the operators precedence, comparison functions and operators, logical operators, and assignment operators. + - [Operator precedence](#operator-precedence) - [Comparison functions and operators](#comparison-functions-and-operators) - [Logical operators](#logical-operators) @@ -54,7 +62,7 @@ Operator precedences are shown in the following list, from highest precedence to the lowest. Operators that are shown together on a line have the same precedence. -``` sql +```sql INTERVAL BINARY, COLLATE ! diff --git a/sql/precision-math.md b/sql/precision-math.md index 72d3364171e3f..70a61815822d8 100644 --- a/sql/precision-math.md +++ b/sql/precision-math.md @@ -1,5 +1,6 @@ --- title: Precision Math +summary: Learn about the precision math in TiDB. category: user guide --- @@ -71,6 +72,7 @@ SET sql_mode = 'TRADITIONAL`; ``` If a number is inserted into an exact type column (DECIMAL or integer), it is inserted with its exact value if it is within the column range. For this number: + - If the value has too many digits in the fractional part, rounding occurs and a warning is generated. 
- If the value has too many digits in the integer part, it is too large and is handled as follows: - If strict mode is not enabled, the value is truncated to the nearest legal value and a warning is generated. @@ -91,6 +93,7 @@ In the following SQL statement: ```sql INSERT INTO t SET i = 1/0; ``` + The following results are returned in different SQL modes: | `sql_mode` Value | Result | @@ -100,7 +103,6 @@ The following results are returned in different SQL modes: | `ERROR_FOR_DIVISION_BY_ZERO` | Warning, no error; i is set to NULL. | | strict, `ERROR_FOR_DIVISION_BY_ZERO` | Error; no row is inserted. | - ## Rounding behavior The result of the `ROUND()` function depends on whether its argument is exact or approximate: diff --git a/sql/prepare.md b/sql/prepare.md index 00dfc0d31736e..83184585c78cd 100644 --- a/sql/prepare.md +++ b/sql/prepare.md @@ -1,5 +1,6 @@ --- title: Prepared SQL Statement Syntax +summary: Use Prepared statements to reduce the load of statement parsing and query optimization, and improve execution efficiency. category: user guide --- diff --git a/sql/privilege.md b/sql/privilege.md index ca24ca038f9ef..bde9d7ea4977d 100644 --- a/sql/privilege.md +++ b/sql/privilege.md @@ -1,12 +1,11 @@ --- title: Privilege Management +summary: Learn how to manage the privilege. category: user guide --- # Privilege Management -## Privilege management overview - TiDB's privilege management system is implemented according to the privilege management system in MySQL. It supports most of the syntaxes and privilege types in MySQL. If you find any inconsistency with MySQL, feel free to [open an issue](https://github.com/pingcap/docs-cn/issues/new). ## Examples @@ -57,6 +56,7 @@ The `DROP USER` statement removes one or more MySQL accounts and their privilege ```sql drop user 'test'@'%'; ``` + **Required Privilege:** To use `DROP USER`, you must have the global `CREATE USER` privilege. #### Reset the root password @@ -97,7 +97,7 @@ grant all privileges on *.* to 'xxx'@'%'; If the granted user does not exist, TiDB will automatically create a user. -``` +```sql mysql> select * from mysql.user where user='xxxx'; Empty set (0.00 sec) @@ -117,7 +117,7 @@ In this example, `xxxx@%` is the user that is automatically created. > **Note:** Granting privileges to a database or table does not check if the database or table exists. -``` +```sql mysql> select * from test.xxxx; ERROR 1146 (42S02): Table 'test.xxxx' doesn't exist @@ -135,7 +135,7 @@ mysql> select user,host from mysql.tables_priv where user='xxxx'; You can use fuzzy matching to grant privileges to databases and tables. -``` +```sql mysql> grant all privileges on `te%`.* to genius; Query OK, 0 rows affected (0.00 sec) @@ -162,7 +162,7 @@ revoke all privileges on `test`.* from 'genius'@'localhost'; > **Note:** To revoke privileges, you need the exact match. If the matching result cannot be found, an error will be displayed: - ``` + ```sql mysql> revoke all privileges on `te%`.* from 'genius'@'%'; ERROR 1141 (42000): There is no such grant defined for user 'genius' on host '%' ``` @@ -208,19 +208,19 @@ To be more precise, you can check the privilege information in the `Grant` table 1. Check if `test@%` has global `Insert` privilege: ```sql - select Insert from mysql.user where user='test' and host='%'; + select Insert_priv from mysql.user where user='test' and host='%'; ``` 2. 
If not, check if `test@%` has database-level `Insert` privilege at `db1`: ```sql - select Insert from mysql.db where user='test' and host='%'; + select Insert_priv from mysql.db where user='test' and host='%'; ``` 3. If the result is still empty, check whether `test@%` has table-level `Insert` privilege at `db1.t`: ```sql - select tables_priv from mysql.tables_priv where user='test' and host='%' and db='db1'; + select table_priv from mysql.tables_priv where user='test' and host='%' and db='db1'; ``` ### Implementation of the privilege system @@ -254,7 +254,7 @@ In theory, all privilege-related operations can be done directly by the CRUD ope On the implementation level, only a layer of syntactic sugar is added. For example, you can use the following command to remove a user: -``` +```sql delete from mysql.user where user='test'; ``` @@ -317,7 +317,7 @@ auth_spec: { } ``` -For more information about the user account, see [TiDB user account management](user-account-management.md). +For more information about the user account, see [TiDB user account management](../sql/user-account-management.md). - IDENTIFIED BY `auth_string` diff --git a/sql/schema-object-names.md b/sql/schema-object-names.md index 4113e4825efc3..dee1400cbea94 100644 --- a/sql/schema-object-names.md +++ b/sql/schema-object-names.md @@ -1,5 +1,6 @@ --- title: Schema Object Names +summary: Learn about the schema object names (identifiers) in TiDB. category: user guide --- @@ -54,7 +55,7 @@ Object names can be unqualified or qualified. For example, the following stateme CREATE TABLE t (i int); ``` -If there is no default database, the `ERROR 1046 (3D000): No database selected` is displayed. You can also use the qualified name ` test.t`: +If there is no default database, the `ERROR 1046 (3D000): No database selected` is displayed. You can also use the qualified name `test.t`: ```sql CREATE TABLE test.t (i int); @@ -73,5 +74,5 @@ Instead of ```sql `table_name.col_name` ``` -For more information, see [MySQL Identifier Qualifiers](https://dev.mysql.com/doc/refman/5.7/en/identifier-qualifiers.html). +For more information, see [MySQL Identifier Qualifiers](https://dev.mysql.com/doc/refman/5.7/en/identifier-qualifiers.html). \ No newline at end of file diff --git a/sql/server-command-option.md b/sql/server-command-option.md index c187530b92267..851da50497797 100644 --- a/sql/server-command-option.md +++ b/sql/server-command-option.md @@ -1,10 +1,13 @@ --- title: The TiDB Command Options +summary: Learn about TiDB command options and configuration files. category: user guide --- # The TiDB Command Options +This document describes the startup options and TiDB server configuration files. + ## TiDB startup options When you start TiDB processes, you can specify some program options. @@ -154,6 +157,14 @@ Same as the "run-ddl" startup option - Default: true - When you execute `join` on tables without any conditions on both sides, the statement can be run by default. But if you set the value to `false`, the server does not run such `join` statement. +### force-priority + +- The default priority for statements +- Default: `NO_PRIORITY` +- TiDB supports the priorities `NO_PRIORITY` | `LOW_PRIORITY` | `DELAYED` | `HIGH_PRIORITY` for statements. One use case for changing the priority, is you may choose to dedicate a pool of servers for OLAP queries and set the value to `LOW_PRIORITY` to ensure that TiKV servers will provide priority to OLTP workloads which are routed to a different pool of TiDB servers. 
This helps ensure more uniform OLTP performance at the risk of slightly slower OLAP performance. + +TiDB will automatically set table scans to `LOW_PRIORITY` and overwriting priority on a per-statement basis is possible by using the `HIGH PRIORITY` or `LOW PRIORITY` [DML modifier](dml.md). + ### join-concurrency - The goroutine number when the `join-concurrency` runs `join` diff --git a/sql/slow-query.md b/sql/slow-query.md new file mode 100644 index 0000000000000..e919aadc01f26 --- /dev/null +++ b/sql/slow-query.md @@ -0,0 +1,125 @@ +--- +title: Slow Query Log +summary: Use the slow query log to identify problematic SQL statements. +category: user guide +--- + +# Slow Query Log + +The slow query log is a record of SQL statements that took a long time to perform. + +A problematic SQL statement can increase the pressure on the entire cluster, resulting in a longer response time. To solve this problem, you can use the slow query log to identify the problematic statements and thus improve the performance. + +### Obtain the log + +By `grep` the keyword `SLOW_QUERY` in the log file of TiDB, you can obtain the logs of statements whose execution time exceeds [slow-threshold](../op-guide/tidb-config-file.md#slow-threshold). + +You can edit `slow-threshold` in the configuration file and its default value is 300ms. If you configure the [slow-query-file](../op-guide/tidb-config-file.md#slow-query-file), all the slow query logs will be written in this file. + +### Usage example + +``` +2018/08/20 19:52:08.632 adapter.go:363: [warning] [SLOW_QUERY] cost_time:18.647928814s +process_time:1m6.768s wait_time:12m11.212s backoff_time:600ms request_count:2058 +total_keys:1869712 processed_keys:1869710 succ:true con:3 user:root@127.0.0.1 +txn_start_ts:402329674704224261 database:test table_ids:[31],index_ids:[1], +sql:select count(c) from sbtest1 use index (k_1) +``` + +### Fields description + +This section describes fields in the slow query log based on the usage example above. + +#### `cost_time` + +The execution time of this statement. Only the statements whose execution time exceeds [slow-threshold](../op-guide/tidb-config-file.md#slow-threshold) output this log. + +#### `process_time` + +The total processing time of this statement in TiKV. Because data is sent to TiKV concurrently for execution, this value might exceed `cost_time`. + +#### `wait_time` + +The total waiting time of this statement in TiKV. Because the Coprocessor of TiKV runs a limited number of threads, requests might queue up when all threads of Coprocessor are working. When a request in the queue takes a long time to process, the waiting time of the subsequent requests will increase. + +#### `backoff_time` + +The waiting time before retry when this statement encounters errors that require a retry. The common errors as such include: lock occurs, Region split, the TiKV server is busy. + +#### `request_count` + +The number of Coprocessor requests that this statement sends. + +#### `total_keys` + +The number of keys that Coprocessor has scanned. + +#### `processed_keys` + +The number of keys that Coprocessor has processed. Compared with `total_keys`, `processed_keys` does not include the old versions of MVCC. A great difference between `processed_keys` and `total_keys` indicates that the number of old versions are relatively large. + +#### `succ` + +Whether the execution of the request succeeds or not. + +#### `con` + +Connection ID (session ID). For example, you can use the keyword `con:3` to `grep` the log whose session ID is 3. 
+ +#### `user` + +The name of the user who executes this statement. + +#### `txn_start_ts` + +The start timestamp of the transaction, that is, the ID of the transaction. You can use this value to `grep` the transaction-related logs. + +#### `database` + +The current database. + +#### `table_ids` + +The IDs of the tables involved in the statement. + +#### `index_ids` + +The IDs of the indexes involved in the statement. + +#### `sql` + +The SQL statement. + +### Identify problematic SQL statements + +Not all of the `SLOW_QUERY` statements are problematic. Only those whose `process_time` is very large will increase the pressure on the entire cluster. + +The statements whose `wait_time` is very large and `process_time` is very small are usually not problematic. The large `wait_time` is because the statement is blocked by real problematic statements and it has to wait in the execution queue, which leads to a much longer response time. + +### `admin show slow` command + +In addition to the TiDB log file, you can identify slow queries by running the `admin show slow` command: + +```sql +admin show slow recent N +admin show slow top [internal | all] N +``` + +`recent N` shows the recent N slow query records, for example: + +```sql +admin show recent 10 +``` + +`top N` shows the slowest N query records recently (within a few days). +If the `internal` option is provided, the returned results would be the inner SQL executed by the system; +If the `all` option is provided, the returned results would be the user's SQL combinated with inner SQL; +Otherwise, this command would only return the slow query records from the user's SQL. + +```sql +admin show top 3 +admin show top internal 3 +admin show top all 5 +``` + +Due to the memory footprint restriction, the stored slow query records count is limited. If the specified `N` is greater than the records count, the returned records count may be smaller than `N`. diff --git a/sql/sql-optimizer-overview.md b/sql/sql-optimizer-overview.md new file mode 100644 index 0000000000000..6e69213bee8ab --- /dev/null +++ b/sql/sql-optimizer-overview.md @@ -0,0 +1,33 @@ +--- +title: SQL Optimization Process +summary: Learn about the logical and physical optimization of SQL in TiDB. +category: user guide +--- + +# SQL Optimization Process + +In TiDB, the process of SQL optimization consists of two phases: logical optimization and physical optimization. This document describes the logical and physical optimization to help you understand the whole process. + +## Logical optimization + +Based on rules, logical optimization applies some optimization rules to the input logical execution plan in order, to make the whole logical execution plan better. The optimization rules include: + +- Column pruning +- Eliminate projection +- Decorrelate correlated subqueries +- Eliminate Max/Min +- Push down predicates +- Partition pruning +- Push down TopN and Limit + +## Physical optimization + +Based on cost, physical optimization makes the physical execution plan for the logical execution plan generated in the previous phase. + +In this phase, the optimizer selects the specific physical implementation for each operator in the logical execution plan. Different physical implementations of logical operators differ in time complexity, resource consumption, physical properties, and so on. During this process, the optimizer determines the cost of different physical implementations according to data statistics, and selects the physical execution plan with the minimum whole cost. 
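+
+As a quick way to see which physical operators the optimizer has chosen, you can run the `EXPLAIN` statement. The following is a minimal sketch with hypothetical tables `t1` and `t2`; the output format of `EXPLAIN` is described in [Understand the Query Execution Plan](../sql/understanding-the-query-execution-plan.md):
+
+```sql
+CREATE TABLE t1 (id INT PRIMARY KEY, c INT, KEY idx_c (c));
+CREATE TABLE t2 (id INT PRIMARY KEY, c INT);
+
+-- Shows the physical access path chosen for an indexed predicate
+EXPLAIN SELECT * FROM t1 WHERE c = 1;
+
+-- An optimizer hint can steer the physical implementation of a join,
+-- for example toward the Sort Merge Join algorithm
+EXPLAIN SELECT /*+ TIDB_SMJ(t1, t2) */ * FROM t1, t2 WHERE t1.id = t2.id;
+```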
+ +The logical execution plan is a tree structure and each node corresponds to a logical operator in SQL. Similarly, the physical execution plan is also a tree structure, and each node corresponds to a physical operator in SQL. + +The logical operator only describes the function of an operator, while the physical operator describes the concrete algorithm that implements this function. A single logical operator might have multiple physical operator implementations. For example, to implement `LogicalAggregate`, you can use either `HashAggregate` the of the hash algorithm, or `StreamAggregate` of the stream type. + +Different physical operators have different physical properties, and have different requirements on the physical properties of their subnodes. The physical properties include the data's order, distribution, and so on. Currently, only the data order is considered in TiDB. \ No newline at end of file diff --git a/sql/statistics.md b/sql/statistics.md index 826c9b18ad5de..354cc8b9eea9c 100644 --- a/sql/statistics.md +++ b/sql/statistics.md @@ -1,5 +1,6 @@ --- title: Introduction to Statistics +summary: Learn how the statistics collect table-level and column-level information. category: user guide --- @@ -31,6 +32,8 @@ ANALYZE TABLE TableName INDEX [IndexNameList] For the `INSERT`, `DELETE`, or `UPDATE` statements, TiDB automatically updates the number of rows and updated rows. TiDB persists this information regularly and the update cycle is 5 * `stats-lease`. The default value of `stats-lease` is `3s`. If you specify the value as `0`, it does not update automatically. +When the ratio of the number of modified rows to the total number of rows is greater than `auto-analyze-ratio`, TiDB automatically starts the `Analyze` statement. You can modify the value of `auto-analyze-ratio` in the configuration file. The default value is `0`, which means that this function is not enabled. + When the query is executed, TiDB collects feedback with the probability of `feedback-probability` and uses it to update the histogram and Count-Min Sketch. You can modify the value of `feedback-probability` in the configuration file. The default value is `0`. ### Control `ANALYZE` concurrency diff --git a/sql/string-functions.md b/sql/string-functions.md index a50d4f1df532b..ef845a0cb6e56 100644 --- a/sql/string-functions.md +++ b/sql/string-functions.md @@ -1,9 +1,9 @@ --- title: String Functions +summary: Learn about the string functions in TiDB. 
category: user guide --- - # String Functions | Name | Description | @@ -54,8 +54,6 @@ category: user guide | [`FORMAT()`](https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_format) | Return a number formatted to specified number of decimal places | | [`ORD()`](https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_ord) | Return character code for leftmost character of the argument | | [`QUOTE()`](https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_quote) | Escape the argument for use in an SQL statement | -| [`SOUNDEX()`](https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_soundex) | Return a soundex string | -| [`SOUNDS LIKE`](https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#operator_sounds-like) | Compare sounds | ## String comparison functions @@ -64,7 +62,6 @@ category: user guide | [`LIKE`](https://dev.mysql.com/doc/refman/5.7/en/string-comparison-functions.html#operator_like) | Simple pattern matching | | [`NOT LIKE`](https://dev.mysql.com/doc/refman/5.7/en/string-comparison-functions.html#operator_not-like) | Negation of simple pattern matching | | [`STRCMP()`](https://dev.mysql.com/doc/refman/5.7/en/string-comparison-functions.html#function_strcmp) | Compare two strings | -| [`MATCH`](https://dev.mysql.com/doc/refman/5.7/en/fulltext-search.html#function_match) | Perform full-text search | ## Regular expressions diff --git a/sql/system-database.md b/sql/system-database.md index 3b008cf78f205..5c3f7c44b32e8 100644 --- a/sql/system-database.md +++ b/sql/system-database.md @@ -1,5 +1,6 @@ --- title: The TiDB System Database +summary: Learn tables contained in the TiDB System Database. category: user guide --- @@ -41,29 +42,29 @@ To be compatible with MySQL, TiDB supports INFORMATION\_SCHEMA tables. Some thir ### CHARACTER\_SETS table -The CHARACTER\_SETS table provides information about character sets. But it contains dummy data. By default, TiDB only supports utf8mb4. +The CHARACTER\_SETS table provides information about [character sets](../sql/character-set-support.md). The default character set in TiDB is `utf8`, which behaves similar to `utf8mb4` in MySQL. Additional character sets in this table are included for compatibility with MySQL: ```sql -mysql> select * from CHARACTER_SETS; -+--------------------|----------------------|-----------------------|--------+ -| CHARACTER_SET_NAME | DEFAULT_COLLATE_NAME | DESCRIPTION | MAXLEN | -+--------------------|----------------------|-----------------------|--------+ -| ascii | ascii_general_ci | US ASCII | 1 | -| binary | binary | Binary pseudo charset | 1 | -| latin1 | latin1_swedish_ci | cp1252 West European | 1 | -| utf8 | utf8_general_ci | UTF-8 Unicode | 3 | -| utf8mb4 | utf8mb4_general_ci | UTF-8 Unicode | 4 | -+--------------------|----------------------|-----------------------|--------+ +mysql> SELECT * FROM character_sets; ++--------------------+----------------------+---------------+--------+ +| CHARACTER_SET_NAME | DEFAULT_COLLATE_NAME | DESCRIPTION | MAXLEN | ++--------------------+----------------------+---------------+--------+ +| utf8 | utf8_bin | UTF-8 Unicode | 3 | +| utf8mb4 | utf8mb4_bin | UTF-8 Unicode | 4 | +| ascii | ascii_bin | US ASCII | 1 | +| latin1 | latin1_bin | Latin1 | 1 | +| binary | binary | binary | 1 | ++--------------------+----------------------+---------------+--------+ 5 rows in set (0.00 sec) ``` ### COLLATIONS table -The COLLATIONS table is similar to the CHARACTER\_SETS table. 
+The COLLATIONS table provides a list of collations that correspond to character sets in the CHARACTER\_SETS table. Currently this table is included only for compatibility with MySQL, as TiDB only supports binary collation. ### COLLATION\_CHARACTER\_SET\_APPLICABILITY table -NULL. +This table maps collations to the applicable character set name. Similar to the collations table, it is included only for compatibility with MySQL. ### COLUMNS table @@ -73,7 +74,7 @@ The COLUMNS table provides information about columns in tables. The information SHOW COLUMNS FROM table_name [FROM db_name] [LIKE 'wild'] ``` -### COLUMNS\_PRIVILEGE table +### COLUMN\_PRIVILEGES table NULL. diff --git a/sql/tidb-memory-control.md b/sql/tidb-memory-control.md index 579d9b0176404..1a88bc5097cf1 100644 --- a/sql/tidb-memory-control.md +++ b/sql/tidb-memory-control.md @@ -1,11 +1,12 @@ --- title: TiDB Memory Control +summary: Learn how to configure the memory quota of a query and avoid OOM (out of memory). category: user guide --- # TiDB Memory Control -Currently TiDB can track the memory quota of a single SQL query and take actions to prevent OOM (out of memory) or troubleshoot OOM when the memory usage exceeds a specific threshold value. In the TiDB configuration file, you can configure the options as below to control TiDB behaviors when the memory quota exceeds the threshold value: +Currently, TiDB can track the memory quota of a single SQL query and take actions to prevent OOM (out of memory) or troubleshoot OOM when the memory usage exceeds a specific threshold value. In the TiDB configuration file, you can configure the options as below to control TiDB behaviors when the memory quota exceeds the threshold value: ``` # Valid options: ["log", "cancel"] diff --git a/sql/tidb-server.md b/sql/tidb-server.md index 3725c45fd7e4f..c83afc9ade561 100644 --- a/sql/tidb-server.md +++ b/sql/tidb-server.md @@ -1,25 +1,24 @@ --- title: The TiDB Server +summary: Learn about the basic management functions of the TiDB cluster. category: user guide --- # The TiDB Server -## TiDB service - TiDB refers to the TiDB database management system. This document describes the basic management functions of the TiDB cluster. ## TiDB cluster startup configuration -You can set the service parameters using the command line or the configuration file, or both. The priority of the command line parameters is higher than the configuration file. If the same parameter is set in both ways, TiDB uses the value set using command line parameters. For more information, see [The TiDB Command Options](server-command-option.md). +You can set the service parameters using the command line or the configuration file, or both. The priority of the command line parameters is higher than the configuration file. If the same parameter is set in both ways, TiDB uses the value set using command line parameters. For more information, see [The TiDB Command Options](../sql/server-command-option.md). ## TiDB system variable -TiDB is compatible with MySQL system variables, and defines some unique system variables to adjust the database behavior. For more information, see [The Proprietary System Variables and Syntaxes in TiDB](tidb-specific.md). +TiDB is compatible with MySQL system variables, and defines some unique system variables to adjust the database behavior. For more information, see [The Proprietary System Variables and Syntaxes in TiDB](../sql/tidb-specific.md). 
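+
+For example, the TiDB-specific variables use the `tidb_` prefix and can be viewed and set in the same way as MySQL system variables. The variable names below are taken from the document linked above, and the values are illustrative only:
+
+```sql
+-- List the TiDB-specific system variables
+SHOW VARIABLES LIKE 'tidb%';
+
+-- Set a variable for the current session
+SET @@tidb_distsql_scan_concurrency = 10;
+
+-- Set a variable globally
+SET @@global.tidb_retry_limit = 10;
+```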
## TiDB system table -Similar to MySQL, TiDB also has system tables that store the information needed when TiDB runs. For more information, see [The TiDB System Database](system-database.md). +Similar to MySQL, TiDB also has system tables that store the information needed when TiDB runs. For more information, see [The TiDB System Database](../sql/system-database.md). ## TiDB data directory @@ -31,6 +30,6 @@ When you use the TiKV storage engine, the data is stored on the TiKV node and th ## TiDB server logs -The three components of the TiDB cluster (`tidb-server`, ` tikv-server` and `pd-server`) outputs the logs to standard errors by default. In each of the three components, you can set the [`--log-file`](op-guide/configuration.md#--log-file) parameter (or the configuration item in the configuration file) and output the log into a file. +The three components of the TiDB cluster (`tidb-server`, ` tikv-server` and `pd-server`) outputs the logs to standard errors by default. In each of the three components, you can set the [`--log-file`](../op-guide/configuration.md#--log-file) parameter (or the configuration item in the configuration file) and output the log into a file. You can adjust the log behavior using the configuration file. For more details, see the configuration file description of each component. For example, the [`tidb-server` log configuration item](https://github.com/pingcap/tidb/blob/master/config/config.toml.example#L46). diff --git a/sql/tidb-specific.md b/sql/tidb-specific.md index 33ea68b32e3e0..3a0c426c1cf82 100644 --- a/sql/tidb-specific.md +++ b/sql/tidb-specific.md @@ -1,30 +1,36 @@ --- -title: The Proprietary System Variables and Syntaxes in TiDB +title: TiDB Specific System Variables +summary: Use system variables specific to TiDB to optimize performance. category: user guide --- -# The Proprietary System Variables and Syntaxes in TiDB +# TiDB Specific System Variables -On the basis of MySQL variables and syntaxes, TiDB has defined some specific system variables and syntaxes to optimize performance. +TiDB contains a number of system variables which are specific to its usage, and **do not** apply to MySQL. These variables start with a `tidb_` prefix, and can be tuned to optimize system performance. ## System variable + Variables can be set with the `SET` statement, for example: -```set @@tidb_distsql_scan_concurrency = 10 ``` +``` +set @@tidb_distsql_scan_concurrency = 10 +``` If you need to set the global variable, run: -```set @@global.tidb_distsql_scan_concurrency = 10 ``` - +``` +set @@global.tidb_distsql_scan_concurrency = 10 +``` + ### tidb_snapshot - Scope: SESSION - Default value: "" -- This variable is used to set the time point at which the data is read by the session. For example, when you set the variable to "2017-11-11 20:20:20", the current session reads the data of this moment. +- This variable is used to set the time point at which the data is read by the session. For example, when you set the variable to "2017-11-11 20:20:20" or a TSO number like "400036290571534337", the current session reads the data of this moment. ### tidb_import_data -- Scope: SESSION | GLOBAl +- Scope: SESSION - Default value: 0 - This variable indicates whether to import data from the dump file currently. - To speed up importing, the unique index constraint is not checked when the variable is set to 1. @@ -110,6 +116,26 @@ If you need to set the global variable, run: - This variable is used to set the concurrency of the `serial scan` operation. 
- Use a bigger value in OLAP scenarios, and a smaller value in OLTP scenarios. +### tidb_projection_concurrency + +- Scope: SESSION | GLOBAL +- Default value: 4 +- This variable is used to set the concurrency of the `Projection` operator. + +### tidb_hashagg_partial_concurrency + +- Scope: SESSION | GLOBAL +- Default value: 4 +- This variable is used to set the concurrency of executing the concurrent `hash aggregation` algorithm in the `partial` phase. +- When the parameter of the aggregate function is not distinct, `HashAgg` is run concurrently and respectively in two phases - the `partial` phase and the `final` phase. + +### tidb_hashagg_final_concurrency + +- Scope: SESSION | GLOBAL +- Default value: 4 +- This variable is used to set the concurrency of executing the concurrent `hash aggregation` algorithm in the `final` phase. +- When the parameter of the aggregate function is not distinct, `HashAgg` is run concurrently and respectively in two phases - the `partial` phase and the `final` phase. + ### tidb_index_join_batch_size - Scope: SESSION | GLOBAL @@ -126,21 +152,21 @@ If you need to set the global variable, run: ### tidb_batch_insert -- Scope: SESSION | GLOBAL +- Scope: SESSION - Default value: 0 -- This variable is used to set whether to divide the inserted data automatically. +- This variable is used to set whether to divide the inserted data automatically. It is valid only when `autocommit` is enabled. - When inserting a large amount of data, you can set the variable value to true. Then the inserted data is automatically divided into multiple batches and each batch is inserted by a single transaction. ### tidb_batch_delete -- Scope: SESSION | GLOBAL +- Scope: SESSION - Default value: 0 -- This variable is used to set whether to divide the data for deletion automatically. -- When deleting a large amount of data, you can set the variable value to true. Then the data for deletion is automatically divided into multiple batches and each batch is deleted by a single transaction. +- This variable is used to set whether to divide the data for deletion automatically. It is valid only when `autocommit` is enabled. +- When deleting a large amount of data, you can set the variable value to true. Then the data for deletion is automatically divided into multiple batches and each batch is deleted by a single transaction. ### tidb_dml_batch_size -- Scope: SESSION | GLOBAL +- Scope: SESSION - Default value: 20000 - This variable is used to set the automatically divided batch size of the data for insertion/deletion. It is only valid when `tidb_batch_insert` or `tidb_batch_delete` is enabled. - When the data size of a single row is very large, the overall data size of 20 thousand rows exceeds the size limit for a single transaction. In this case, set the variable to a smaller value. @@ -218,27 +244,78 @@ If you need to set the global variable, run: - Scope: SERVER - Default value: 0 - This variable is used to set whether to enable Streaming. - -## Optimizer hint + +### tidb_retry_limit + +- Scope: SESSION | GLOBAL +- Default value: 10 +- When a transaction encounters retriable errors, such as transaction conflicts and TiKV busy, this transaction can be re-executed. This variable is used to set the maximum number of the retries. + +### tidb_disable_txn_auto_retry + +- Scope: SESSION | GLOBAL +- Default: 0 +- This variable is used to set whether to disable automatic retry of explicit transactions. If you set this variable to 1, the transaction does not retry automatically. 
If there is a conflict, the transaction needs to be retried at the application layer. To decide whether you need to disable automatic retry, see [description of optimistic transactions](../sql/transaction-isolation.md#description-of-optimistic-transactions). + +### tidb_enable_table_partition + +- Scope: SESSION +- Default value: 0 +- This variable is used to set whether to enable the `TABLE PARTITION` feature. + +### tidb_backoff_lock_fast + +- Scope: SESSION | GLOBAL +- Default value: 100 +- This variable is used to set the `backoff` time when the read request meets a lock. + +### tidb_ddl_reorg_worker_cnt + +- Scope: SESSION | GLOBAL +- Default value: 16 +- This variable is used to set the concurrency of the DDL operation in the `re-organize` phase. + +### tidb_ddl_reorg_priority + +- Scope: SESSION | GLOBAL +- Default value: `PRIORITY_LOW` +- This variable is used to set the priority of executing the `ADD INDEX` operation in the `re-organize` phase. +- You can set the value of this variable to `PRIORITY_LOW`, `PRIORITY_NORMAL` or `PRIORITY_HIGH`. + +### tidb_force_priority + +- Scope: SESSION +- Default value: `NO_PRIORITY` +- This variable is used to change the default priority for statements executed on a TiDB server. A use case is to ensure that a particular user that is performing OLAP queries receives lower priority than users performing OLTP queries. +- You can set the value of this variable to `NO_PRIORITY`, `LOW_PRIORITY`, `DELAYED` or `HIGH_PRIORITY`. + +## Optimizer Hint + On the basis of MySQL’s `Optimizer Hint` Syntax, TiDB adds some proprietary `Hint` syntaxes. When using the `Hint` syntax, the TiDB optimizer will try to use the specific algorithm, which performs better than the default algorithm in some scenarios. The `Hint` syntax is included in comments like `/*+ xxx */`, and in MySQL client versions earlier than 5.7.7, the comment is removed by default. If you want to use the `Hint` syntax in these earlier versions, add the `--comments` option when starting the client. For example: `mysql -h 127.0.0.1 -P 4000 -uroot --comments`. ### TIDB_SMJ(t1, t2) -```SELECT /*+ TIDB_SMJ(t1, t2) */ * from t1,t2 where t1.id = t2.id``` +```sql +SELECT /*+ TIDB_SMJ(t1, t2) */ * from t1, t2 where t1.id = t2.id +``` This variable is used to remind the optimizer to use the `Sort Merge Join` algorithm. This algorithm takes up less memory, but takes longer to execute. It is recommended if the data size is too large, or there’s insufficient system memory. ### TIDB_INLJ(t1, t2) - -```SELECT /*+ TIDB_INLJ(t1, t2) */ * from t1,t2 where t1.id = t2.id``` + +```sql +SELECT /*+ TIDB_INLJ(t1, t2) */ * from t1, t2 where t1.id = t2.id +``` This variable is used to remind the optimizer to use the `Index Nested Loop Join` algorithm. In some scenarios, this algorithm runs faster and takes up fewer system resources, but may be slower and takes up more system resources in some other scenarios. You can try to use this algorithm in scenarios where the result-set is less than 10,000 rows after the outer table is filtered by the WHERE condition. The parameter in `TIDB_INLJ()` is the candidate table for the driving table (external table) when generating the query plan. That means, `TIDB_INLJ (t1)` will only consider using t1 as the driving table to create a query plan. ### TIDB_HJ(t1, t2) -```SELECT /*+ TIDB_HJ(t1, t2) */ * from t1,t2 where t1.id = t2.id``` +```sql +SELECT /*+ TIDB_HJ(t1, t2) */ * from t1, t2 where t1.id = t2.id +``` This variable is used to remind the optimizer to use the `Hash Join` algorithm. 
This algorithm executes threads concurrently. It runs faster but takes up more memory. @@ -266,4 +343,4 @@ To mitigate the hot spot issue, you can configure `SHARD_ROW_ID_BITS`. The ROW I Usage of statements: - `CREATE TABLE`: `CREATE TABLE t (c int) SHARD_ROW_ID_BITS = 4;` -- `ALTER TABLE`: `ALTER TABLE MODIFY t SHARD_ROW_ID_BITS = 4;` +- `ALTER TABLE`: `ALTER TABLE t SHARD_ROW_ID_BITS = 4;` diff --git a/sql/time-zone.md b/sql/time-zone.md index ad213984a76a3..36d9096c1612e 100644 --- a/sql/time-zone.md +++ b/sql/time-zone.md @@ -1,24 +1,30 @@ --- -title: Time Zone +title: Time Zone Support +summary: Learn how to set the time zone and its format. category: user guide --- -# Time Zone +# Time Zone Support -The time zone in TiDB is decided by the global `time_zone` system variable and the session `time_zone` system variable. The initial value for `time_zone` is 'SYSTEM', which indicates that the server time zone is the same as the system time zone. +The time zone in TiDB is decided by the global `time_zone` system variable and the session `time_zone` system variable. The default value of `time_zone` is `SYSTEM`. The actual time zone corresponding to `System` is configured when the TiDB cluster bootstrap is initialized. The detailed logic is as follows: + +- Prioritize the use of the `TZ` environment variable. +- If the `TZ` environment variable fails, extract the time zone from the actual soft link address of `/etc/localtime`. +- If both of the above methods fail, use `UTC` as the system time zone. You can use the following statement to set the global server `time_zone` value at runtime: ```sql mysql> SET GLOBAL time_zone = timezone; ``` + Each client has its own time zone setting, given by the session `time_zone` variable. Initially, the session variable takes its value from the global `time_zone` variable, but the client can change its own time zone with this statement: ```sql mysql> SET time_zone = timezone; ``` -You can use the following statment to view the current values of the global and client-specific time zones: +You can use the following statement to view the current values of the global and client-specific time zones: ```sql mysql> SELECT @@global.time_zone, @@session.time_zone; diff --git a/sql/transaction-isolation.md b/sql/transaction-isolation.md index 80b86c05ef8cb..c81c6c616f548 100644 --- a/sql/transaction-isolation.md +++ b/sql/transaction-isolation.md @@ -1,5 +1,6 @@ --- title: TiDB Transaction Isolation Levels +summary: Learn about the transaction isolation levels in TiDB. category: user guide --- @@ -16,25 +17,13 @@ The SQL-92 standard defines four levels of transaction isolation: Read Uncommitt | Repeatable Read | Not possible | Not possible | Not possible in TiDB | Possible | | Serializable | Not possible | Not possible | Not possible | Not possible | -TiDB offers two transaction isolation levels: Read Committed and Repeatable Read. +TiDB offers the Repeatable Read isolation level. TiDB uses the [Percolator transaction model](https://research.google.com/pubs/pub36726.html). A global read timestamp is obtained when the transaction is started, and a global commit timestamp is obtained when the transaction is committed. The execution order of transactions is confirmed based on the timestamps. To know more about the implementation of TiDB transaction model, see [MVCC in TiKV](https://pingcap.com/blog/2016-11-17-mvcc-in-tikv/). 
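+
+In practice, this means that write conflicts in TiDB are detected when a transaction commits rather than while its individual statements run, so applications should check the result of `COMMIT`. The following is a minimal sketch that assumes a table `t(id, balance)` like the one used in the examples later in this document:
+
+```sql
+BEGIN;
+UPDATE t SET balance = balance - 100 WHERE id = 1;
+-- Write conflicts are detected at COMMIT time; depending on the automatic retry
+-- settings described below, the conflict is either retried or returned as an error
+COMMIT;
+```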
-Use the following command to set the isolation level of the Session or Global transaction: - -``` -SET [SESSION | GLOBAL] TRANSACTION ISOLATION LEVEL [read committed|repeatable read] -``` - -If you do not use the Session or Global keyword, this statement takes effect only for the transaction to be executed next, but not for the entire session or global transaction. - -``` -SET TRANSACTION ISOLATION LEVEL [read committed|repeatable read] -``` - ## Repeatable Read -Repeatable Read is the default transaction isolation level in TiDB. The Repeatable Read isolation level only sees data committed before the transaction begins, and it never sees either uncommitted data or changes committed during transaction execution by concurrent transactions. However, the transaction statement does see the effects of previous updates executed within its own transaction, even though they are not yet committed. +The Repeatable Read isolation level only sees data committed before the transaction begins, and it never sees either uncommitted data or changes committed during transaction execution by concurrent transactions. However, the transaction statement does see the effects of previous updates executed within its own transaction, even though they are not yet committed. For transactions running on different nodes, the start and commit order depends on the order that the timestamp is obtained from PD. @@ -61,12 +50,6 @@ The Repeatable Read isolation level in TiDB differs from that in MySQL. The MySQ The MySQL Repeatable Read isolation level is not the snapshot isolation level. The consistency of MySQL Repeatable Read isolation level is weaker than both the snapshot isolation level and TiDB Repeatable Read isolation level. -## Read Committed - -The Read Committed isolation level differs from Repeatable Read isolation level. Read Committed only guarantees the uncommitted data cannot be read. - -**Note:** Because the transaction commit is a dynamic process, the Read Committed isolation level might read the data committed by part of the transaction. It is not recommended to use the Read Committed isolation level in a database that requires strict consistency. - ## Transaction retry For the `insert/delete/update` operation, if the transaction fails and can be retried according to the system, the transaction is automatically retried within the system. @@ -80,6 +63,50 @@ You can control the number of retries by configuring the `retry-limit` parameter retry-limit = 10 ``` +## Description of optimistic transactions + +Because TiDB uses the optimistic transaction model, the final result might not be as expected if the transactions created by the explicit `BEGIN` statement automatically retry after meeting a conflict. 
+ +Example 1: + +| Session1 | Session2 | +| ---------------- | ------------ | +| `begin;` | `begin;` | +| `select balance from t where id = 1;` | `update t set balance = balance -100 where id = 1;` | +| | `update t set balance = balance -100 where id = 2;` | +| // the subsequent logic depends on the result of `select` | `commit;` | +| `if balance > 100 {` | | +| `update t set balance = balance + 100 where id = 2;` | | +| `}` | | +| `commit;` // automatic retry | | + +Example 2: + +| Session1 | Session2 | +| ---------------- | ------------ | +| `begin;` | `begin;` | +| `update t set balance = balance - 100 where id = 1;` | `delete t where id = 1;` | +| | `commit;` | +| // the subsequent logic depends on the result of `affected_rows` | | +| `if affected_rows > 100 {` | | +| `update t set balance = balance + 100 where id = 2;` | | +| `}` | | +| `commit;` // automatic retry | | + +Under the automatic retry mechanism of TiDB, all the executed statements for the first time are re-executed again. When whether the subsequent statements are to be executed or not depends on the results of the previous statements, automatic retry cannot guarantee the final result is as expected. + +To disable the automatic retry of explicit transactions, configure the `tidb_disable_txn_auto_retry` global variable: + +``` +set @@global.tidb_disable_txn_auto_retry = 1; +``` + +This variable does not affect the implicit single statement with `auto_commit = 1`, so this type of statement still automatically retries. + +After the automatic retry of explicit transactions is disabled, if a transaction conflict occurs, the `commit` statement returns an error that includes the `try again later` string. The application layer uses this string to judge whether the error can be retried. + +If the application layer logic is included in the process of transaction execution, it is recommended to add the retry of explicit transactions at the application layer and disable automatic retry. + ## Statement rollback If you execute a statement within a transaction, the statement does not take effect when an error occurs. diff --git a/sql/transaction.md b/sql/transaction.md index 8033dd31eaa35..b95d5c798ae4f 100644 --- a/sql/transaction.md +++ b/sql/transaction.md @@ -1,5 +1,6 @@ --- title: Transactions +summary: Learn how to use the distributed transaction statements. category: user guide --- @@ -19,7 +20,7 @@ If you set the value of `autocommit` to 1, the status of the current Session is In the autocommit status, the updates are automatically committed to the database after you run each statement. Otherwise, the updates are only committed when you run the `COMMIT` or `BEGIN` statement. -Besides, autocommit is also a System Variable. You can update the current Session or the Global value using the following variable assignment statement: +`autocommit` is also a System Variable. You can update the current Session or the Global value using the following variable assignment statement: ```sql SET @@SESSION.autocommit = {0 | 1}; diff --git a/sql/type-conversion-in-expression-evaluation.md b/sql/type-conversion-in-expression-evaluation.md index 698c9ed6517da..e17bae0935b9b 100644 --- a/sql/type-conversion-in-expression-evaluation.md +++ b/sql/type-conversion-in-expression-evaluation.md @@ -1,5 +1,6 @@ --- title: Type Conversion in Expression Evaluation +summary: Learn about the type conversion in expression evaluation. 
category: user guide --- diff --git a/sql/understanding-the-query-execution-plan.md b/sql/understanding-the-query-execution-plan.md index f95b7c819de72..49af8cb69137f 100644 --- a/sql/understanding-the-query-execution-plan.md +++ b/sql/understanding-the-query-execution-plan.md @@ -1,5 +1,6 @@ --- title: Understand the Query Execution Plan +summary: Learn about the execution plan information returned by the `EXPLAIN` statement in TiDB. category: user guide --- @@ -12,23 +13,60 @@ Based on the details of your tables, the TiDB optimizer chooses the most efficie The result of the `EXPLAIN` statement provides information about how TiDB executes SQL queries: - `EXPLAIN` works together with `SELECT`, `DELETE`, `INSERT`, `REPLACE`, and `UPDATE`. -- When you run the `EXPLAIN` statement, TiDB returns the final physical execution plan which is optimized by the SQL statment of `EXPLAIN`. In other words, `EXPLAIN` displays the complete information about how TiDB executes the SQL statement, such as in which order, how tables are joined, and what the expression tree looks like. For more information, see [`EXPLAIN` output format](#explain-output-format). -- TiDB dose not support `EXPLAIN [options] FOR CONNECTION connection_id` currently. We'll do it in the future. For more information, see [#4351](https://github.com/pingcap/tidb/issues/4351). +- When you run the `EXPLAIN` statement, TiDB returns the final physical execution plan which is optimized by the SQL statement of `EXPLAIN`. In other words, `EXPLAIN` displays the complete information about how TiDB executes the SQL statement, such as in which order, how tables are joined, and what the expression tree looks like. For more information, see [`EXPLAIN` output format](#explain-output-format). +- TiDB does not support `EXPLAIN [options] FOR CONNECTION connection_id` currently. We'll do it in the future. For more information, see [#4351](https://github.com/pingcap/tidb/issues/4351). The results of `EXPLAIN` shed light on how to index the data tables so that the execution plan can use the index to speed up the execution of SQL statements. You can also use `EXPLAIN` to check if the optimizer chooses the optimal order to join tables. ## `EXPLAIN` output format -Currently, the `EXPLAIN` statement returns the following six columns: id, parent, children, task, operator info, and count. Each operator in the execution plan is described by the six properties. In the results returned by `EXPLAIN`, each row describes an operator. See the following table for details: +Currently, the `EXPLAIN` statement returns the following four columns: id, count, task, operator info. Each operator in the execution plan is described by the four properties. In the results returned by `EXPLAIN`, each row describes an operator. See the following table for details: | Property Name | Description | | -----| ------------- | -| id | The id of an operator, to identify the uniqueness of an operator in the entire execution plan. | -| parent | The parent of an operator. The current execution plan is like a tree structure composed of operators. The data flows from a child to its parent, and each operator has one and only one parent. | -| children | the children and the data source of an operator | +| id | The id of an operator, to identify the uniqueness of an operator in the entire execution plan. As of TiDB 2.1, the id includes formatting to show a tree structure of operators. The data flows from a child to its parent, and each operator has one and only one parent. 
| +| count | An estimation of the number of data items that the current operator outputs, based on the statistics and the execution logic of the operator | | task | the task that the current operator belongs to. The current execution plan contains two types of tasks: 1) the **root** task that runs on the TiDB server; 2) the **cop** task that runs concurrently on the TiKV server. The topological relations of the current execution plan in the task level is that a root task can be followed by many cop tasks. The root task uses the output of cop task as the input. The cop task executes the tasks that TiDB pushes to TiKV. Each cop task scatters in the TiKV cluster and is executed by multiple processes. | | operator info | The details about each operator. The information of each operator differs from others, see [Operator Info](#operator-info).| -| count | to predict the number of data items that the current operator outputs, based on the statistics and the execution logic of the operator | + +### Example usage + +Using the [bikeshare example database](../bikeshare-example-database.md): + +``` +mysql> EXPLAIN SELECT count(*) FROM trips WHERE start_date BETWEEN '2017-07-01 00:00:00' AND '2017-07-01 23:59:59'; ++--------------------------+-------------+------+------------------------------------------------------------------------------------------------------------------------+ +| id | count | task | operator info | ++--------------------------+-------------+------+------------------------------------------------------------------------------------------------------------------------+ +| StreamAgg_20 | 1.00 | root | funcs:count(col_0) | +| └─TableReader_21 | 1.00 | root | data:StreamAgg_9 | +| └─StreamAgg_9 | 1.00 | cop | funcs:count(1) | +| └─Selection_19 | 8166.73 | cop | ge(bikeshare.trips.start_date, 2017-07-01 00:00:00.000000), le(bikeshare.trips.start_date, 2017-07-01 23:59:59.000000) | +| └─TableScan_18 | 19117643.00 | cop | table:trips, range:[-inf,+inf], keep order:false | ++--------------------------+-------------+------+------------------------------------------------------------------------------------------------------------------------+ +5 rows in set (0.00 sec) +``` + +Here you can see that the coprocesor (cop) needs to scan the table `trips` to find rows that match the criteria of `start_date`. Rows that meet this criteria are determined in `Selection_19` and passed to `StreamAgg_9`, all still within the coprocessor (i.e. inside of TiKV). The `count` column shows an approximate number of rows that will be processed, which is estimated with the help of table statistics. In this query it is estimated that each of the TiKV nodes will return `1.00` row to TiDB (as `TableReader_21`), which are then aggregated as `StreamAgg_20` to return an estimated `1.00` row to the client. + +The good news with this query is that most of the work is pushed down to the coprocessor. This means that minimal data transfer is required for query execution. However, the `TableScan_18` can be eliminated by adding an index to speed up queries on `start_date`: + +```sql +mysql> ALTER TABLE trips ADD INDEX (start_date); +.. 
+mysql> EXPLAIN SELECT count(*) FROM trips WHERE start_date BETWEEN '2017-07-01 00:00:00' AND '2017-07-01 23:59:59'; ++------------------------+---------+------+--------------------------------------------------------------------------------------------------+ +| id | count | task | operator info | ++------------------------+---------+------+--------------------------------------------------------------------------------------------------+ +| StreamAgg_25 | 1.00 | root | funcs:count(col_0) | +| └─IndexReader_26 | 1.00 | root | index:StreamAgg_9 | +| └─StreamAgg_9 | 1.00 | cop | funcs:count(1) | +| └─IndexScan_24 | 8166.73 | cop | table:trips, index:start_date, range:[2017-07-01 00:00:00,2017-07-01 23:59:59], keep order:false | ++------------------------+---------+------+--------------------------------------------------------------------------------------------------+ +4 rows in set (0.01 sec) +``` + +In the revisited `EXPLAIN` you can see the count of rows scanned has reduced via the use of an index. On a reference system, the query execution time reduced from 50.41 seconds to 0.00 seconds! ## Overview @@ -46,7 +84,7 @@ Similar to the table data, the index data in TiDB is also stored in TiKV. The ke In the WHERE/HAVING/ON condition, analyze the results returned by primary key or index key queries. For example, number and date types of comparison symbols, greater than, less than, equal to, greater than or equal to, less than or equal to, and character type LIKE symbols. -TiDB only supports the comparison symbols of which one side is a column and the other side is a constant or can be calculated as a constant. Query conditions like `year(birth_day) < 1992` cannot use the index. Besides, try to use the same type to compare, to avoid that the index cannot be used because of additional cast operations. For example, in `user_id = 123456`, if the `user_id` is a string, you need to write `123456` as a string constant. +TiDB only supports the comparison symbols of which one side is a column and the other side is a constant or can be calculated as a constant. Query conditions like `year(birth_day) < 1992` cannot use the index. Try to use the same type to compare: additional cast operations prevent the index from being used. For example, in `user_id = 123456`, if the `user_id` is a string, you need to write `123456` as a string constant. Using `AND` and `OR` combination on the range query conditions of the same column is equivalent to getting the intersection or union set. For multidimensional combined indexes, you can write the conditions for multiple columns. For example, in the `(a, b, c)` combined index, when `a` is an equivalent query, you can continue to calculate the query range of `b`; when `b` is also an equivalent query, you can continue to calculate the query range of `c`; otherwise, if `a` is a non-equivalent query, you can only calculate the query range of `a`. diff --git a/sql/user-account-management.md b/sql/user-account-management.md index c4fec959690c8..c0af9fdd20421 100644 --- a/sql/user-account-management.md +++ b/sql/user-account-management.md @@ -1,10 +1,13 @@ --- title: TiDB User Account Management +summary: Learn how to manage a TiDB user account. category: user guide --- # TiDB User Account Management +This document describes how to manage a TiDB user account. + ## User names and passwords TiDB stores the user accounts in the table of the `mysql.user` system database. Each account is identified by a user name and the client host. Each account may have a password. 
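+
+For example, the following statements create a hypothetical `dbuser` account that can only connect from hosts matching `192.168.%`, grant it read-only access to a hypothetical `testdb` database, and later change its password (all the names and password values here are placeholders):
+
+```sql
+CREATE USER 'dbuser'@'192.168.%' IDENTIFIED BY 'initial_password';
+GRANT SELECT ON testdb.* TO 'dbuser'@'192.168.%';
+SET PASSWORD FOR 'dbuser'@'192.168.%' = 'new_password';
+```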
diff --git a/sql/user-defined-variables.md b/sql/user-defined-variables.md index 2879219c103ca..821216f8666a4 100644 --- a/sql/user-defined-variables.md +++ b/sql/user-defined-variables.md @@ -1,5 +1,6 @@ --- title: User-Defined Variables +summary: Learn how to use user-defined variables. category: user guide --- @@ -8,6 +9,7 @@ category: user guide The format of the user-defined variables is `@var_name`. `@var_name` consists of alphanumeric characters, `_`, and `$`. The user-defined variables are case-insensitive. The user-defined variables are session specific, which means a user variable defined by one client cannot be seen or used by other clients. + You can use the `SET` statement to set a user variable: ```sql @@ -31,6 +33,7 @@ mysql> SELECT @a1, @a2, @t3, @a4 := @a1+@a2+@a3; | 1 | 2 | 4 | 7 | +------+------+------+--------------------+ ``` + Hexadecimal or bit values assigned to user variables are treated as binary strings in TiDB. To assign a hexadecimal or bit value as a number, use it in numeric context. For example, add `0` or use `CAST(... AS UNSIGNED)`: ```sql diff --git a/sql/user-manual.md b/sql/user-manual.md index 273e1aa626b6a..fa69f1d449366 100644 --- a/sql/user-manual.md +++ b/sql/user-manual.md @@ -1,5 +1,6 @@ --- title: TiDB User Guide +summary: Learn about the user guide of TiDB. category: user guide --- diff --git a/sql/util.md b/sql/util.md index 1fc96642ed83f..4e762686ad7b1 100644 --- a/sql/util.md +++ b/sql/util.md @@ -1,10 +1,13 @@ --- title: Utility Statements +summary: Learn how to use the utility statements, including the `DESCRIBE`, `EXPLAIN`, and `USE` statements. category: user guide --- # Utility Statements +This document describes the utility statements, including the `DESCRIBE`, `EXPLAIN`, and `USE` statements. + ## `DESCRIBE` statement The `DESCRIBE` and `EXPLAIN` statements are synonyms, which can also be abbreviated as `DESC`. See the usage of the `EXPLAIN` statement. @@ -34,7 +37,7 @@ explainable_stmt: { } ``` -For more information about the `EXPLAIN` statement, see [Understand the Query Execution Plan](understanding-the-query-execution-plan.md). +For more information about the `EXPLAIN` statement, see [Understand the Query Execution Plan](../sql/understanding-the-query-execution-plan.md). In addition to the MySQL standard result format, TiDB also supports DotGraph and you need to specify `FORMAT = "dot"` as in the following example: diff --git a/sql/variable.md b/sql/variable.md index bbe004c73d58e..d55ab2a314c1f 100644 --- a/sql/variable.md +++ b/sql/variable.md @@ -1,5 +1,6 @@ --- title: The System Variables +summary: Learn how to use the system variables in TiDB. category: user guide --- @@ -9,7 +10,7 @@ The system variables in MySQL are the system parameters that modify the operatio ## Set the system variables -You can use the [`SET`](admin.md#the-set-statement) statement to change the value of the system variables. Before you change, consider the scope of the variable. For more information, see [MySQL Dynamic System Variables](https://dev.mysql.com/doc/refman/5.7/en/dynamic-system-variables.html). +You can use the [`SET`](../sql/admin.md#the-set-statement) statement to change the value of the system variables. Before you change, consider the scope of the variable. For more information, see [MySQL Dynamic System Variables](https://dev.mysql.com/doc/refman/5.7/en/dynamic-system-variables.html). 
### Set Global variables @@ -42,7 +43,8 @@ The following MySQL system variables are fully supported in TiDB and have the sa | sql_mode | GLOBAL \| SESSION | support some of the MySQL SQL modes| | time_zone | GLOBAL \| SESSION | the time zone of the database | | tx_isolation | GLOBAL \| SESSION | the isolation level of a transaction | +| hostname | NONE | the hostname of the TiDB server | ## The proprietary system variables and syntaxes in TiDB -See [The Proprietary System Variables and Syntax in TiDB](tidb-specific.md). \ No newline at end of file +See [The Proprietary System Variables and Syntax in TiDB](../sql/tidb-specific.md). diff --git a/tikv/deploy-tikv-docker-compose.md b/tikv/deploy-tikv-docker-compose.md new file mode 100644 index 0000000000000..9ee2bd75eff3d --- /dev/null +++ b/tikv/deploy-tikv-docker-compose.md @@ -0,0 +1,82 @@ +--- +title: Install and Deploy TiKV Using Docker Compose +summary: Use Docker Compose to quickly deploy a TiKV testing cluster on a single machine. +category: operations +--- + +# Install and Deploy TiKV Using Docker Compose + +This guide describes how to quickly deploy a TiKV testing cluster using [Docker Compose](https://github.com/pingcap/tidb-docker-compose/) on a single machine. + +> **Note:** Currently, this installation method only supports the Linux system. + +## Prerequisites + +Make sure you have installed the following items on your machine: + +- Docker (17.06.0 or later) and Docker Compose + + ```bash + sudo yum install docker docker-compose + ``` + +- Helm + + ```bash + curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get | bash + ``` + +- Git + + ``` + sudo yum install git + ``` + +## Install and deploy + +1. Download `tidb-docker-compose`. + + ```bash + git clone https://github.com/pingcap/tidb-docker-compose.git + ``` + +2. Edit the `compose/values.yaml` file to configure `networkMode` to `host`. + + ```bash + cd tidb-docker-compose + vim compose/values.yaml + ``` + +3. Edit the `compose/values.yaml` file to comment the TiDB section out. + +4. Change the Prometheus and Pushgateway addresses for the `host` network mode. + + ```bash + sed -i 's/pushgateway:9091/127.0.0.1:9091/g' config/* + sed -i 's/prometheus:9090/127.0.0.1:9090/g' config/* + ``` + +5. Generate the `generated-docker-compose.yml` file. + + ```bash + helm template compose > generated-docker-compose.yml + ``` + +6. Create and start the cluster using the `generated-docker-compose.yml` file. + + ```bash + docker-compose -f generated-docker-compose.yml pull # Get the latest Docker images + docker-compose -f generated-docker-compose.yml up -d + ``` + +You can check whether the TiKV cluster has been successfully deployed using the following command: + +```bash +curl localhost:2379/pd/api/v1/stores +``` + +If the state of all the TiKV instances is "Up", you have successfully deployed a TiKV cluster. + +## What's next? + +If you want to try the Go client, see [Try Two Types of APIs](../tikv/go-client-api.md). \ No newline at end of file diff --git a/tikv/deploy-tikv-using-ansible.md b/tikv/deploy-tikv-using-ansible.md new file mode 100644 index 0000000000000..035f0acd6162b --- /dev/null +++ b/tikv/deploy-tikv-using-ansible.md @@ -0,0 +1,565 @@ +--- +title: Install and Deploy TiKV Using Ansible +summary: Use TiDB-Ansible to deploy a TiKV cluster on multiple nodes. +category: user guide +--- + +# Install and Deploy TiKV Using Ansible + +This guide describes how to install and deploy TiKV using Ansible. 
Ansible is an IT automation tool that can configure systems, deploy software, and orchestrate more advanced IT tasks such as continuous deployments or zero downtime rolling updates. + +[TiDB-Ansible](https://github.com/pingcap/tidb-ansible) is a TiDB cluster deployment tool developed by PingCAP, based on Ansible playbook. TiDB-Ansible enables you to quickly deploy a new TiKV cluster which includes PD, TiKV, and the cluster monitoring modules. + +> **Note:** For the production environment, it is recommended to use TiDB-Ansible to deploy your TiDB cluster. If you only want to try TiKV out and explore the features, see [Install and Deploy TiKV using Docker Compose](../tikv/deploy-tikv-docker-compose.md) on a single machine. + +## Prepare + +Before you start, make sure you have: + +1. Several target machines that meet the following requirements: + + - 4 or more machines + + A standard TiKV cluster contains 6 machines. You can use 4 machines for testing. + + - CentOS 7.3 (64 bit) or later with Python 2.7 installed, x86_64 architecture (AMD64) + - Network between machines + + > **Note:** When you deploy TiKV using Ansible, use SSD disks for the data directory of TiKV and PD nodes. Otherwise, it cannot pass the check. For more details, see [Software and Hardware Requirements](../op-guide/recommendation.md). + +2. A Control Machine that meets the following requirements: + + > **Note:** The Control Machine can be one of the target machines. + + - CentOS 7.3 (64 bit) or later with Python 2.7 installed + - Access to the Internet + - Git installed + +## Step 1: Install system dependencies on the Control Machine + +Log in to the Control Machine using the `root` user account, and run the corresponding command according to your operating system. + +- If you use a Control Machine installed with CentOS 7, run the following command: + + ``` + # yum -y install epel-release git curl sshpass + # yum -y install python-pip + ``` + +- If you use a Control Machine installed with Ubuntu, run the following command: + + ``` + # apt-get -y install git curl sshpass python-pip + ``` + +## Step 2: Create the `tidb` user on the Control Machine and generate the SSH key + +Make sure you have logged in to the Control Machine using the `root` user account, and then run the following command. + +1. Create the `tidb` user. + + ``` + # useradd -m -d /home/tidb tidb + ``` + +2. Set a password for the `tidb` user account. + + ``` + # passwd tidb + ``` + +3. Configure sudo without password for the `tidb` user account by adding `tidb ALL=(ALL) NOPASSWD: ALL` to the end of the sudo file: + + ``` + # visudo + tidb ALL=(ALL) NOPASSWD: ALL + ``` +4. Generate the SSH key. + + Execute the `su` command to switch the user from `root` to `tidb`. Create the SSH key for the `tidb` user account and hit the Enter key when `Enter passphrase` is prompted. After successful execution, the SSH private key file is `/home/tidb/.ssh/id_rsa`, and the SSH public key file is `/home/tidb/.ssh/id_rsa.pub`. + + ``` + # su - tidb + $ ssh-keygen -t rsa + Generating public/private rsa key pair. + Enter file in which to save the key (/home/tidb/.ssh/id_rsa): + Created directory '/home/tidb/.ssh'. + Enter passphrase (empty for no passphrase): + Enter same passphrase again: + Your identification has been saved in /home/tidb/.ssh/id_rsa. + Your public key has been saved in /home/tidb/.ssh/id_rsa.pub. + The key fingerprint is: + SHA256:eIBykszR1KyECA/h0d7PRKz4fhAeli7IrVphhte7/So tidb@172.16.10.49 + The key's randomart image is: + +---[RSA 2048]----+ + |=+o+.o. 
| + |o=o+o.oo | + | .O.=.= | + | . B.B + | + |o B * B S | + | * + * + | + | o + . | + | o E+ . | + |o ..+o. | + +----[SHA256]-----+ + ``` + +## Step 3: Download TiDB-Ansible to the Control Machine + +1. Log in to the Control Machine using the `tidb` user account and enter the `/home/tidb` directory. + +2. Download the corresponding TiDB-Ansible version from the [TiDB-Ansible project](https://github.com/pingcap/tidb-ansible). The default folder name is `tidb-ansible`. + + - Download the 2.0 GA version: + + ```bash + $ git clone -b release-2.0 https://github.com/pingcap/tidb-ansible.git + ``` + + - Download the master version: + + ```bash + $ git clone https://github.com/pingcap/tidb-ansible.git + ``` + + > **Note:** It is required to download `tidb-ansible` to the `/home/tidb` directory using the `tidb` user account. If you download it to the `/root` directory, a privilege issue occurs. + + If you have questions regarding which version to use, email to info@pingcap.com for more information or [file an issue](https://github.com/pingcap/tidb-ansible/issues/new). + +## Step 4: Install Ansible and its dependencies on the Control Machine + +Make sure you have logged in to the Control Machine using the `tidb` user account. + +It is required to use `pip` to install Ansible and its dependencies, otherwise a compatibility issue occurs. Currently, the TiDB 2.0 GA version and the master version are compatible with Ansible 2.4 and Ansible 2.5. + +1. Install Ansible and the dependencies on the Control Machine: + + ```bash + $ cd /home/tidb/tidb-ansible + $ sudo pip install -r ./requirements.txt + ``` + + Ansible and the related dependencies are in the `tidb-ansible/requirements.txt` file. + +2. View the version of Ansible: + + ```bash + $ ansible --version + ansible 2.5.0 + ``` + +## Step 5: Configure the SSH mutual trust and sudo rules on the Control Machine + +Make sure you have logged in to the Control Machine using the `tidb` user account. + +1. Add the IPs of your target machines to the `[servers]` section of the `hosts.ini` file. + + ```bash + $ cd /home/tidb/tidb-ansible + $ vi hosts.ini + [servers] + 172.16.10.1 + 172.16.10.2 + 172.16.10.3 + 172.16.10.4 + 172.16.10.5 + 172.16.10.6 + + [all:vars] + username = tidb + ntp_server = pool.ntp.org + ``` + +2. Run the following command and input the `root` user account password of your target machines. + + ```bash + $ ansible-playbook -i hosts.ini create_users.yml -u root -k + ``` + + This step creates the `tidb` user account on the target machines, and configures the sudo rules and the SSH mutual trust between the Control Machine and the target machines. + +> **Note:** To configure the SSH mutual trust and sudo without password manually, see [How to manually configure the SSH mutual trust and sudo without password](../op-guide/ansible-deployment.md#how-to-manually-configure-the-ssh-mutual-trust-and-sudo-without-password). + +## Step 6: Install the NTP service on the target machines + +> **Note:** If the time and time zone of all your target machines are same, the NTP service is on and is normally synchronizing time, you can ignore this step. See [How to check whether the NTP service is normal](../op-guide/ansible-deployment.md#how-to-check-whether-the-ntp-service-is-normal). 
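+
+For example, one quick way to check the synchronization status of all the target machines in a single pass is to reuse the `hosts.ini` inventory (this sketch assumes `ntpstat` is installed on the target machines; a synchronized node typically reports `synchronised to NTP server`):
+
+```bash
+$ cd /home/tidb/tidb-ansible
+$ ansible -i hosts.ini all -m shell -a 'ntpstat' -u tidb -b
+```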
+ +Make sure you have logged in to the Control Machine using the `tidb` user account, run the following command: + +```bash +$ cd /home/tidb/tidb-ansible +$ ansible-playbook -i hosts.ini deploy_ntp.yml -u tidb -b +``` + +The NTP service is installed and started using the software repository that comes with the system on the target machines. The default NTP server list in the installation package is used. The related `server` parameter is in the `/etc/ntp.conf` configuration file. + +To make the NTP service start synchronizing as soon as possible, the system executes the `ntpdate` command to set the local date and time by polling `ntp_server` in the `hosts.ini` file. The default server is `pool.ntp.org`, and you can also replace it with your NTP server. + +## Step 7: Configure the CPUfreq governor mode on the target machine + +For details about CPUfreq, see [the CPUfreq Governor documentation](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/power_management_guide/cpufreq_governors). + +Set the CPUfreq governor mode to `performance` to make full use of CPU performance. + +### Check the governor modes supported by the system + +You can run the `cpupower frequency-info --governors` command to check the governor modes which the system supports: + +``` +# cpupower frequency-info --governors +analyzing CPU 0: + available cpufreq governors: performance powersave +``` + +Taking the above code for example, the system supports the `performance` and `powersave` modes. + +> **Note:** As the following shows, if it returns "Not Available", it means that the current system does not support CPUfreq configuration and you can skip this step. + +``` +# cpupower frequency-info --governors +analyzing CPU 0: + available cpufreq governors: Not Available +``` + +### Check the current governor mode + +You can run the `cpupower frequency-info --policy` command to check the current CPUfreq governor mode: + +``` +# cpupower frequency-info --policy +analyzing CPU 0: + current policy: frequency should be within 1.20 GHz and 3.20 GHz. + The governor "powersave" may decide which speed to use + within this range. +``` + +As the above code shows, the current mode is `powersave` in this example. + +### Change the governor mode + +- You can run the following command to change the current mode to `performance`: + + ``` + # cpupower frequency-set --governor performance + ``` + +- You can also run the following command to set the mode on the target machine in batches: + + ``` + $ ansible -i hosts.ini all -m shell -a "cpupower frequency-set --governor performance" -u tidb -b + ``` + +## Step 8: Mount the data disk ext4 filesystem with options on the target machines + +Log in to the Control Machine using the `root` user account. + +Format your data disks to the ext4 filesystem and mount the filesystem with the `nodelalloc` and `noatime` options. It is required to mount the `nodelalloc` option, or else the Ansible deployment cannot pass the test. The `noatime` option is optional. + +> **Note:** If your data disks have been formatted to ext4 and have mounted the options, you can uninstall it by running the `# umount /dev/nvme0n1` command, follow the steps starting from editing the `/etc/fstab` file, and remount the filesystem with options. + +Take the `/dev/nvme0n1` data disk as an example: + +1. View the data disk. + + ``` + # fdisk -l + Disk /dev/nvme0n1: 1000 GB + ``` + +2. Create the partition table. + + ``` + # parted -s -a optimal /dev/nvme0n1 mklabel gpt -- mkpart primary ext4 1 -1 + ``` + +3. 
Format the data disk to the ext4 filesystem. + + ``` + # mkfs.ext4 /dev/nvme0n1 + ``` + +4. View the partition UUID of the data disk. + + In this example, the UUID of `nvme0n1` is `c51eb23b-195c-4061-92a9-3fad812cc12f`. + + ``` + # lsblk -f + NAME FSTYPE LABEL UUID MOUNTPOINT + sda + ├─sda1 ext4 237b634b-a565-477b-8371-6dff0c41f5ab /boot + ├─sda2 swap f414c5c0-f823-4bb1-8fdf-e531173a72ed + └─sda3 ext4 547909c1-398d-4696-94c6-03e43e317b60 / + sr0 + nvme0n1 ext4 c51eb23b-195c-4061-92a9-3fad812cc12f + ``` + +5. Edit the `/etc/fstab` file and add the mount options. + + ``` + # vi /etc/fstab + UUID=c51eb23b-195c-4061-92a9-3fad812cc12f /data1 ext4 defaults,nodelalloc,noatime 0 2 + ``` + +6. Mount the data disk. + + ``` + # mkdir /data1 + # mount -a + ``` + +7. Check using the following command. + + ``` + # mount -t ext4 + /dev/nvme0n1 on /data1 type ext4 (rw,noatime,nodelalloc,data=ordered) + ``` + + If the filesystem is ext4 and `nodelalloc` is included in the mount options, you have successfully mount the data disk ext4 filesystem with options on the target machines. + +## Step 9: Edit the `inventory.ini` file to orchestrate the TiKV cluster + +Edit the `tidb-ansible/inventory.ini` file to orchestrate the TiKV cluster. The standard TiKV cluster contains 6 machines: 3 PD nodes and 3 TiKV nodes. + +- Deploy at least 3 instances for TiKV. +- Do not deploy TiKV together with PD on the same machine. +- Use the first PD machine as the monitoring machine. + +> **Note:** +> +> - Leave `[tidb_servers]` in the `inventory.ini` file empty, because this deployment is for the TiKV cluster, not the TiDB cluster. +> - It is required to use the internal IP address to deploy. If the SSH port of the target machines is not the default 22 port, you need to add the `ansible_port` variable. For example, `TiDB1 ansible_host=172.16.10.1 ansible_port=5555`. + +You can choose one of the following two types of cluster topology according to your scenario: + +- [The cluster topology of a single TiKV instance on each TiKV node](#option-1-use-the-cluster-topology-of-a-single-tikv-instance-on-each-tikv-node) + + In most cases, it is recommended to deploy one TiKV instance on each TiKV node for better performance. However, if the CPU and memory of your TiKV machines are much better than the required in [Hardware and Software Requirements](../op-guide/recommendation.md), and you have more than two disks in one node or the capacity of one SSD is larger than 2 TB, you can deploy no more than 2 TiKV instances on a single TiKV node. 
+ +- [The cluster topology of multiple TiKV instances on each TiKV node](#option-2-use-the-cluster-topology-of-multiple-tikv-instances-on-each-tikv-node) + +### Option 1: Use the cluster topology of a single TiKV instance on each TiKV node + +| Name | Host IP | Services | +|-------|-------------|----------| +| node1 | 172.16.10.1 | PD1 | +| node2 | 172.16.10.2 | PD2 | +| node3 | 172.16.10.3 | PD3 | +| node4 | 172.16.10.4 | TiKV1 | +| node5 | 172.16.10.5 | TiKV2 | +| node6 | 172.16.10.6 | TiKV3 | + +Edit the `inventory.ini` file as follows: + +```ini +[tidb_servers] + +[pd_servers] +172.16.10.1 +172.16.10.2 +172.16.10.3 + +[tikv_servers] +172.16.10.4 +172.16.10.5 +172.16.10.6 + +[monitoring_servers] +172.16.10.1 + +[grafana_servers] +172.16.10.1 + +[monitored_servers] +172.16.10.1 +172.16.10.2 +172.16.10.3 +172.16.10.4 +172.16.10.5 +172.16.10.6 +``` + +### Option 2: Use the cluster topology of multiple TiKV instances on each TiKV node + +Take two TiKV instances on each TiKV node as an example: + +| Name | Host IP | Services | +|-------|-------------|------------------| +| node1 | 172.16.10.1 | PD1 | +| node2 | 172.16.10.2 | PD2 | +| node3 | 172.16.10.3 | PD3 | +| node4 | 172.16.10.4 | TiKV1-1, TiKV1-2 | +| node5 | 172.16.10.5 | TiKV2-1, TiKV2-2 | +| node6 | 172.16.10.6 | TiKV3-1, TiKV3-2 | + +```ini +[tidb_servers] + +[pd_servers] +172.16.10.1 +172.16.10.2 +172.16.10.3 + +[tikv_servers] +TiKV1-1 ansible_host=172.16.10.4 deploy_dir=/data1/deploy tikv_port=20171 labels="host=tikv1" +TiKV1-2 ansible_host=172.16.10.4 deploy_dir=/data2/deploy tikv_port=20172 labels="host=tikv1" +TiKV2-1 ansible_host=172.16.10.5 deploy_dir=/data1/deploy tikv_port=20171 labels="host=tikv2" +TiKV2-2 ansible_host=172.16.10.5 deploy_dir=/data2/deploy tikv_port=20172 labels="host=tikv2" +TiKV3-1 ansible_host=172.16.10.6 deploy_dir=/data1/deploy tikv_port=20171 labels="host=tikv3" +TiKV3-2 ansible_host=172.16.10.6 deploy_dir=/data2/deploy tikv_port=20172 labels="host=tikv3" + +[monitoring_servers] +172.16.10.1 + +[grafana_servers] +172.16.10.1 + +[monitored_servers] +172.16.10.1 +172.16.10.2 +172.16.10.3 +172.16.10.4 +172.16.10.5 +172.16.10.6 + +... + +[pd_servers:vars] +location_labels = ["host"] +``` + +Edit the parameters in the service configuration file: + +1. For the cluster topology of multiple TiKV instances on each TiKV node, you need to edit the `block-cache-size` parameter in `tidb-ansible/conf/tikv.yml`: + + - `rocksdb defaultcf block-cache-size(GB)`: MEM * 80% / TiKV instance number * 30% + - `rocksdb writecf block-cache-size(GB)`: MEM * 80% / TiKV instance number * 45% + - `rocksdb lockcf block-cache-size(GB)`: MEM * 80% / TiKV instance number * 2.5% (128 MB at a minimum) + - `raftdb defaultcf block-cache-size(GB)`: MEM * 80% / TiKV instance number * 2.5% (128 MB at a minimum) + +2. For the cluster topology of multiple TiKV instances on each TiKV node, you need to edit the `high-concurrency`, `normal-concurrency` and `low-concurrency` parameters in the `tidb-ansible/conf/tikv.yml` file: + + ``` + readpool: + coprocessor: + # Notice: if CPU_NUM > 8, default thread pool size for coprocessors + # will be set to CPU_NUM * 0.8. + # high-concurrency: 8 + # normal-concurrency: 8 + # low-concurrency: 8 + ``` + + Recommended configuration: `number of instances * parameter value = CPU_Vcores * 0.8`. + +3. 
If multiple TiKV instances are deployed on a same physical disk, edit the `capacity` parameter in `conf/tikv.yml`: + + - `capacity`: total disk capacity / number of TiKV instances (the unit is GB) + +## Step 10: Edit variables in the `inventory.ini` file + +1. Edit the `deploy_dir` variable to configure the deployment directory. + + The global variable is set to `/home/tidb/deploy` by default, and it applies to all services. If the data disk is mounted on the `/data1` directory, you can set it to `/data1/deploy`. For example: + + ```bash + ## Global variables + [all:vars] + deploy_dir = /data1/deploy + ``` + + **Note:** To separately set the deployment directory for a service, you can configure the host variable while configuring the service host list in the `inventory.ini` file. It is required to add the first column alias, to avoid confusion in scenarios of mixed services deployment. + + ```bash + TiKV1-1 ansible_host=172.16.10.4 deploy_dir=/data1/deploy + ``` + +2. Set the `deploy_without_tidb` variable to `True`. + + ```bash + deploy_without_tidb = True + ``` + +> **Note:** If you need to edit other variables, see [the variable description table](../op-guide/ansible-deployment.md#edit-other-variables-optional). + +## Step 11: Deploy the TiKV cluster + +When `ansible-playbook` executes the Playbook, the default concurrent number is 5. If many target machines are deployed, you can add the `-f` parameter to specify the concurrency, such as `ansible-playbook deploy.yml -f 10`. + +The following example uses `tidb` as the user who runs the service. + +1. Check the `tidb-ansible/inventory.ini` file to make sure `ansible_user = tidb`. + + ```bash + ## Connection + # ssh via normal user + ansible_user = tidb + ``` + +2. Make sure the SSH mutual trust and sudo without password are successfully configured. + + - Run the following command and if all servers return `tidb`, then the SSH mutual trust is successfully configured: + + ```bash + ansible -i inventory.ini all -m shell -a 'whoami' + ``` + + - Run the following command and if all servers return `root`, then sudo without password of the `tidb` user is successfully configured: + + ```bash + ansible -i inventory.ini all -m shell -a 'whoami' -b + ``` + +3. Download the TiKV binary to the Control Machine. + + ```bash + ansible-playbook local_prepare.yml + ``` + +4. Initialize the system environment and modify the kernel parameters. + + ```bash + ansible-playbook bootstrap.yml + ``` + +5. Deploy the TiKV cluster. + + ```bash + ansible-playbook deploy.yml + ``` + +6. Start the TiKV cluster. + + ```bash + ansible-playbook start.yml + ``` + +You can check whether the TiKV cluster has been successfully deployed using the following command: + +```bash +curl 172.16.10.1:2379/pd/api/v1/stores +``` + +## Stop the TiKV cluster + +If you want to stop the TiKV cluster, run the following command: + +```bash +ansible-playbook stop.yml +``` + +## Destroy the TiKV cluster + +> **Warning:** Before you clean the cluster data or destroy the TiKV cluster, make sure you do not need it any more. + +- If you do not need the data any more, you can clean up the data for test using the following command: + + ``` + ansible-playbook unsafe_cleanup_data.yml + ``` + +- If you do not need the TiKV cluster any more, you can destroy it using the following command: + + ```bash + ansible-playbook unsafe_cleanup.yml + ``` + + > **Note:** If the deployment directory is a mount point, an error might be reported, but the implementation result remains unaffected. 
You can just ignore the error. \ No newline at end of file diff --git a/tikv/deploy-tikv-using-binary.md b/tikv/deploy-tikv-using-binary.md new file mode 100644 index 0000000000000..98cad512e2eda --- /dev/null +++ b/tikv/deploy-tikv-using-binary.md @@ -0,0 +1,149 @@ +--- +title: Install and Deploy TiKV Using Binary Files +summary: Use binary files to deploy a TiKV cluster on a single machine or on multiple nodes for testing. +category: user guide +--- + +# Install and Deploy TiKV Using Binary Files + +This guide describes how to deploy a TiKV cluster using binary files. + +- To quickly understand and try TiKV, see [Deploy the TiKV cluster on a single machine](#deploy-the-tikv-cluster-on-a-single-machine). +- To try TiKV out and explore the features, see [Deploy the TiKV cluster on multiple nodes for testing](#deploy-the-tikv-cluster-on-multiple-nodes-for-testing). + +## Deploy the TiKV cluster on a single machine + +This section describes how to deploy TiKV on a single machine installed with the Linux system. Take the following steps: + +1. Download the official binary package. + + ```bash + # Download the package. + wget https://download.pingcap.org/tidb-latest-linux-amd64.tar.gz + wget http://download.pingcap.org/tidb-latest-linux-amd64.sha256 + + # Check the file integrity. If the result is OK, the file is correct. + sha256sum -c tidb-latest-linux-amd64.sha256 + + # Extract the package. + tar -xzf tidb-latest-linux-amd64.tar.gz + cd tidb-latest-linux-amd64 + ``` + +2. Start PD. + + ```bash + ./bin/pd-server --name=pd1 \ + --data-dir=pd1 \ + --client-urls="http://127.0.0.1:2379" \ + --peer-urls="http://127.0.0.1:2380" \ + --initial-cluster="pd1=http://127.0.0.1:2380" \ + --log-file=pd1.log + ``` + +3. Start TiKV. + + To start the 3 TiKV instances, open a new terminal tab or window, come to the `tidb-latest-linux-amd64` directory, and start the instances using the following command: + + ```bash + ./bin/tikv-server --pd-endpoints="127.0.0.1:2379" \ + --addr="127.0.0.1:20160" \ + --data-dir=tikv1 \ + --log-file=tikv1.log + + ./bin/tikv-server --pd-endpoints="127.0.0.1:2379" \ + --addr="127.0.0.1:20161" \ + --data-dir=tikv2 \ + --log-file=tikv2.log + + ./bin/tikv-server --pd-endpoints="127.0.0.1:2379" \ + --addr="127.0.0.1:20162" \ + --data-dir=tikv3 \ + --log-file=tikv3.log + ``` + +You can use the [pd-ctl](https://github.com/pingcap/pd/tree/master/pdctl) tool to verify whether PD and TiKV are successfully deployed: + +``` +./bin/pd-ctl store -d -u http://127.0.0.1:2379 +``` + +If the state of all the TiKV instances is "Up", you have successfully deployed a TiKV cluster. + +## Deploy the TiKV cluster on multiple nodes for testing + +This section describes how to deploy TiKV on multiple nodes. If you want to test TiKV with a limited number of nodes, you can use one PD instance to test the entire cluster. + +Assume that you have four nodes, you can deploy 1 PD instance and 3 TiKV instances. For details, see the following table: + +| Name | Host IP | Services | +| :-- | :-- | :------------------- | +| Node1 | 192.168.199.113 | PD1 | +| Node2 | 192.168.199.114 | TiKV1 | +| Node3 | 192.168.199.115 | TiKV2 | +| Node4 | 192.168.199.116 | TiKV3 | + +To deploy a TiKV cluster with multiple nodes for test, take the following steps: + +1. Download the official binary package on each node. + + ```bash + # Download the package. + wget https://download.pingcap.org/tidb-latest-linux-amd64.tar.gz + wget http://download.pingcap.org/tidb-latest-linux-amd64.sha256 + + # Check the file integrity. 
If the result is OK, the file is correct. + sha256sum -c tidb-latest-linux-amd64.sha256 + + # Extract the package. + tar -xzf tidb-latest-linux-amd64.tar.gz + cd tidb-latest-linux-amd64 + ``` + +2. Start PD on Node1. + + ```bash + ./bin/pd-server --name=pd1 \ + --data-dir=pd1 \ + --client-urls="http://192.168.199.113:2379" \ + --peer-urls="http://192.168.199.113:2380" \ + --initial-cluster="pd1=http://192.168.199.113:2380" \ + --log-file=pd1.log + ``` + +3. Log in and start TiKV on other nodes: Node2, Node3 and Node4. + + Node2: + + ```bash + ./bin/tikv-server --pd-endpoints="192.168.199.113:2379" \ + --addr="192.168.199.114:20160" \ + --data-dir=tikv1 \ + --log-file=tikv1.log + ``` + + Node3: + + ```bash + ./bin/tikv-server --pd-endpoints="192.168.199.113:2379" \ + --addr="192.168.199.115:20160" \ + --data-dir=tikv2 \ + --log-file=tikv2.log + ``` + + Node4: + + ```bash + ./bin/tikv-server --pd-endpoints="192.168.199.113:2379" \ + --addr="192.168.199.116:20160" \ + --data-dir=tikv3 \ + --log-file=tikv3.log + ``` + +You can use the [pd-ctl](https://github.com/pingcap/pd/tree/master/pdctl) tool to verify whether PD and TiKV are successfully deployed: + +``` +./pd-ctl store -d -u http://192.168.199.113:2379 +``` + +The result displays the store count and detailed information regarding each store. If the state of all the TiKV instances is "Up", you have successfully deployed a TiKV cluster. \ No newline at end of file diff --git a/tikv/deploy-tikv-using-docker.md b/tikv/deploy-tikv-using-docker.md new file mode 100644 index 0000000000000..e6fced5f97075 --- /dev/null +++ b/tikv/deploy-tikv-using-docker.md @@ -0,0 +1,155 @@ +--- +title: Install and Deploy TiKV Using Docker +summary: Use Docker to deploy a TiKV cluster on multiple nodes. +category: user guide +--- + +# Install and Deploy TiKV Using Docker + +This guide describes how to deploy a multi-node TiKV cluster using Docker. + +## Prerequisites + +Make sure that Docker is installed on each machine. + +For more details about prerequisites, see [Hardware and Software Requirements](../op-guide/recommendation.md). + +## Deploy the TiKV cluster on multiple nodes + +Assume that you have 6 machines with the following details: + +| Name | Host IP | Services | Data Path | +| --------- | ------------- | ---------- | --------- | +| Node1 | 192.168.1.101 | PD1 | /data | +| Node2 | 192.168.1.102 | PD2 | /data | +| Node3 | 192.168.1.103 | PD3 | /data | +| Node4 | 192.168.1.104 | TiKV1 | /data | +| Node5 | 192.168.1.105 | TiKV2 | /data | +| Node6 | 192.168.1.106 | TiKV3 | /data | + +If you want to test TiKV with a limited number of nodes, you can also use one PD instance to test the entire cluster. + +### Step 1: Pull the latest images of TiKV and PD from Docker Hub + +Start Docker and pull the latest images of TiKV and PD from [Docker Hub](https://hub.docker.com) using the following command: + +```bash +docker pull pingcap/tikv:latest +docker pull pingcap/pd:latest +``` + +### Step 2: Log in and start PD + +Log in to the three PD machines and start PD respectively: + +1. 
Start PD1 on Node1: + + ```bash + docker run -d --name pd1 \ + -p 2379:2379 \ + -p 2380:2380 \ + -v /etc/localtime:/etc/localtime:ro \ + -v /data:/data \ + pingcap/pd:latest \ + --name="pd1" \ + --data-dir="/data/pd1" \ + --client-urls="http://0.0.0.0:2379" \ + --advertise-client-urls="http://192.168.1.101:2379" \ + --peer-urls="http://0.0.0.0:2380" \ + --advertise-peer-urls="http://192.168.1.101:2380" \ + --initial-cluster="pd1=http://192.168.1.101:2380,pd2=http://192.168.1.102:2380,pd3=http://192.168.1.103:2380" + ``` + +2. Start PD2 on Node2: + + ```bash + docker run -d --name pd2 \ + -p 2379:2379 \ + -p 2380:2380 \ + -v /etc/localtime:/etc/localtime:ro \ + -v /data:/data \ + pingcap/pd:latest \ + --name="pd2" \ + --data-dir="/data/pd2" \ + --client-urls="http://0.0.0.0:2379" \ + --advertise-client-urls="http://192.168.1.102:2379" \ + --peer-urls="http://0.0.0.0:2380" \ + --advertise-peer-urls="http://192.168.1.102:2380" \ + --initial-cluster="pd1=http://192.168.1.101:2380,pd2=http://192.168.1.102:2380,pd3=http://192.168.1.103:2380" + ``` + +3. Start PD3 on Node3: + + ```bash + docker run -d --name pd3 \ + -p 2379:2379 \ + -p 2380:2380 \ + -v /etc/localtime:/etc/localtime:ro \ + -v /data:/data \ + pingcap/pd:latest \ + --name="pd3" \ + --data-dir="/data/pd3" \ + --client-urls="http://0.0.0.0:2379" \ + --advertise-client-urls="http://192.168.1.103:2379" \ + --peer-urls="http://0.0.0.0:2380" \ + --advertise-peer-urls="http://192.168.1.103:2380" \ + --initial-cluster="pd1=http://192.168.1.101:2380,pd2=http://192.168.1.102:2380,pd3=http://192.168.1.103:2380" + ``` + +### Step 3: Log in and start TiKV + +Log in to the three TiKV machines and start TiKV respectively: + +1. Start TiKV1 on Node4: + + ```bash + docker run -d --name tikv1 \ + -p 20160:20160 \ + -v /etc/localtime:/etc/localtime:ro \ + -v /data:/data \ + pingcap/tikv:latest \ + --addr="0.0.0.0:20160" \ + --advertise-addr="192.168.1.104:20160" \ + --data-dir="/data/tikv1" \ + --pd="192.168.1.101:2379,192.168.1.102:2379,192.168.1.103:2379" + ``` + +2. Start TiKV2 on Node5: + + ```bash + docker run -d --name tikv2 \ + -p 20160:20160 \ + -v /etc/localtime:/etc/localtime:ro \ + -v /data:/data \ + pingcap/tikv:latest \ + --addr="0.0.0.0:20160" \ + --advertise-addr="192.168.1.105:20160" \ + --data-dir="/data/tikv2" \ + --pd="192.168.1.101:2379,192.168.1.102:2379,192.168.1.103:2379" + ``` + +3. Start TiKV3 on Node6: + + ```bash + docker run -d --name tikv3 \ + -p 20160:20160 \ + -v /etc/localtime:/etc/localtime:ro \ + -v /data:/data \ + pingcap/tikv:latest \ + --addr="0.0.0.0:20160" \ + --advertise-addr="192.168.1.106:20160" \ + --data-dir="/data/tikv3" \ + --pd="192.168.1.101:2379,192.168.1.102:2379,192.168.1.103:2379" + ``` + +You can check whether the TiKV cluster has been successfully deployed using the following command: + +``` +curl 192.168.1.101:2379/pd/api/v1/stores +``` + +If the state of all the TiKV instances is "Up", you have successfully deployed a TiKV cluster. + +## What's next? + +If you want to try the Go client, see [Try Two Types of APIs](../tikv/go-client-api.md). \ No newline at end of file diff --git a/tikv/go-client-api.md b/tikv/go-client-api.md new file mode 100644 index 0000000000000..9a08dbea36b18 --- /dev/null +++ b/tikv/go-client-api.md @@ -0,0 +1,339 @@ +--- +title: Try Two Types of APIs +summary: Learn how to use the Raw Key-Value API and the Transactional Key-Value API in TiKV. 
+category: user guide +--- + +# Try Two Types of APIs + +To apply to different scenarios, TiKV provides [two types of APIs](../tikv/tikv-overview.md#two-types-of-apis) for developers: the Raw Key-Value API and the Transactional Key-Value API. This document uses two examples to guide you through how to use the two APIs in TiKV. + +The usage examples are based on the [deployment of TiKV using binary files on multiple nodes for test](../tikv/deploy-tikv-using-binary.md#deploy-the-tikv-cluster-on-multiple-nodes-for-test). You can also quickly try the two types of APIs on a single machine. + +## Try the Raw Key-Value API + +To use the Raw Key-Value API in applications developed by golang, take the following steps: + +1. Install the necessary packages. + + ```bash + go get -v -u github.com/pingcap/tidb/store/tikv + ``` + +2. Import the dependency packages. + + ```go + import ( + "fmt" + "github.com/pingcap/tidb/config" + "github.com/pingcap/tidb/store/tikv" + ) + ``` + +3. Create a Raw Key-Value client. + + ```go + cli, err := tikv.NewRawKVClient([]string{"192.168.199.113:2379"}, config.Security{}) + ``` + + Description of two parameters in the above command: + + - `string`: a list of PD servers’ addresses + - `config.Security`: used for establishing TLS connections, usually left empty when you do not need TLS + +4. Call the Raw Key-Value client methods to access the data on TiKV. The Raw Key-Value API contains the following methods, and you can also find them at [GoDoc](https://godoc.org/github.com/pingcap/tidb/store/tikv#RawKVClient). + + ```go + type RawKVClient struct + func (c *RawKVClient) Close() error + func (c *RawKVClient) ClusterID() uint64 + func (c *RawKVClient) Delete(key []byte) error + func (c *RawKVClient) Get(key []byte) ([]byte, error) + func (c *RawKVClient) Put(key, value []byte) error + func (c *RawKVClient) Scan(startKey []byte, limit int) (keys [][]byte, values [][]byte, err error) + ``` + +### Usage example of the Raw Key-Value API + +```go +package main + +import ( + "fmt" + + "github.com/pingcap/tidb/config" + "github.com/pingcap/tidb/store/tikv" +) + +func main() { + cli, err := tikv.NewRawKVClient([]string{"192.168.199.113:2379"}, config.Security{}) + if err != nil { + panic(err) + } + defer cli.Close() + + fmt.Printf("cluster ID: %d\n", cli.ClusterID()) + + key := []byte("Company") + val := []byte("PingCAP") + + // put key into tikv + err = cli.Put(key, val) + if err != nil { + panic(err) + } + fmt.Printf("Successfully put %s:%s to tikv\n", key, val) + + // get key from tikv + val, err = cli.Get(key) + if err != nil { + panic(err) + } + fmt.Printf("found val: %s for key: %s\n", val, key) + + // delete key from tikv + err = cli.Delete(key) + if err != nil { + panic(err) + } + fmt.Printf("key: %s deleted\n", key) + + // get key again from tikv + val, err = cli.Get(key) + if err != nil { + panic(err) + } + fmt.Printf("found val: %s for key: %s\n", val, key) +} +``` + +The result is like: + +```bash +INFO[0000] [pd] create pd client with endpoints [192.168.199.113:2379] +INFO[0000] [pd] leader switches to: http://127.0.0.1:2379, previous: +INFO[0000] [pd] init cluster id 6554145799874853483 +cluster ID: 6554145799874853483 +Successfully put Company:PingCAP to tikv +found val: PingCAP for key: Company +key: Company deleted +found val: for key: Company +``` + +RawKVClient is a client of the TiKV server and only supports the GET/PUT/DELETE/SCAN commands. The RawKVClient can be safely and concurrently accessed by multiple goroutines, as long as it is not closed. 
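+
+For instance, the following is a minimal sketch of sharing one client across several goroutines, reusing the PD address and the client methods from the example above (the key names and the goroutine count are arbitrary):
+
+```go
+package main
+
+import (
+	"fmt"
+	"sync"
+
+	"github.com/pingcap/tidb/config"
+	"github.com/pingcap/tidb/store/tikv"
+)
+
+func main() {
+	// One shared client for the whole process
+	cli, err := tikv.NewRawKVClient([]string{"192.168.199.113:2379"}, config.Security{})
+	if err != nil {
+		panic(err)
+	}
+	defer cli.Close()
+
+	var wg sync.WaitGroup
+	for i := 0; i < 4; i++ {
+		wg.Add(1)
+		go func(n int) {
+			defer wg.Done()
+			key := []byte(fmt.Sprintf("key-%d", n))
+			// Each goroutine writes and reads its own key through the shared client
+			if err := cli.Put(key, []byte("value")); err != nil {
+				panic(err)
+			}
+			val, err := cli.Get(key)
+			if err != nil {
+				panic(err)
+			}
+			fmt.Printf("found val: %s for key: %s\n", val, key)
+		}(i)
+	}
+	wg.Wait()
+}
+```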
Therefore, for one process, one client is enough generally. + +## Try the Transactional Key-Value API + +The Transactional Key-Value API is more complicated than the Raw Key-Value API. Some transaction related concepts are listed as follows. For more details, see the [KV package](https://github.com/pingcap/tidb/tree/master/kv). + +- Storage + + Like the RawKVClient, a Storage is an abstract TiKV cluster. + +- Snapshot + + A Snapshot is the state of a Storage at a particular point of time, which provides some readonly methods. The multiple times read from a same Snapshot is guaranteed consistent. + +- Transaction + + Like the Transaction in SQL, a Transaction symbolizes a series of read and write operations performed within the Storage. Internally, a Transaction consists of a Snapshot for reads, and a MemBuffer for all writes. The default isolation level of a Transaction is Snapshot Isolation. + +To use the Transactional Key-Value API in applications developed by golang, take the following steps: + +1. Install the necessary packages. + + ```bash + go get -v -u github.com/juju/errors + go get -v -u github.com/pingcap/tidb/kv + go get -v -u github.com/pingcap/tidb/store/tikv + go get -v -u golang.org/x/net/context + ``` + +2. Import the dependency packages. + + ```go + import ( + "flag" + "fmt" + "os" + + "github.com/juju/errors" + "github.com/pingcap/tidb/kv" + "github.com/pingcap/tidb/store/tikv" + "github.com/pingcap/tidb/terror" + + goctx "golang.org/x/net/context" + ) + ``` + +3. Create Storage using a URL scheme. + + ```go + driver := tikv.Driver{} + storage, err := driver.Open("tikv://192.168.199.113:2379") + ``` + +4. (Optional) Modify the Storage using a Transaction. + + The lifecycle of a Transaction is: _begin → {get, set, delete, scan} → {commit, rollback}_. + +5. Call the Transactional Key-Value API's methods to access the data on TiKV. The Transactional Key-Value API contains the following methods: + + ```go + Begin() -> Txn + Txn.Get(key []byte) -> (value []byte) + Txn.Set(key []byte, value []byte) + Txn.Seek(begin []byte) -> Iterator + Txn.Delete(key []byte) + Txn.Commit() + ``` + +### Usage example of the Transactional Key-Value API + +```go +package main + +import ( + "flag" + "fmt" + "os" + + "github.com/juju/errors" + "github.com/pingcap/tidb/kv" + "github.com/pingcap/tidb/store/tikv" + "github.com/pingcap/tidb/terror" + + goctx "golang.org/x/net/context" +) + +type KV struct { + K, V []byte +} + +func (kv KV) String() string { + return fmt.Sprintf("%s => %s (%v)", kv.K, kv.V, kv.V) +} + +var ( + store kv.Storage + pdAddr = flag.String("pd", "192.168.199.113:2379", "pd address:192.168.199.113:2379") +) + +// Init initializes information. +func initStore() { + driver := tikv.Driver{} + var err error + store, err = driver.Open(fmt.Sprintf("tikv://%s", *pdAddr)) + terror.MustNil(err) +} + +// key1 val1 key2 val2 ... 
+func puts(args ...[]byte) error { + tx, err := store.Begin() + if err != nil { + return errors.Trace(err) + } + + for i := 0; i < len(args); i += 2 { + key, val := args[i], args[i+1] + err := tx.Set(key, val) + if err != nil { + return errors.Trace(err) + } + } + err = tx.Commit(goctx.Background()) + if err != nil { + return errors.Trace(err) + } + + return nil +} + +func get(k []byte) (KV, error) { + tx, err := store.Begin() + if err != nil { + return KV{}, errors.Trace(err) + } + v, err := tx.Get(k) + if err != nil { + return KV{}, errors.Trace(err) + } + return KV{K: k, V: v}, nil +} + +func dels(keys ...[]byte) error { + tx, err := store.Begin() + if err != nil { + return errors.Trace(err) + } + for _, key := range keys { + err := tx.Delete(key) + if err != nil { + return errors.Trace(err) + } + } + err = tx.Commit(goctx.Background()) + if err != nil { + return errors.Trace(err) + } + return nil +} + +func scan(keyPrefix []byte, limit int) ([]KV, error) { + tx, err := store.Begin() + if err != nil { + return nil, errors.Trace(err) + } + it, err := tx.Seek(keyPrefix) + if err != nil { + return nil, errors.Trace(err) + } + defer it.Close() + var ret []KV + for it.Valid() && limit > 0 { + ret = append(ret, KV{K: it.Key()[:], V: it.Value()[:]}) + limit-- + it.Next() + } + return ret, nil +} + +func main() { + pdAddr := os.Getenv("PD_ADDR") + if pdAddr != "" { + os.Args = append(os.Args, "-pd", pdAddr) + } + flag.Parse() + initStore() + + // set + err := puts([]byte("key1"), []byte("value1"), []byte("key2"), []byte("value2")) + terror.MustNil(err) + + // get + kv, err := get([]byte("key1")) + terror.MustNil(err) + fmt.Println(kv) + + // scan + ret, err := scan([]byte("key"), 10) + for _, kv := range ret { + fmt.Println(kv) + } + + // delete + err = dels([]byte("key1"), []byte("key2")) + terror.MustNil(err) +} +``` + +The result is like: + +```bash +INFO[0000] [pd] create pd client with endpoints [192.168.199.113:2379] +INFO[0000] [pd] leader switches to: http://192.168.199.113:2379, previous: +INFO[0000] [pd] init cluster id 6563858376412119197 +key1 => value1 ([118 97 108 117 101 49]) +key1 => value1 ([118 97 108 117 101 49]) +key2 => value2 ([118 97 108 117 101 50]) +``` diff --git a/tikv/tikv-overview.md b/tikv/tikv-overview.md new file mode 100644 index 0000000000000..0d08ec40b974b --- /dev/null +++ b/tikv/tikv-overview.md @@ -0,0 +1,60 @@ +--- +title: Overview of TiKV +summary: Learn about the key features, architecture, and two types of APIs of TiKV. +category: overview +--- + +# Overview of TiKV + +TiKV (The pronunciation is: /'taɪkeɪvi:/ tai-K-V, etymology: titanium) is a distributed Key-Value database which is based on the design of Google Spanner and HBase, but it is much simpler without dependency on any distributed file system. + +As the storage layer of TiDB, TiKV can work separately and does not depend on the SQL layer of TiDB. To apply to different scenarios, TiKV provides [two types of APIs](#two-types-of-apis) for developers: the Raw Key-Value API and the Transactional Key-Value API. + +The key features of TiKV are as follows: + +- **Geo-Replication** + + TiKV uses [Raft](http://raft.github.io/) and the [Placement Driver](https://github.com/pingcap/pd/) to support Geo-Replication. + +- **Horizontal scalability** + + With Placement Driver and carefully designed Raft groups, TiKV excels in horizontal scalability and can easily scale to 100+ TBs of data. 
+
+- **Consistent distributed transactions**
+
+    Similar to Google's Spanner, TiKV supports externally-consistent distributed transactions.
+
+- **Coprocessor support**
+
+    Similar to HBase, TiKV implements a Coprocessor framework to support distributed computing.
+
+- **Cooperates with [TiDB](https://github.com/pingcap/tidb)**
+
+    Thanks to internal optimizations, TiKV and TiDB can work together to be a compelling database solution with high horizontal scalability, externally-consistent transactions, and support for RDBMS and NoSQL design patterns.
+
+## Architecture
+
+The TiKV server software stack is as follows:
+
+![The TiKV software stack](../media/tikv_stack.png)
+
+- **Placement Driver:** Placement Driver (PD) is the cluster manager of TiKV. PD periodically checks replication constraints to balance load and data automatically.
+- **Store:** There is a RocksDB within each Store and it stores data on the local disk.
+- **Region:** Region is the basic unit of Key-Value data movement. Each Region is replicated to multiple Nodes. These multiple replicas form a Raft group.
+- **Node:** A physical node in the cluster. Within each node, there are one or more Stores. Within each Store, there are many Regions.
+
+When a node starts, the metadata of the Node, Store and Region is recorded in PD. The status of each Region and Store is reported to PD regularly.
+
+## Two types of APIs
+
+TiKV provides two types of APIs for developers:
+
+- [The Raw Key-Value API](../tikv/go-client-api.md#try-the-raw-key-value-api)
+
+    If your application scenario does not need distributed transactions or MVCC (Multi-Version Concurrency Control) and only needs to guarantee atomicity for a single key, you can use the Raw Key-Value API.
+
+- [The Transactional Key-Value API](../tikv/go-client-api.md#try-the-transactional-key-value-api)
+
+    If your application scenario requires distributed ACID transactions and the atomicity of multiple keys within a transaction, you can use the Transactional Key-Value API.
+
+Compared to the Transactional Key-Value API, the Raw Key-Value API delivers better performance with lower latency and is easier to use.
\ No newline at end of file
diff --git a/tispark/tispark-quick-start-guide.md b/tispark/tispark-quick-start-guide.md
index ae3ba4fa7af2f..31ef395196d05 100644
--- a/tispark/tispark-quick-start-guide.md
+++ b/tispark/tispark-quick-start-guide.md
@@ -1,11 +1,12 @@
 ---
 title: TiSpark Quick Start Guide
+summary: Learn how to use TiSpark quickly.
 category: User Guide
 ---
 
-# Quick Start Guide for the TiDB Connector for Spark
+# TiSpark Quick Start Guide
 
-To make it easy to try [the TiDB Connector for Spark](tispark-user-guide.md), TiDB cluster integrates Spark, TiSpark jar package and TiSpark sample data by default, in both the Pre-GA and master versions installed using TiDB-Ansible.
+To make it easy to [try TiSpark](../tispark/tispark-user-guide.md), the TiDB cluster installed using TiDB-Ansible integrates Spark, TiSpark jar package and TiSpark sample data by default.
 
 ## Deployment information
 
@@ -13,9 +14,9 @@ To make it easy to try [the TiDB Connector for Spark](tispark-user-guide.md), Ti
 - The TiSpark jar package is deployed by default in the `jars` folder in the Spark deployment directory.
 
     ```
-    spark/jars/tispark-0.1.0-beta-SNAPSHOT-jar-with-dependencies.jar
+    spark/jars/tispark-SNAPSHOT-jar-with-dependencies.jar
     ```
-    
+
 - TiSpark sample data and import scripts are deployed by default in the TiDB-Ansible directory.
``` @@ -107,8 +108,6 @@ MySQL [TPCH_001]> show tables; ## Use example -Assume that the IP of your PD node is `192.168.0.2`, and the port is `2379`. - First start the spark-shell in the spark deployment directory: ``` diff --git a/tispark/tispark-user-guide.md b/tispark/tispark-user-guide.md index ca1596be77b26..e446cbfb49c4d 100644 --- a/tispark/tispark-user-guide.md +++ b/tispark/tispark-user-guide.md @@ -1,42 +1,42 @@ --- -title: TiDB Connector for Spark User Guide +title: TiSpark User Guide +summary: Use TiSpark to provide an HTAP solution to serve as a one-stop solution for both online transactions and analysis. category: user guide --- -# TiDB Connector for Spark User Guide +# TiSpark User Guide -The TiDB Connector for Spark is a thin layer built for running Apache Spark on top of TiDB/TiKV to answer the complex OLAP queries. It takes advantages of both the Spark platform and the distributed TiKV cluster and seamlessly glues to TiDB, the distributed OLTP database, to provide a Hybrid Transactional/Analytical Processing (HTAP) solution to serve as a one-stop solution for both online transactions and analysis. +[TiSpark](https://github.com/pingcap/tispark) is a thin layer built for running Apache Spark on top of TiDB/TiKV to answer the complex OLAP queries. It takes advantages of both the Spark platform and the distributed TiKV cluster and seamlessly glues to TiDB, the distributed OLTP database, to provide a Hybrid Transactional/Analytical Processing (HTAP) solution to serve as a one-stop solution for both online transactions and analysis. -The TiDB Connector for Spark depends on the TiKV cluster and the PD cluster. You also need to set up a Spark cluster. This document provides a brief introduction to how to setup and use the TiDB Connector for Spark. It requires some basic knowledge of Apache Spark. For more information, see [Spark website](https://spark.apache.org/docs/latest/index.html). +TiSpark depends on the TiKV cluster and the PD cluster. You also need to set up a Spark cluster. This document provides a brief introduction to how to setup and use TiSpark. It requires some basic knowledge of Apache Spark. For more information, see [Spark website](https://spark.apache.org/docs/latest/index.html). ## Overview -The TiDB Connector for Spark is an OLAP solution that runs Spark SQL directly on TiKV, the distributed storage engine. +TiSpark is an OLAP solution that runs Spark SQL directly on TiKV, the distributed storage engine. -![TiDB Connector for Spark architecture](../media/tispark-architecture.png) +![TiSpark architecture](../media/tispark-architecture.png) -+ TiDB Connector for Spark integrates with Spark Catalyst Engine deeply. It provides precise control of the computing, which allows Spark read data from TiKV efficiently. It also supports index seek, which improves the performance of the point query execution significantly. ++ TiSpark integrates with Spark Catalyst Engine deeply. It provides precise control of the computing, which allows Spark read data from TiKV efficiently. It also supports index seek, which improves the performance of the point query execution significantly. + It utilizes several strategies to push down the computing to reduce the size of dataset handling by Spark SQL, which accelerates the query execution. It also uses the TiDB built-in statistical information for the query plan optimization. 
-+ From the data integration point of view, TiDB Connector for Spark and TiDB serve as a solution runs both transaction and analysis directly on the same platform without building and maintaining any ETLs. It simplifies the system architecture and reduces the cost of maintenance. -+ also, you can deploy and utilize tools from the Spark ecosystem for further data processing and manipulation on TiDB. For example, using the TiDB Connector for Spark for data analysis and ETL; retrieving data from TiKV as a machine learning data source; generating reports from the scheduling system and so on. ++ From the data integration point of view, TiSpark and TiDB serve as a solution for running both transaction and analysis directly on the same platform without building and maintaining any ETLs. It simplifies the system architecture and reduces the cost of maintenance. ++ also, you can deploy and utilize tools from the Spark ecosystem for further data processing and manipulation on TiDB. For example, using TiSpark for data analysis and ETL; retrieving data from TiKV as a machine learning data source; generating reports from the scheduling system and so on. ## Environment setup -+ The current version of the TiDB Connector for Spark supports Spark 2.1. For Spark 2.0 and Spark 2.2, it has not been fully tested yet. It does not support any versions earlier than 2.0. -+ The TiDB Connector for Spark requires JDK 1.8+ and Scala 2.11 (Spark2.0 + default Scala version). -+ The TiDB Connector for Spark runs in any Spark mode such as YARN, Mesos, and Standalone. - ++ The current version of TiSpark supports Spark 2.1. For Spark 2.0 and Spark 2.2, it has not been fully tested yet. It does not support any versions earlier than 2.0. ++ TiSpark requires JDK 1.8+ and Scala 2.11 (Spark2.0 + default Scala version). ++ TiSpark runs in any Spark mode such as YARN, Mesos, and Standalone. ## Recommended configuration -### Deployment of TiKV and the TiDB Connector for Spark clusters +This section describes the configuration of independent deployment of TiKV and TiSpark, independent deployment of Spark and TiSpark, and hybrid deployment of TiKV and TiSpark. -#### Configuration of the TiKV cluster +### Configuration of independent deployment of TiKV and TiSpark + +For independent deployment of TiKV and TiSpark, it is recommended to refer to the following recommendations: -For independent deployment of TiKV and the TiDB Connector for Spark, it is recommended to refer to the following recommendations - + Hardware configuration - - For general purposes, please refer to the TiDB and TiKV hardware configuration [recommendations](https://github.com/pingcap/docs/blob/master/op-guide/recommendation.md#deployment-recommendations). + - For general purposes, please refer to the TiDB and TiKV hardware configuration [recommendations](../op-guide/recommendation.md#deployment-recommendations). - If the usage is more focused on the analysis scenarios, you can increase the memory of the TiKV nodes to at least 64G. + TiKV parameters (default) @@ -67,12 +67,11 @@ For independent deployment of TiKV and the TiDB Connector for Spark, it is recom scheduler-worker-pool-size = 4 ``` -#### Configuration of the independent deployment of the Spark cluster and the TiDB Connector for Spark cluster +### Configuration of independent deployment of Spark and TiSpark - See the [Spark official website](https://spark.apache.org/docs/latest/hardware-provisioning.html) for the detail hardware recommendations. 
-The following is a short overview of the TiDB Connector for Spark configuration. +The following is a short overview of TiSpark configuration. It is recommended to allocate 32G memory for Spark. Please reserve at least 25% of the memory for the operating system and buffer cache. @@ -86,61 +85,57 @@ SPARK_WORKER_MEMORY = 32g SPARK_WORKER_CORES = 8 ``` -#### Hybrid deployment configuration for the TiDB Connector for Spark and TiKV cluster +### Configuration of hybrid deployment of TiKV and TiSpark -For the hybrid deployment of the TiDB Connector for Spark and TiKV, add the TiDB Connector for Spark required resources to the TiKV reserved resources, and allocate 25% of the memory for the system. +For the hybrid deployment of TiKV and TiSpark, add TiSpark required resources to the TiKV reserved resources, and allocate 25% of the memory for the system. -## Deploy the TiDB Connector for Spark +## Deploy the TiSpark cluster -Download the TiDB Connector for Spark's jar package [here](http://download.pingcap.org/tispark-0.1.0-SNAPSHOT-jar-with-dependencies.jar). +Download TiSpark's jar package [here](http://download.pingcap.org/tispark-0.1.0-SNAPSHOT-jar-with-dependencies.jar). -### Deploy the TiDB Connector for Spark on the existing Spark cluster +### Deploy TiSpark on the existing Spark cluster -Running TiDB Connector for Spark on an existing Spark cluster does not require a reboot of the cluster. You can use Spark's `--jars` parameter to introduce the TiDB Connector for Spark as a dependency: +Running TiSpark on an existing Spark cluster does not require a reboot of the cluster. You can use Spark's `--jars` parameter to introduce TiSpark as a dependency: ```sh spark-shell --jars $PATH/tispark-0.1.0.jar ``` -If you want to deploy TiDB Connector for Spark as a default component, simply place the TiDB Connector for Spark jar package into the jars path for each node of the Spark cluster and restart the Spark cluster: +If you want to deploy TiSpark as a default component, simply place the TiSpark jar package into the jars path for each node of the Spark cluster and restart the Spark cluster: ```sh ${SPARK_INSTALL_PATH}/jars ``` -In this way, you can use either `Spark-Submit` or `Spark-Shell` to use the TiDB Connector for Spark directly. - - -### Deploy TiDB Connector for Spark without the Spark cluster +In this way, you can use either `Spark-Submit` or `Spark-Shell` to use TiSpark directly. +### Deploy TiSpark without the Spark cluster If you do not have a Spark cluster, we recommend using the standalone mode. To use the Spark Standalone model, you can simply place a compiled version of Spark on each node of the cluster. If you encounter problems, see its [official website](https://spark.apache.org/docs/latest/spark-standalone.html). And you are welcome to [file an issue](https://github.com/pingcap/tispark/issues/new) on our GitHub. - #### Download and install You can download [Apache Spark](https://spark.apache.org/downloads.html) -For the Standalone mode without Hadoop support, use Spark 2.1.x and any version of Pre-build with Apache Hadoop 2.x with Hadoop dependencies. If you need to use the Hadoop cluster, please choose the corresponding Hadoop version. You can also choose to build from the [source code](https://spark.apache.org/docs/2.1.0/building-spark.html) to match the previous version of the official Hadoop 2.6. Please note that the TiDB Connector for Spark currently only supports Spark 2.1.x version. 
- -Suppose you already have a Spark binaries, and the current PATH is `SPARKPATH`, please copy the TiDB Connector for Spark jar package to the `${SPARKPATH}/jars` directory. +For the Standalone mode without Hadoop support, use Spark 2.1.x and any version of Pre-build with Apache Hadoop 2.x with Hadoop dependencies. If you need to use the Hadoop cluster, please choose the corresponding Hadoop version. You can also choose to build from the [source code](https://spark.apache.org/docs/2.1.0/building-spark.html) to match the previous version of the official Hadoop 2.6. Please note that TiSpark currently only supports Spark 2.1.x version. + +Suppose you already have a Spark binaries, and the current PATH is `SPARKPATH`, please copy the TiSpark jar package to the `${SPARKPATH}/jars` directory. #### Start a Master node Execute the following command on the selected Spark Master node: - + ```sh cd $SPARKPATH -./sbin/start-master.sh +./sbin/start-master.sh ``` After the above step is completed, a log file will be printed on the screen. Check the log file to confirm whether the Spark-Master is started successfully. You can open the [http://spark-master-hostname:8080](http://spark-master-hostname:8080) to view the cluster information (if you does not change the Spark-Master default port number). When you start Spark-Slave, you can also use this panel to confirm whether the Slave is joined to the cluster. #### Start a Slave node - Similarly, you can start a Spark-Slave node with the following command: ```sh @@ -168,11 +163,9 @@ And stop it like below: ./sbin/stop-tithriftserver.sh ``` - ## Demo -Assuming that you have successfully started the TiDB Connector for Spark cluster as described above, here's a quick introduction to how to use Spark SQL for OLAP analysis. Here we use a table named `lineitem` in the `tpch` database as an example. - +Assuming that you have successfully started the TiSpark cluster as described above, here's a quick introduction to how to use Spark SQL for OLAP analysis. Here we use a table named `lineitem` in the `tpch` database as an example. Assuming that your PD node is located at `192.168.1.100`, port `2379`, add the following command to `$SPARK_HOME/conf/spark-defaults.conf`: @@ -250,8 +243,8 @@ TiSpark on PySpark is a Python package build to support the Python language with Q: What are the pros/cons of independent deployment as opposed to a shared resource with an existing Spark / Hadoop cluster? -A: You can use the existing Spark cluster without a separate deployment, but if the existing cluster is busy, TiDB Connector for Spark will not be able to achieve the desired speed. +A: You can use the existing Spark cluster without a separate deployment, but if the existing cluster is busy, TiSpark will not be able to achieve the desired speed. Q: Can I mix Spark with TiKV? -A: If TiDB and TiKV are overloaded and run critical online tasks, consider deploying the TiDB Connector for Spark separately. You also need to consider using different NICs to ensure that OLTP's network resources are not compromised and affect online business. If the online business requirements are not high or the loading is not large enough, you can consider mixing the TiDB Connector for Spark with TiKV deployment. +A: If TiDB and TiKV are overloaded and run critical online tasks, consider deploying TiSpark separately. You also need to consider using different NICs to ensure that OLTP's network resources are not compromised and affect online business. 
If the online business requirements are not high or the loading is not large enough, you can consider mixing TiSpark with TiKV deployment. \ No newline at end of file diff --git a/tools/loader.md b/tools/loader.md index 94e47c95a222a..a6ed18d416bdf 100644 --- a/tools/loader.md +++ b/tools/loader.md @@ -1,5 +1,6 @@ --- title: Loader Instructions +summary: Use Loader to load data to TiDB. category: advanced --- diff --git a/tools/mydumper.md b/tools/mydumper.md new file mode 100644 index 0000000000000..7364033254d23 --- /dev/null +++ b/tools/mydumper.md @@ -0,0 +1,47 @@ +--- +title: mydumper Instructions +summary: Use mydumper to export data from TiDB. +category: tools +--- + +# mydumper Instructions + +## What is mydumper? + +`mydumper` is a fork of the [mydumper](https://github.com/maxbube/mydumper) project with additional functionality specific to TiDB. It is the recommended method to use for logical backups of TiDB. + +[Download the Binary](http://download.pingcap.org/tidb-enterprise-tools-latest-linux-amd64.tar.gz). + +## What enhancements does this contain over regular mydumper? + ++ Uses `tidb_snapshot` to provide backup consistency instead of `FLUSH TABLES WITH READ LOCK` + ++ Includes the hidden `_tidb_rowid` column in `INSERT` statements when present + ++ Allows `tidb_snapshot` to be [configurable](../op-guide/history-read.md#how-tidb-reads-data-from-history-versions) (i.e. backup data as it appeared at an earlier point in time) + +### New parameter description + +``` + -z, --tidb-snapshot: Set the tidb_snapshot to be used for the backup. + Default: NOW()-INTERVAL 1 SECOND. + Accepts either a TSO or valid datetime. For example: -z "2016-10-08 16:45:26" +``` + +### Usage example + +Command line parameter: + +``` +./bin/mydumper -h 127.0.0.1 -u root -P 4000 +``` + +## FAQ + +### Is the source code for these changes available? + +Source code for PingCAP's mydumper is [available on GitHub](https://github.com/pingcap/mydumper). + +### Do you plan to make these changes available to upstream mydumper? + +Yes, we intend to make our changes available to upstream mydumper. See [PR #155](https://github.com/maxbube/mydumper/pull/155). diff --git a/tools/pd-control.md b/tools/pd-control.md index a3f60b0ffbd24..402d3ad572df4 100644 --- a/tools/pd-control.md +++ b/tools/pd-control.md @@ -1,5 +1,6 @@ --- title: PD Control User Guide +summary: Use PD Control to obtain the state information of a cluster and tune a cluster. category: tools --- @@ -9,8 +10,8 @@ As a command line tool of PD, PD Control obtains the state information of the cl ## Source code compiling -1. [Go](https://golang.org/) Version 1.7 or later -2. In the PD root directory, use the `make` command to compile and generate `bin/pd-ctl` +1. [Go](https://golang.org/) Version 1.9 or later +2. In the root directory of the [PD project](https://github.com/pingcap/pd), use the `make` command to compile and generate `bin/pd-ctl` > **Note:** Generally, you don't need to compile source code as the PD Control tool already exists in the released Binary or Docker. However, dev users can refer to the above instruction for compiling source code. @@ -86,22 +87,36 @@ Usage: } ``` -### `config [show | set \ \]` +### `config [show | set