Translate tidb computing #3057

Merged
merged 63 commits into from
Jul 20, 2020
e5ce970
wip
baurine Jun 28, 2020
104a02f
wip
baurine Jun 28, 2020
1623274
wip
baurine Jun 28, 2020
93b3545
wip
baurine Jun 28, 2020
434a216
wip
baurine Jun 28, 2020
dfecc77
wip
baurine Jun 28, 2020
045a16e
Merge remote-tracking branch 'origin/master' into translate-tidb-comp…
baurine Jun 28, 2020
7248b0c
Update tidb-computing.md
TomShawn Jul 7, 2020
00dd7ec
Update tidb-computing.md
TomShawn Jul 8, 2020
c7530c7
Update tidb-computing.md
baurine Jul 13, 2020
2677941
Update tidb-computing.md
baurine Jul 13, 2020
ccfcba2
Update tidb-computing.md
baurine Jul 13, 2020
cdee879
Update tidb-computing.md
baurine Jul 13, 2020
d198973
Update tidb-computing.md
baurine Jul 13, 2020
49c19bd
Update tidb-computing.md
baurine Jul 13, 2020
81edc4b
Update tidb-computing.md
baurine Jul 13, 2020
2ffbf20
Update tidb-computing.md
baurine Jul 13, 2020
9973287
Update tidb-computing.md
baurine Jul 13, 2020
b8f47b8
Update tidb-computing.md
baurine Jul 13, 2020
8d4a809
Update tidb-computing.md
baurine Jul 13, 2020
025bd4d
Update tidb-computing.md
baurine Jul 13, 2020
997bb3d
Update tidb-computing.md
baurine Jul 13, 2020
4efd71f
Update tidb-computing.md
baurine Jul 13, 2020
b157b8b
Update tidb-computing.md
baurine Jul 13, 2020
36f51ec
Update tidb-computing.md
baurine Jul 13, 2020
e0a2c56
Update tidb-computing.md
baurine Jul 13, 2020
e3aaee5
Update tidb-computing.md
baurine Jul 13, 2020
cba591a
Update tidb-computing.md
baurine Jul 13, 2020
7166994
Update tidb-computing.md
baurine Jul 13, 2020
2d0265c
Update tidb-computing.md
baurine Jul 13, 2020
14cf3a1
Update tidb-computing.md
baurine Jul 13, 2020
cf0e8ca
Update tidb-computing.md
baurine Jul 13, 2020
110ae19
Update tidb-computing.md
baurine Jul 13, 2020
8d51c55
Update tidb-computing.md
baurine Jul 13, 2020
0fd26a6
Update tidb-computing.md
baurine Jul 13, 2020
2d3b7ef
Update tidb-computing.md
baurine Jul 13, 2020
d53875e
Update tidb-computing.md
baurine Jul 13, 2020
a23e75b
Update tidb-computing.md
baurine Jul 13, 2020
09dc373
Update tidb-computing.md
baurine Jul 13, 2020
d1b5cf6
Update tidb-computing.md
baurine Jul 13, 2020
cf95af0
Update tidb-computing.md
baurine Jul 13, 2020
60151ee
Update tidb-computing.md
baurine Jul 13, 2020
2ad9b24
Update tidb-computing.md
baurine Jul 13, 2020
8fdd93f
Update tidb-computing.md
baurine Jul 13, 2020
4fdab5d
Update tidb-computing.md
baurine Jul 13, 2020
7bb923e
Update tidb-computing.md
baurine Jul 13, 2020
e784112
Update tidb-computing.md
baurine Jul 13, 2020
0764443
Update tidb-computing.md
baurine Jul 13, 2020
59e9b56
Update tidb-computing.md
baurine Jul 13, 2020
2ce0c47
Update tidb-computing.md
baurine Jul 13, 2020
3a09bfd
Update tidb-computing.md
baurine Jul 13, 2020
d875807
Merge branch 'master' into translate-tidb-computing
baurine Jul 13, 2020
dc08906
Update tidb-computing.md
TomShawn Jul 13, 2020
3be0372
minor typo fixing
TomShawn Jul 13, 2020
3badc8e
Update TOC.md
TomShawn Jul 13, 2020
da1aaed
Update tidb-computing.md
TomShawn Jul 13, 2020
68fa091
Merge branch 'master' into translate-tidb-computing
baurine Jul 16, 2020
d40a05e
Merge branch 'master' into translate-tidb-computing
TomShawn Jul 16, 2020
d9524aa
Update TOC.md
TomShawn Jul 17, 2020
9ee1cac
Merge branch 'master' into translate-tidb-computing
TomShawn Jul 17, 2020
a858c41
Merge branch 'master' into translate-tidb-computing
baurine Jul 20, 2020
3e6ef48
Update tidb-computing.md
TomShawn Jul 20, 2020
7d3851a
Merge branch 'master' into translate-tidb-computing
TomShawn Jul 20, 2020
Binary file added media/tidb-computing-dist-sql-flow.png
Binary file added media/tidb-computing-native-sql-flow.jpeg
Binary file added media/tidb-computing-tidb-sql-layer.png
158 changes: 158 additions & 0 deletions tidb-computing.md
@@ -0,0 +1,158 @@
---
title: TiDB database computation
summary: Understand the computation layer of the TiDB database.
category: introduction
---

# Computation of TiDB database

Building on the distributed storage provided by TiKV, TiDB implements a computing engine that combines strong transaction processing with good data analysis capabilities. This document first describes the data mapping algorithm that maps data in database tables to (Key, Value) key-value pairs in TiKV, then explains how TiDB manages meta-information, and finally outlines the main architecture of the TiDB SQL layer.

Of the storage schemes that the computing layer depends on, this document covers only the row-based storage structure of TiKV. For analytical workloads, TiDB also provides a columnar storage scheme, [TiFlash](/tiflash/tiflash-overview.md), as an extension of TiKV.

## Mapping of table data to Key-Value

This section describes the scheme for mapping data to (Key, Value) key-value pairs in TiDB. The data to be mapped falls into two categories:

- Data for each row in the table, hereinafter referred to as table data.
- Data for all indexes in the table, hereinafter referred to as index data.

### Mapping of table data to Key-Value

In a relational database, a table may have many columns. To map the data of each column in a row to a (Key, Value) key-value pair, you need to consider how to construct the Key. First, OLTP scenarios involve many operations that insert, delete, update, or query one or more rows, which requires the database to read a single row quickly. Each key should therefore contain a unique ID (either explicit or implicit) so that the row can be located quickly. Second, many OLAP queries require a full table scan. If the keys of all rows in a table can be encoded into one range, the whole table can be scanned efficiently with a range query.

Based on the above considerations, the mapping of table data to Key-Value in TiDB is designed as follows:

- To ensure that data from the same table is kept together for easy lookup, TiDB assigns each table a table ID, denoted by `TableID`. The table ID is an integer that is unique across the whole cluster.
- TiDB assigns each row of data in the table a row ID, denoted by `RowID`. The row ID is also an integer, and it is unique within the table. As a small optimization, if a table has an integer primary key, TiDB uses the value of the primary key as the row ID.

Each row of data is encoded as a (Key, Value) key-value pair according to the following rule:

```
Key: tablePrefix{TableID}_recordPrefixSep{RowID}
Value: [col1, col2, col3, col4]
```

`tablePrefix` and `recordPrefixSep` are special string constants that distinguish table data from other data in the key space. Their exact values are given in the summary of mapping relationships below.
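The row key rule above can be sketched in a few lines of Python. This is only an illustration: the helper names `encode_row_key` and `encode_row` are hypothetical, and the keys are built in the readable text form used in this document (for example `t10_r1`), not in the binary form that TiDB actually writes to TiKV.

```python
# Illustrative sketch only: keys use the readable text form from this document,
# not TiDB's actual binary encoding. All function names are hypothetical.
TABLE_PREFIX = b"t"        # tablePrefix
RECORD_PREFIX_SEP = b"r"   # recordPrefixSep


def encode_row_key(table_id: int, row_id: int) -> bytes:
    """Build tablePrefix{TableID}_recordPrefixSep{RowID}."""
    return (
        TABLE_PREFIX
        + str(table_id).encode()
        + b"_"
        + RECORD_PREFIX_SEP
        + str(row_id).encode()
    )


def encode_row(table_id: int, row_id: int, columns: list) -> tuple:
    """Return the (Key, Value) pair for one row; the value is the column list."""
    return encode_row_key(table_id, row_id), list(columns)
```

Because every key of a table starts with the same `t{TableID}_r` prefix, all rows of that table end up next to each other in the key space.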

### Mapping of indexed data to Key-Value

TiDB supports both primary keys and secondary indexes (unique and non-unique). Similar to the table data mapping scheme, TiDB assigns each index in a table an index ID, denoted by `IndexID`.

For primary keys and unique indexes, TiDB must quickly locate the corresponding `RowID` from the indexed value, so each index entry is encoded as a (Key, Value) key-value pair according to the following rule:

```
Key: tablePrefix{TableID}_indexPrefixSep{IndexID}_indexedColumnsValue
Value: RowID
```

For ordinary secondary indexes that do not need to satisfy a uniqueness constraint, a single indexed value may correspond to multiple rows, so the matching `RowID` values must be found by scanning a range of keys. Each index entry is therefore encoded as a (Key, Value) key-value pair according to the following rule:

```
Key: tablePrefix{TableID}_indexPrefixSep{IndexID}_indexedColumnsValue_{RowID}
Value: null
```
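As a minimal sketch of the two index rules above, the following Python helpers build both kinds of index entries in the readable text form used in this document. The function names are hypothetical, and the single-letter constants are the ones this document lists for `tablePrefix` and `indexPrefixSep`.

```python
# Illustrative sketch only: text-form keys, hypothetical helper names.
TABLE_PREFIX = "t"       # tablePrefix
INDEX_PREFIX_SEP = "i"   # indexPrefixSep


def index_prefix(table_id: int, index_id: int) -> str:
    """Common prefix shared by every entry of one index."""
    return f"{TABLE_PREFIX}{table_id}_{INDEX_PREFIX_SEP}{index_id}"


def encode_unique_index(table_id, index_id, indexed_value, row_id):
    """Unique index: the key ends with the indexed value, the value is the RowID."""
    key = f"{index_prefix(table_id, index_id)}_{indexed_value}"
    return key.encode(), row_id


def encode_non_unique_index(table_id, index_id, indexed_value, row_id):
    """Non-unique index: the RowID is appended to the key, the value is empty."""
    key = f"{index_prefix(table_id, index_id)}_{indexed_value}_{row_id}"
    return key.encode(), None
```

Appending the `RowID` to a non-unique index key keeps the keys distinct even when several rows share the same indexed value, and all entries for one value can then be read with a single prefix scan.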

### Summary of mapping relationships

`tablePrefix`, `recordPrefixSep`, and `indexPrefixSep` in the encoding rules above are string constants that distinguish table and index data from other data in the key space. They are defined as follows:

```
tablePrefix = []byte{'t'}
recordPrefixSep = []byte{'r'}
indexPrefixSep = []byte{'i'}
```

Note that in the schemes above, for both table data and index data, all rows of a table share the same key prefix, and all entries of an index share the same prefix as well. Data with the same prefix is therefore stored together in TiKV's key space. If the suffix is encoded so that comparing two encoded keys gives the same result as comparing the original values, table data and index data are stored in TiKV in order. With such an encoding, all rows of a table are ordered by `RowID` in TiKV's key space, and the entries of a particular index are ordered by the value of the indexed columns (`indexedColumnsValue`).
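To make the ordering requirement concrete, here is a sketch of one common order-preserving encoding for signed 64-bit integers: flip the sign bit and store the result big-endian. This document does not spell out the exact byte layout TiDB uses, so treat the functions below as an assumption-laden illustration of the idea rather than as TiDB's codec.

```python
import struct

SIGN_MASK = 0x8000000000000000
UINT64_MASK = 0xFFFFFFFFFFFFFFFF


def encode_int_comparable(value: int) -> bytes:
    """Encode a signed 64-bit integer so that byte order matches numeric order."""
    # Flipping the sign bit maps [-2^63, 2^63) onto [0, 2^64) monotonically;
    # big-endian layout then makes lexicographic byte order equal numeric order.
    return struct.pack(">Q", (value & UINT64_MASK) ^ SIGN_MASK)


def decode_int_comparable(data: bytes) -> int:
    """Inverse of encode_int_comparable."""
    (raw,) = struct.unpack(">Q", data)
    raw ^= SIGN_MASK
    return raw - (1 << 64) if raw >= (1 << 63) else raw


values = [-5, 300, 0, -1, 2, 10]
# Sorting the encoded bytes yields the same order as sorting the integers.
sorted_by_bytes = [decode_int_comparable(b) for b in sorted(map(encode_int_comparable, values))]
```

By contrast, plain decimal text does not preserve order: as byte strings, `t10_r10` sorts before `t10_r2`. This is why the readable keys in this document are a notation only, and an order-preserving binary suffix is needed in practice.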

### Example of Key-Value mapping relationship

The following simple example illustrates the Key-Value mapping in TiDB. Suppose the following table exists in TiDB:

```sql
CREATE TABLE User (
    ID int,
    Name varchar(20),
    Role varchar(20),
    Age int,
    PRIMARY KEY (ID),
    KEY idxAge (Age)
);
```

Suppose there are 3 rows of data in the table.

```
1, "TiDB", "SQL Layer", 10
2, "TiKV", "KV Engine", 20
3, "PD", "Manager", 30
```

First, each row of data is mapped to a (Key, Value) key-value pair. Because the table has an `int` primary key, the value of `RowID` is the value of this primary key. Suppose the `TableID` of the table is 10. The table data stored in TiKV is then:

```
t10_r1 --> ["TiDB", "SQL Layer", 10]
t10_r2 --> ["TiKV", "KV Engine", 20]
t10_r3 --> ["PD", "Manager", 30]
```

In addition to the primary key, the table has a non-unique ordinary secondary index, `idxAge`. Suppose its `IndexID` is 1. The index data stored in TiKV is then:

```
t10_i1_10_1 --> null
t10_i1_20_2 --> null
t10_i1_30_3 --> null
```

The example above shows how TiDB maps the relational model to the Key-Value model, and the considerations behind the choice of this scheme.
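The example can be reproduced with a small in-memory sketch. A Python dict stands in for TiKV, keys are written in the readable text form used above, and `prefix_scan` is a hypothetical helper that mimics a range scan over sorted keys.

```python
from bisect import bisect_left

TABLE_ID, IDX_AGE_ID = 10, 1   # TableID and IndexID assumed in the example
rows = [
    (1, "TiDB", "SQL Layer", 10),
    (2, "TiKV", "KV Engine", 20),
    (3, "PD", "Manager", 30),
]

# Build the key-value pairs: one row entry and one idxAge entry per row.
kv = {}
for row_id, name, role, age in rows:
    kv[f"t{TABLE_ID}_r{row_id}"] = [name, role, age]
    kv[f"t{TABLE_ID}_i{IDX_AGE_ID}_{age}_{row_id}"] = None


def prefix_scan(store, prefix):
    """Return all (key, value) pairs whose key starts with prefix, in key order."""
    keys = sorted(store)
    result = []
    for key in keys[bisect_left(keys, prefix):]:
        if not key.startswith(prefix):
            break
        result.append((key, store[key]))
    return result


table_rows = prefix_scan(kv, "t10_r")       # full table scan
index_entries = prefix_scan(kv, "t10_i1_")  # full scan of idxAge
```

A full table scan and a full index scan each become a single scan over one contiguous key range, which is the property the shared prefixes are designed to provide.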

## Meta-information management

Each `Database` and `Table` in TiDB has meta-information, that is, its definition and various attributes. This information must also be persisted, and TiDB stores it in TiKV as well.

Each `Database` or `Table` is assigned a unique ID that serves as its identifier. When the meta-information is encoded as Key-Value, this ID is encoded into the Key with the `m_` prefix, and the serialized meta-information is stored in the Value.

In addition, TiDB uses a dedicated (Key, Value) key-value pair to store the latest version number of the structure information of all tables. This key-value pair is global, and its version number increases by 1 each time the state of a DDL operation changes. TiDB stores this key-value pair persistently in the PD server: the key is "/tidb/ddl/global_schema_version" and the value is the version number, of type int64. Following Google F1's online schema change algorithm, TiDB keeps a background thread that constantly checks whether the version number stored in the PD server has changed, and ensures that a version change is picked up within a bounded period of time.
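The version check can be pictured with the following sketch. Only the key name comes from this document; the `SchemaCache` class, the dict that stands in for the PD server, and the `load_schema` callback are all hypothetical, and the real implementation is certainly more involved.

```python
SCHEMA_VERSION_KEY = "/tidb/ddl/global_schema_version"


class SchemaCache:
    """Hypothetical per-server cache of table structure information."""

    def __init__(self, store, load_schema):
        self.store = store              # dict-like stand-in for the PD server
        self.load_schema = load_schema  # callback that reloads table definitions
        self.version = -1               # no schema loaded yet
        self.schema = None

    def check_once(self) -> bool:
        """Reload the schema if the global version changed; return True if it did."""
        latest = self.store[SCHEMA_VERSION_KEY]
        if latest == self.version:
            return False
        self.schema = self.load_schema(latest)
        self.version = latest
        return True


pd_store = {SCHEMA_VERSION_KEY: 1}
cache = SchemaCache(pd_store, load_schema=lambda version: f"schema@{version}")
```

In this sketch, a background thread would call `check_once()` at a fixed interval, so a server notices a DDL state change no later than one interval after the version number is bumped.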

## Introduction to the SQL layer

TiDB's SQL layer, TiDB Server, translates SQL statements into Key-Value operations, forwards these operations to TiKV (the shared distributed Key-Value storage layer), assembles the results returned by TiKV, and finally returns the query results to the client.

Nodes at this layer are stateless: they store no data themselves, and all nodes are fully equivalent.

### SQL algorithm

The simplest solution is to use the [mapping of table data to Key-Value](#mapping-of-table-data-to-key-value) scheme described in the previous section to map SQL queries to KV queries, then fetch the corresponding data through the KV interface and perform the required computation.

For example, to execute the SQL statement `select count(*) from user where name = "TiDB"`, TiDB needs to read all the data in the table, check whether the `name` field is `TiDB`, and if so, return the row. The process is as follows:

1. Construct the key range: all `RowID` values in a table fall in the range `[0, MaxInt64)`. According to the row key encoding rule, `0` and `MaxInt64` are used to construct a left-closed, right-open range `[StartKey, EndKey)`.
2. Scan the key range: read the data in TiKV according to the key range constructed above.
3. Filter the data: for each row read, evaluate the expression `name = "TiDB"`. If the result is true, pass the row up; otherwise, discard it.
4. Compute `Count(*)`: for each row that meets the condition, add 1 to the result of `Count(*)`.

**The entire process is illustrated as follows:**

![naive sql flow](/media/tidb-computing-native-sql-flow.jpeg)
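The four steps can be sketched as follows. A dict keyed by `(TableID, RowID)` tuples stands in for TiKV, since tuples sort the way the encoded keys are meant to; `kv_scan` and `naive_count` are hypothetical names, and the table ID 10 is an assumption carried over from the earlier example.

```python
MAX_INT64 = 2**63 - 1

# Stand-in for TiKV: row keys are (TableID, RowID) tuples.
store = {
    (10, 1): {"name": "TiDB", "role": "SQL Layer", "age": 10},
    (10, 2): {"name": "TiKV", "role": "KV Engine", "age": 20},
    (10, 3): {"name": "PD", "role": "Manager", "age": 30},
}


def kv_scan(store, start_key, end_key):
    """Yield (key, value) pairs with start_key <= key < end_key, in key order."""
    for key in sorted(store):
        if start_key <= key < end_key:
            yield key, store[key]


def naive_count(store, table_id, predicate):
    """Count matching rows by pulling every row up to the SQL layer."""
    start_key, end_key = (table_id, 0), (table_id, MAX_INT64)  # 1. key range
    count = 0
    for _key, row in kv_scan(store, start_key, end_key):       # 2. scan
        if predicate(row):                                     # 3. filter
            count += 1                                         # 4. count
    return count


result = naive_count(store, 10, lambda row: row["name"] == "TiDB")
```

Every row crosses the boundary between the store and `naive_count`, even though only a single number is needed in the end.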

This solution is intuitive and feasible, but has some obvious problems in a distributed database scenario.

- As the data is scanned, each row is read from TiKV through a KV operation, which costs at least one RPC. The overhead can be very high when there is a lot of data to scan.
- Not all rows meet the filter condition `name = "TiDB"`. Rows that do not meet the condition do not need to be read out at all.
- The values of the rows that meet the condition are not needed. The query only needs to know how many rows match.

### Distributed SQL operations

To solve the problems above, the computation should run as close to the storage nodes as possible to avoid a large number of RPC calls. First, the SQL predicate `name = "TiDB"` should be pushed down to the storage nodes, so that only matching rows are returned and meaningless network transfers are avoided. Second, the aggregate function `Count(*)` can also be pushed down to the storage nodes for pre-aggregation. Each node then returns only its own `Count(*)` result, and the SQL layer sums up the results returned by all nodes.

The following is a schematic representation of the data returned layer by layer.

![dist sql flow](/media/tidb-computing-dist-sql-flow.png)
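The pushdown idea can be sketched as below. Each inner list stands in for the rows held by one storage node, and both function names are hypothetical; the point is only that each node returns a single number instead of its rows.

```python
def node_count(rows, predicate):
    """Runs on one storage node: filter locally and return only a partial count."""
    return sum(1 for row in rows if predicate(row))


def distributed_count(nodes, predicate):
    """Runs in the SQL layer: one request per node, then sum the partial counts."""
    partial_counts = [node_count(rows, predicate) for rows in nodes]
    return sum(partial_counts)


# Rows as they might be spread across three storage nodes (made-up layout).
nodes = [
    [{"name": "TiDB"}, {"name": "TiKV"}],
    [{"name": "PD"}, {"name": "TiDB"}],
    [],
]

total = distributed_count(nodes, lambda row: row["name"] == "TiDB")
```

Compared with reading rows one by one, the data sent to the SQL layer shrinks from every scanned row to one integer per node.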

### SQL layer architecture

The previous sections give a basic picture of how SQL statements are handled. In practice, TiDB's SQL layer is much more complex, with many modules and layers. The following diagram lists the important modules and their calling relationships:

![tidb sql layer](/media/tidb-computing-tidb-sql-layer.png)

The user's SQL request is sent to TiDB Server either directly or through a `Load Balancer`. TiDB Server parses the `MySQL Protocol Packet` to get the content of the request, performs syntax parsing and semantic analysis on the SQL, generates and optimizes a query plan, executes the plan, and fetches and processes the data. All data is stored in the TiKV cluster, so during this process TiDB Server must interact with TiKV to get the data. Finally, TiDB Server returns the query result to the user.