Update use-tiflash.md from Special Week changes (#2972) (#3045)
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
ti-srebot authored Jun 28, 2020
1 parent 94ad717 commit 9f0aeda
Showing 1 changed file with 29 additions and 19 deletions (48 changes) in `tiflash/use-tiflash.md`.
@@ -13,6 +13,10 @@ You can either use TiDB to read TiFlash replicas for medium-scale analytical pro
- [Use TiDB to read TiFlash replicas](#use-tidb-to-read-tiflash-replicas)
- [Use TiSpark to read TiFlash replicas](#use-tispark-to-read-tiflash-replicas)

> **Note:**
>
> If you [use TiDB to read TiFlash replicas](#use-tidb-to-read-tiflash-replicas) in a transaction that contains any write operation (for example, `SELECT ... FOR UPDATE` followed by `UPDATE ...`), the behavior is currently undefined. This restriction will be removed in later versions.

## Create TiFlash replicas for tables

After TiFlash is connected to the TiKV cluster, data replication does not begin by default. You can send a DDL statement to TiDB through a MySQL client to create a TiFlash replica for a specific table:
@@ -59,8 +63,6 @@ ALTER TABLE `tpch50`.`lineitem` SET TIFLASH REPLICA 0

* It is recommended that you do not replicate more than 1,000 tables because this lowers the PD scheduling performance. This limit will be removed in later versions.

* TiFlash reserves the `system` database. You cannot create TiFlash replicas for tables in the database named `system` in TiDB. If you forcibly create such a replica, the behavior is undefined. This is a temporary restriction.

## Check the replication progress

You can check the status of the TiFlash replicas of a specific table using the following statement. The table is specified using the `WHERE` clause. If you omit the `WHERE` clause, the statement returns the replica status of all tables.
@@ -119,11 +121,20 @@ explain analyze select count(*) from test.t;

`cop[tiflash]` means that the task will be sent to TiFlash for processing. If a TiFlash replica is not selected, try updating the statistics using the `analyze table` statement, and then check the result using the `explain analyze` statement.

Note that if a table has only a single TiFlash replica and the related node cannot provide service, queries in CBO mode retry repeatedly. In this situation, you need to specify the engine or use a manual hint to read data from the TiKV replica.
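
For example, the following is a minimal sketch of both workarounds. It assumes that `test.t` (the table from the earlier `explain analyze` example) is the table whose only TiFlash replica is unavailable. The two options are alternatives, and either one makes the query read the TiKV replica:

{{< copyable "sql" >}}

```sql
use test;

-- Option 1: manual hint. Force this statement to read table t from TiKV.
-- This works as long as "tikv" is in the session's engine list.
select /*+ read_from_storage(tikv[t]) */ count(*) from t;

-- Option 2: engine isolation. Restrict the whole session to TiKV
-- ("tidb" is kept so that system tables in the TiDB memory table area stay readable).
set SESSION tidb_isolation_read_engines = "tikv,tidb";
select count(*) from t;
```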

### Engine isolation

Engine isolation specifies that all queries use replicas of the specified engines, which you configure through the corresponding variable. The optional engines are "tikv", "tidb" (the internal memory table area of TiDB, which stores some TiDB system tables and cannot be actively used by users), and "tiflash". You can configure engine isolation at the following two levels:

* TiDB instance level, namely, the INSTANCE level. Add the following configuration item to the TiDB configuration file:

```
[isolation-read]
engines = ["tikv", "tidb", "tiflash"]
```

**The INSTANCE-level default configuration is `["tikv", "tidb", "tiflash"]`.**

* SESSION level. Use the following statement to configure it:

@@ -141,24 +152,19 @@ Engine isolation is to specify that all queries use a replica of the specified e

```sql
set SESSION tidb_isolation_read_engines = "engine list separated by commas";
```

The SESSION-level default configuration is inherited from the TiDB INSTANCE-level configuration.

The final engine configuration is the SESSION-level configuration. That is, the SESSION-level configuration overrides the INSTANCE-level configuration. For example, if you have configured "tikv" at the INSTANCE level and "tiflash" at the SESSION level, the TiFlash replicas are read. If the final engine configuration is "tikv" and "tiflash", both the TiKV and TiFlash replicas can be read, and the optimizer automatically selects the better engine for execution.
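
The following is a minimal sketch of the SESSION-level override. It assumes that the INSTANCE level keeps its default `["tikv", "tidb", "tiflash"]`. The session values are illustrative, and "tidb" is kept in each list so that system tables in the TiDB memory table area remain readable:

{{< copyable "sql" >}}

```sql
-- Read only TiFlash replicas (and TiDB memory tables) in this session.
set SESSION tidb_isolation_read_engines = "tiflash,tidb";

-- Let the optimizer choose between TiKV and TiFlash replicas again.
set SESSION tidb_isolation_read_engines = "tikv,tiflash,tidb";

-- Show the engine list currently set for this session.
select @@session.tidb_isolation_read_engines;
```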

> **Note:**
>
> Because TiDB Dashboard and other components need to read some system tables stored in the TiDB memory table area, it is recommended to always add the "tidb" engine to the instance-level engine configuration.

If the queried table does not have a replica of the specified engine (for example, the engine is configured as "tiflash" but the table does not have a TiFlash replica), the query returns an error.
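
To avoid this error, you can first confirm that the table has a usable TiFlash replica. The following sketch assumes the `test.t` table and queries the `information_schema.tiflash_replica` table, where an `AVAILABLE` value of `1` means that the replica can be used:

{{< copyable "sql" >}}

```sql
-- Check whether test.t has a usable TiFlash replica.
select TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS
from information_schema.tiflash_replica
where TABLE_SCHEMA = 'test' and TABLE_NAME = 't';

-- Only then restrict the session to TiFlash. Without a TiFlash replica,
-- the following query returns an error instead of falling back to TiKV.
set SESSION tidb_isolation_read_engines = "tiflash,tidb";
select count(*) from test.t;
```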

### Manual hint

Manual hints can force TiDB to use specified replicas for specific tables, provided that engine isolation is satisfied. Here is an example of using a manual hint:

{{< copyable "sql" >}}

@@ -174,14 +180,18 @@ If you set an alias to a table in a query statement, you must use the alias in t

```sql
select /*+ read_from_storage(tiflash[alias_a,alias_b]) */ ... from table_name_1 as alias_a, table_name_2 as alias_b where alias_a.column_1 = alias_b.column_2;
```

In the above statements, `tiflash[]` instructs the optimizer to read the TiFlash replicas. You can also use `tikv[]` to instruct the optimizer to read the TiKV replicas as needed. For hint syntax details, refer to [READ_FROM_STORAGE](/optimizer-hints.md#read_from_storagetiflasht1_name--tl_name--tikvt2_name--tl_name-).

If the table specified in a hint does not have a replica of the specified engine, the hint is ignored and a warning is reported. In addition, a hint takes effect only within the scope of engine isolation: if the engine specified in a hint is not in the engine isolation list, the hint is also ignored and a warning is reported.
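
The following sketch shows the second case. It assumes a table `t` in the `test` database. Because "tiflash" is not in the session's engine isolation list, the hint is ignored, and `show warnings` displays the warning:

{{< copyable "sql" >}}

```sql
use test;

-- Engine isolation allows only TiKV (plus TiDB memory tables).
set SESSION tidb_isolation_read_engines = "tikv,tidb";

-- The hint asks for TiFlash, which is outside the isolation list,
-- so the hint is ignored and the query reads the TiKV replica.
select /*+ read_from_storage(tiflash[t]) */ count(*) from t;

-- Display the warning generated by the previous statement.
show warnings;
```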

> **Note:**
>
> The MySQL client of 5.7.7 or earlier versions clears optimizer hints by default. To use the hint syntax in these early versions, start the client with the `--comments` option, for example, `mysql -h 127.0.0.1 -P 4000 -uroot --comments`.

### The relationship of smart selection, engine isolation, and manual hint

Among the three ways of reading TiFlash replicas described above, engine isolation specifies the overall range of engine replicas that can be used. Within this range, manual hints provide finer-grained, statement-level and table-level engine selection. Finally, CBO makes the decision and selects a replica of an engine based on cost estimation within the specified engine list.
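
The following sketch combines the three layers. The tables `t1` and `t2` and the column `id` are hypothetical. Engine isolation allows TiKV and TiFlash, the hint fixes the engine only for alias `a`, and CBO chooses the engine for `b` based on cost. In the `explain` output, the `task` column (for example, `cop[tiflash]` or `cop[tikv]`) shows which replica each table scan uses:

{{< copyable "sql" >}}

```sql
-- Layer 1: engine isolation defines the candidate engines for this session.
set SESSION tidb_isolation_read_engines = "tikv,tiflash,tidb";

-- Layer 2: the hint fixes TiFlash for alias a (the alias must be used in the hint).
-- Layer 3: table b has no hint, so CBO chooses its engine based on cost estimation.
explain select /*+ read_from_storage(tiflash[a]) */ count(*)
from t1 as a join t2 as b on a.id = b.id;
```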

## Use TiSpark to read TiFlash replicas

Currently, you can use TiSpark to read TiFlash replicas in a way similar to engine isolation in TiDB: set the `spark.tispark.use.tiflash` parameter to `true` (or `false`).
@@ -208,7 +218,7 @@ You can configure this parameter in either of the following ways:

> **Note:**
>
> Before v4.0.2, TiFlash does not support the [new framework for collations](/character-set-and-collation.md#new-framework-for-collations) in TiDB. In those earlier versions, if you enable the new framework for collations, none of the expressions can be pushed down. This restriction is removed in v4.0.2 and later versions.

TiFlash mainly supports pushing down predicates and aggregations. Push-down calculations help TiDB perform distributed acceleration. Currently, table joins and `DISTINCT COUNT` cannot be pushed down. Support for them will be optimized in later versions.
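
To check whether a predicate and an aggregation are pushed down, you can inspect the execution plan. The following sketch assumes a table `test.t` that has a TiFlash replica and a hypothetical integer column `a`:

{{< copyable "sql" >}}

```sql
use test;

-- Force TiFlash for table t so that the plan shows what is pushed down to TiFlash.
explain select /*+ read_from_storage(tiflash[t]) */ count(*) from t where a > 10;
```

If the push-down succeeds, the filter operator (`Selection`) and the partial aggregation operator typically appear above the TiFlash table scan with `cop[tiflash]` in the `task` column.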
