
20x performance regression going from v6.5.2 to v6.5.3 on K8s #44715

Open
emchristiansen opened this issue Jun 15, 2023 · 5 comments
Labels
type/question The issue belongs to a question.

Comments

@emchristiansen

Bug Report

I'm using TiDB, installed on K8s using the v1.4.4 operator, without much customization (I basically followed the guides).

When I upgraded to v6.5.3 today I immediately noticed a 20x slowdown in my DB-heavy workloads.
Downgrading to v6.5.2 fixed the issue.

Peculiarities of my setup:

  1. I'm running on top of a Tailscale virtual network.
  2. I created the K8s cluster using K0s, with Calico for networking.
  3. I have one PD, KV, and DB per region, and I force reads to be local with `set global tidb_replica_read = 'closest-replicas';` (see the sketch after this list).
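
For reference, a minimal sketch of the read-locality setting from item 3; the verification query is illustrative rather than copied from my actual setup:

```sql
-- Route reads to the replica closest to the TiDB node handling the query.
SET GLOBAL tidb_replica_read = 'closest-replicas';

-- Verify the setting (a global change applies to new sessions).
SHOW VARIABLES LIKE 'tidb_replica_read';
```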

1. Minimal reproduce step (Required)

I don't have a minimal case.

2. What did you expect to see? (Required)

My particular workload normally sustains roughly 7,500 QPS per region per worker; after I upgraded to v6.5.3 it dropped to roughly 300 QPS.

@emchristiansen emchristiansen added the type/bug The issue is confirmed as a bug. label Jun 15, 2023
@seiya-annie seiya-annie added the sig/execution SIG execution label Jun 18, 2023
@Yui-Song
Contributor

Yui-Song commented Jun 19, 2023

@emchristiansen, could you please collect the necessary diagnostic data, upload it to Clinic, and post the download URL here for us? It should cover two time periods:

  1. Run your workload with v6.5.2
  2. Run your workload with v6.5.3

@zhangjinpeng87
Contributor

@emchristiansen do you use stale read in your case?

@emchristiansen
Author

emchristiansen commented Jun 21, 2023 via email

@cfzjywxk cfzjywxk added type/question The issue belongs to a question. and removed type/bug The issue is confirmed as a bug. sig/execution SIG execution labels Jun 27, 2023
@cfzjywxk
Contributor

cfzjywxk commented Jun 27, 2023

@emchristiansen

> I have one PD, KV, and DB per region

Is the cross-region latency high in your setup? How does it compare with the local-region latency?
One possible reason is that, starting from v6.5.3, a stale read is retried on the leader directly if the `dataIsNotReady` error is returned to the tidb-server. This error would be hit here because the default value of `advance-ts-interval` is 20s, so almost all of the requests would be retried on the leaders.

To resolve this, `advance-ts-interval` needs to be configured to a value smaller than your `tidb_read_staleness` so the retry is avoided. For example, if `tidb_read_staleness` is set to 5s, `advance-ts-interval` needs to be set to a smaller value such as 2s or 1s.
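
For reference, a minimal sketch of that tuning, assuming the staleness is set through `tidb_read_staleness` and that `resolved-ts.advance-ts-interval` supports online change in your TiKV version (the 5s / 1s values are only illustrative):

```sql
-- Illustrative values; pick them to fit your workload.

-- Read data that is at most 5 seconds stale (the value is a negative number of seconds).
SET SESSION tidb_read_staleness = -5;

-- Advance the resolved ts on TiKV more frequently than the read staleness,
-- so the closest replicas can serve stale reads without returning
-- dataIsNotReady. If this item cannot be changed online in your version,
-- set resolved-ts.advance-ts-interval in the TiKV configuration instead.
SET CONFIG tikv `resolved-ts.advance-ts-interval` = '1s';

-- Confirm the value on every TiKV instance.
SHOW CONFIG WHERE type = 'tikv' AND name = 'resolved-ts.advance-ts-interval';
```

Since the cluster runs on K8s with the operator, the change can also be made persistent under `spec.tikv.config` in the TidbCluster CR so the operator rolls it out.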

@you06
Contributor

you06 commented Jun 27, 2023

@emchristiansen

Can you check the TiDB / KV Request / Stale Read OPS panel in Grafana? From the hit/miss counts you can calculate the stale read hit rate; usually, setting `advance-ts-interval` to half of your staleness achieves a good hit rate.

BTW can you share the staleness of your workload with us?
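
In case it helps, one quick way to check the currently configured staleness (a sketch; it assumes the workload sets staleness through the session variable rather than explicit AS OF TIMESTAMP queries):

```sql
-- Shows the configured read staleness in seconds (a negative value; 0 means it is not set).
SHOW VARIABLES LIKE 'tidb_read_staleness';
```

The hit rate from the panel above is simply hit / (hit + miss).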
