20x performance regression going from v6.5.2 to v6.5.3 on K8s #44715
Comments
@emchristiansen, could you please collect the necessary diagnostic data, upload it to Clinic, and post the download URL here for us? It should include two time periods:
@emchristiansen, do you use stale read in your case?
Yes, I did use stale reads.
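The thread doesn't show how the stale reads were issued; a minimal sketch of the two common ways TiDB exposes them, where the 5-second staleness and the table name `t` are assumptions rather than values from this report:

```sql
-- Sketch: two common ways to issue stale reads in TiDB.
-- Session-level: subsequent reads may be served from data up to 5 seconds old.
SET SESSION tidb_read_staleness = -5;

-- Statement-level: read table `t` (placeholder name) as of 5 seconds ago.
SELECT * FROM t AS OF TIMESTAMP NOW() - INTERVAL 5 SECOND;
```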
Will the cross-region latency be high here? How much is it compared with the local region? To resolve the above possible issue, the …
Can you check out the TiDB / KV Request / Stale Read OPS panel in Grafana? From the hit/miss counts you can calculate the stale read hit rate (hits / (hits + misses)); usually, changing `advance-ts-interval` to half of your staleness achieves a good hit rate. By the way, can you share the staleness of your workload with us?
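To make that suggestion concrete, a sketch of where the knob lives; `resolved-ts.advance-ts-interval` is a real TiKV configuration item, but the 10-second staleness (and hence the 5-second interval) is an assumed workload, not a value from this thread:

```toml
# TiKV configuration sketch (hypothetical values).
# If queries read at ~10s staleness, advancing the resolved ts at half
# that interval (5s) means most stale reads should find a resolved ts
# newer than their read timestamp, i.e. a "hit" in the panel above.
[resolved-ts]
advance-ts-interval = "5s"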
Bug Report
I'm using TiDB, installed on K8s with TiDB Operator v1.4.4, without much customization (I basically followed the guides).
When I upgraded to v6.5.3 today, I immediately noticed a 20x slowdown in my DB-heavy workloads.
Downgrading to v6.5.2 fixed the issue.
Peculiarities with my setup:
set global tidb_replica_read = 'closest-replicas';
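For context, a minimal sketch of confirming that this policy took effect from any SQL client; `tidb_replica_read` is a real TiDB system variable, and the expected value simply mirrors the statement above:

```sql
-- Sketch: confirm the replica-read policy currently in effect.
SHOW VARIABLES LIKE 'tidb_replica_read';
-- Expected value after the statement above: closest-replicas
```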
1. Minimal reproduce step (Required)
I don't have a minimal case.
2. What did you expect to see? (Required)
My particular workload should sustain a throughput of ~7.5e3 QPS per region per worker; when I upgraded to v6.5.3 it dropped to ~3e2 (roughly a 25x drop).