Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

statistics: ease the impact of stats feedback on cluster #15503

Merged
merged 8 commits into from
Jul 8, 2020

Conversation

eurekaka
Copy link
Contributor

@eurekaka eurekaka commented Mar 19, 2020

What problem does this PR solve?

Fix #17478

Problem Summary:

Statistics feedback would impose periodical read/write burden on the database. Each TiDB would dump the feedbacks collected on this instance into TiKV every 10 mins, and the stats owner TiDB instance would read the feedbacks dumped every 15 seconds. If the stats owner TiDB finds there are new feedbacks from TiKV, it would merge the feedbacks with statistics in cache, then dump all these updated statistics into TiKV. This dump operation is pretty heavy if there are bunches of feedbacks on bunches of columns/indexes, since it can be treated as a light-weight ANALYZE on a lot of tables.

What is changed and how it works?

What's Changed:

First, reduce the amount of feedbacks generated on each TiDB by:

  • decreasing default MaxQueryFeedbackCount;
  • discarding feedbacks which have too small error rate, or too small scanned row count;
  • discarding feedbacks which have overlapped ranges on the same index/column;

Second, merge multiple insert/update/delete statements of dumping statistics into single ones, to reduce the unnecessary function call stacks and RPCs.

How it Works:

Obviously, the first change can flow-control the statistics feedback mechanism fundamentally, but we may lose some stats accuracy incurred by feedback theoretically. The second change combines several small transactions into a big one, since we have controlled the amount of feedbacks using the first change, I guess this bigger transaction is supposed not to be a problem. However, the bigger transaction should have higher chances of write conflict, and it makes code harder to read, so I haven't made up my mind to keep it or not actually.

Related changes

  • Need to cherry-pick to the release branch: if this PR is experimentally effective, we should apply it in release branches.

Check List

Tests

Side effects

  • Performance regression
    • Possible stats accurateness lose may cause potential query performance regression.

Release note

  • Ease the impact of stats feedback on cluster

@eurekaka eurekaka added type/enhancement The issue or PR belongs to an enhancement. component/statistics status/WIP labels Mar 19, 2020
@github-actions github-actions bot added the sig/sql-infra SIG: SQL Infra label Mar 19, 2020
@eurekaka eurekaka removed the sig/sql-infra SIG: SQL Infra label Mar 19, 2020
@github-actions github-actions bot added the sig/sql-infra SIG: SQL Infra label Mar 19, 2020
@sre-bot
Copy link
Contributor

sre-bot commented May 11, 2020

@eurekaka
Copy link
Contributor Author

/run-build-image

@eurekaka
Copy link
Contributor Author

/run-build-image

@eurekaka
Copy link
Contributor Author

/run-build-image

@eurekaka
Copy link
Contributor Author

/run-build-image

1 similar comment
@eurekaka
Copy link
Contributor Author

/run-build-image

@eurekaka
Copy link
Contributor Author

/run-build-image

@eurekaka
Copy link
Contributor Author

/run-build-image

@eurekaka eurekaka marked this pull request as ready for review May 27, 2020 12:37
@eurekaka eurekaka requested review from a team as code owners May 27, 2020 12:37
@ghost ghost requested a review from winoros May 27, 2020 12:37
@eurekaka
Copy link
Contributor Author

/run-cherry-picker

ti-srebot pushed a commit to ti-srebot/tidb that referenced this pull request Jul 24, 2020
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
@ti-srebot
Copy link
Contributor

cherry pick to release-2.1 in PR #18769

@ti-srebot
Copy link
Contributor

cherry pick to release-3.0 in PR #18770

ti-srebot pushed a commit to ti-srebot/tidb that referenced this pull request Jul 24, 2020
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
@ti-srebot
Copy link
Contributor

cherry pick to release-4.0 in PR #18772

@ti-srebot
Copy link
Contributor

cherry pick to release-3.1 in PR #18773

eurekaka added a commit to ti-srebot/tidb that referenced this pull request Jul 24, 2020
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
eurekaka added a commit to ti-srebot/tidb that referenced this pull request Jul 24, 2020
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
ti-srebot added a commit that referenced this pull request Jul 27, 2020
ti-srebot added a commit that referenced this pull request Jul 27, 2020
eurekaka added a commit that referenced this pull request Jul 27, 2020
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>

Co-authored-by: Kenan Yao <cauchy1992@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/statistics epic/query-feedback-GA sig/execution SIG execution status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

avoid performance jitter caused by statistics feedback
8 participants