Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: improve NULL count estimation for single column index #9474

Merged
merged 2 commits into from
Mar 30, 2019

Conversation

eurekaka
Copy link
Contributor

@eurekaka eurekaka commented Feb 26, 2019

What problem does this PR solve?

An attempt to fix issue #8563.

When column or index contains a lot of null values, our row count estimation for conditions is quite inaccurate. This PR tries to improve the estimation for column stats or single-column index stats. Note that, in #8563, the query is on a multi-column index, and I haven't figured out a way to work in multi-column index scenarios under the current multi-column index framework.

What is changed and how it works?

  • exclude null values from histogram when building stats for single-column index;
  • add additional analyze distsql task to get null count of single-column index;
  • adjust logic of row count estimation for Histogram, Column and Index;
  • fix periodical panic of show stats_histograms, I will split this to a separate PR later; split to executor: only show valid columns in stats_histogram #9487

Check List

Tests

  • Unit test

Code changes

  • Has exported function/method change

Side effects

  • Increased code complexity

Related changes

N/A

@eurekaka eurekaka added type/enhancement The issue or PR belongs to an enhancement. sig/planner SIG: Planner component/statistics sig/execution SIG execution labels Feb 26, 2019
@codecov-io
Copy link

codecov-io commented Feb 27, 2019

Codecov Report

Merging #9474 into master will decrease coverage by 0.2822%.
The diff coverage is 73.3944%.

@@               Coverage Diff                @@
##             master      #9474        +/-   ##
================================================
- Coverage   77.5011%   77.2189%   -0.2823%     
================================================
  Files           404        405         +1     
  Lines         81591      81717       +126     
================================================
- Hits          63234      63101       -133     
- Misses        13664      13936       +272     
+ Partials       4693       4680        -13

@eurekaka
Copy link
Contributor Author

/run-all-tests

@eurekaka
Copy link
Contributor Author

/run-unit-test

@eurekaka
Copy link
Contributor Author

eurekaka commented Mar 7, 2019

/run-all-tests

1 similar comment
@eurekaka
Copy link
Contributor Author

eurekaka commented Mar 7, 2019

/run-all-tests

executor/analyze.go Outdated Show resolved Hide resolved
statistics/histogram.go Outdated Show resolved Hide resolved
@eurekaka eurekaka requested a review from alivxxx March 11, 2019 02:20
executor/analyze.go Outdated Show resolved Hide resolved
@eurekaka eurekaka force-pushed the null_estimation branch 2 times, most recently from 72fe923 to 8fbe7c8 Compare March 18, 2019 06:42
Copy link
Member

@winoros winoros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rest lgtm

executor/analyze.go Outdated Show resolved Hide resolved
executor/analyze.go Outdated Show resolved Hide resolved
executor/analyze.go Outdated Show resolved Hide resolved
executor/analyze.go Show resolved Hide resolved
executor/analyze.go Outdated Show resolved Hide resolved
Copy link
Contributor

@alivxxx alivxxx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@alivxxx alivxxx added the status/LGT1 Indicates that a PR has LGTM 1. label Mar 29, 2019
Copy link
Member

@zz-jason zz-jason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zz-jason zz-jason added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Mar 30, 2019
@zz-jason
Copy link
Member

/run-all-tests

@zz-jason zz-jason changed the title *: improve row count estimation for columns with NULL *: improve NULL count estimation for single column index Mar 30, 2019
@zz-jason zz-jason merged commit 2e14068 into pingcap:master Mar 30, 2019
@zz-jason
Copy link
Member

@eurekaka please cherry pick this PR to release-2.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/statistics sig/execution SIG execution sig/planner SIG: Planner status/LGT2 Indicates that a PR has LGTM 2. type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants