Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plan: change the logic of converting to index join #7553

Merged
merged 10 commits into from
Sep 5, 2018

Conversation

winoros
Copy link
Member

@winoros winoros commented Aug 30, 2018

What problem does this PR solve?

Race occurs in race test. It's caused by the sql added in the unit-test.
Original sql is optimized to that one after #7276
Fix #7551

In such case, the index join may just be as well as hash join, not better than it. And even worse since it has redundant expression evaluation.
This pr mainly to make the hint work in this case.

What is changed and how it works?

Make the filters that index join uses to build range more strict.

Check List

Tests

  • Unit test

@winoros winoros added the sig/planner SIG: Planner label Aug 30, 2018
intSet := map[int]struct{}{}
for _, col := range s1 {
intSet[col.UniqueID] = struct{}{}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UniqueID is changed to int64 in #7385, should modify the map definition here.

Copy link
Contributor

@eurekaka eurekaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in general

{
sql: "select /*+ TIDB_INLJ(t1) */ * from t t1 join t t2 where t1.a=t2.a and t1.a in(1, 2) and t2.a in (1, 2)",
best: "IndexJoin{TableReader(Table(t))->TableReader(Table(t)->Sel([in(t2.a, 1, 2)]))}(t1.a,t2.a)",
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this test sql would not trigger the code path of this patch, i.e, getIndexJoinByOuterIdx -> buildRangeForIndexJoin, because column a is handle, it would follow getIndexJoinByOuterIdx -> constructInnerTableScan. Should we change the column a in the sql to column b instead? or am I missing anything?

@winoros winoros changed the title plan: change the logic of converting to inner join plan: change the logic of converting to index join Aug 31, 2018

// ColumnSliceIntersect intersects two column slice.
// You need to make sure that at least each element in s2 is unique.
func ColumnSliceIntersect(s1, s2 []*Column) []*Column {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The caller only wants to know if there is any intersect, so just return bool is enough.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, changed.

@@ -527,7 +527,7 @@ func (p *LogicalJoin) buildRangeForIndexJoin(indexInfo *model.IndexInfo, innerPl
// After constant propagation, there won'be cases that t1.a=t2.a and t2.a=1 occur in the same time.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to update this comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems it was not updated.

}

usableKeys := make([]*expression.Column, 0, len(keys))

// After predicate push down, the one side conditions of join must be the conditions that cannot be pushed down and
// cannot calculate range either. So we only need the innerPlan.pushedDownConds and the eq conditions that we generate.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should be updated, no innerPlan now.

// Int datum 1 can convert to all column's type(numeric type, string type, json, time type, enum, set) safely.
fakeConstant := &expression.Constant{Value: types.NewIntDatum(1), RetType: key.GetType()}
eqFunc := expression.NewFunctionInternal(p.ctx, ast.EQ, types.NewFieldType(mysql.TypeTiny), key, fakeConstant)
conds = append(conds, eqFunc)
eqConds = append(eqConds, eqFunc)
}

conds = append(conds, innerPlan.pushedDownConds...)
return conds, eqConds, keyOff2IdxOff
remained = make([]expression.Expression, 0, len(innerFilters))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need a comment for this code block.

@@ -540,36 +540,48 @@ func (p *LogicalJoin) buildRangeForIndexJoin(indexInfo *model.IndexInfo, innerPl
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when will expression.Contains(accesses, eqCond) be false

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

t1.a=t2.a and t1.c=t2.c, and index (t1.a, t1.b, t1.c). This case currently won't use index join.

// And if there're cases like t1.a=t2.a and t1.a > 1, we can also guarantee that t1.a > 1 won't be chosen as access condition.
// So DetachCondAndBuildRangeForIndex won't miss the equal conditions we generate.
ranges, accesses, remained, _, err := ranger.DetachCondAndBuildRangeForIndex(p.ctx, conds, idxCols, colLengths)
// In `buildFakeEqCondsForIndexJoin`, we construct the equal cond for join keys and remove filters that containing the join keys' column.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about:
In buildFakeEqCondsForIndexJoin, we construct the equal conditions for join keys and remove filters that contain the join keys' column.
When t1.a = t2.a and t1.a > 1, we can also guarantee that t1.a > 1 won't be chosen as the access condition.
So the equal conditions we built can be successfully used to build a range if they can be used. They won't be affected by the existing filters.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remained = append(remained, filter)
continue
}
conds = append(conds, filter)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this filter doesn't contain the column of join keys, can we just move all the filters in innerFilters to remained?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

t1.a=t2.a and t1.b=1 index is (a, b).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it.

Copy link
Member

@zz-jason zz-jason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zz-jason
Copy link
Member

zz-jason commented Sep 5, 2018

/run-all-tests

@zz-jason zz-jason added type/enhancement The issue or PR belongs to an enhancement. status/LGT2 Indicates that a PR has LGTM 2. labels Sep 5, 2018
@zz-jason zz-jason merged commit 15e709c into pingcap:master Sep 5, 2018
@winoros winoros deleted the index-join branch September 5, 2018 07:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/planner SIG: Planner status/LGT2 Indicates that a PR has LGTM 2. type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants