Commit
[SPARK-32268][SQL][FOLLOWUP] Add ColumnPruning in injectBloomFilter
### What changes were proposed in this pull request?

Add `ColumnPruning` in `InjectRuntimeFilter.injectBloomFilter` to optimize the BloomFilter creation query.

### Why are the changes needed?

It seems that BloomFilter subqueries injected by `InjectRuntimeFilter` read as many columns as `filterCreationSidePlan`. This does not match the "Only scan the required columns" goal stated in the design. We can check this with a simple case in `InjectRuntimeFilterSuite`:

```scala
withSQLConf(SQLConf.RUNTIME_BLOOM_FILTER_ENABLED.key -> "true",
  SQLConf.RUNTIME_BLOOM_FILTER_APPLICATION_SIDE_SCAN_SIZE_THRESHOLD.key -> "3000",
  SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "2000") {
  val query = "select * from bf1 join bf2 on bf1.c1 = bf2.c2 where bf2.a2 = 62"
  sql(query).explain()
}
```

The reason is that the subqueries have not been optimized by `ColumnPruning`; this PR fixes that.

### Does this PR introduce _any_ user-facing change?

No, not released.

### How was this patch tested?

Improved the test by adding `columnPruningTakesEffect` to check the optimized plan of the bloom filter join.

Closes apache#36047 from Flyangz/SPARK-32268-FOllOWUP.

Authored-by: Yang Liu <yintai@xiaohongshu.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>