[FEA] Let CPU handle Delta table's metadata related queries #5624

viadea · 2022-05-24T23:26:35Z

I wish we could just let the CPU handle Delta tables' metadata-related queries.

The reason is that some Delta table metadata queries already contain CPU fallbacks, such as the one that reads _delta_log (JSON files).
If the _delta_log is huge (say, millions of rows), the performance penalty of these CPU fallbacks is not trivial.

If the CPU handled these metadata queries entirely, their performance should at least be similar to a CPU-only run.

andygrove · 2022-06-24T14:16:29Z

Here is an example of some of the expensive CPU/GPU transitions in a Delta Lake metadata query.

andygrove · 2022-06-24T14:21:18Z

Setting spark.rapids.sql.optimizer.enabled=true removes some of this overhead by avoiding a move back to the GPU for the final projection.

viadea · 2022-06-24T18:26:01Z

@andygrove I checked the query plan and run time before and after setting spark.rapids.sql.optimizer.enabled=true, and the run time is similar, about 30s in both cases.
The plans are different, though.

viadea added the feature request (New feature or request) and ? - Needs Triage (Need team to review and classify) labels May 24, 2022

sameerz removed the ? - Needs Triage (Need team to review and classify) label May 25, 2022

sameerz added the performance (A performance related task/issue) label May 25, 2022

andygrove self-assigned this Jun 15, 2022

andygrove added this to the Jun 6 - Jun 17 milestone Jun 15, 2022

sameerz modified the milestones: Jun 6 - Jun 17, Jun 20 - Jul 8 Jun 18, 2022

andygrove mentioned this issue Jun 24, 2022

Fall back to CPU for Delta Lake metadata queries [databricks] #5912

Merged

andygrove closed this as completed in #5912 Jul 8, 2022
