Profiling tool add wholestagecodegen to execs mapping, sql to stage info and job end time #5868

Merged
merged 22 commits into from
Jun 21, 2022

tgravescs
Collaborator

fixes #5515

This PR adds several pieces of information to the profiling tool:

  1. Add job start and end times.
  2. Add a mapping of SQL queries to stage IDs. This table includes the stage duration and the mapping to specific SQL node names (with node IDs), which makes it quick to find the longest stage and see the associated nodes and SQL query. Unlike all the other tables, this one is sorted by stage duration. Not all exec nodes have mappings to stage IDs, so the mapping is not perfect.
  3. Add a mapping of each WholeStageCodegen ID to the actual execs it contains. This is useful for timings and for looking at the metrics to see what was included in each WholeStageCodegen exec.
  4. Add stage IDs to the SQL plan metrics table.

These features were requested to enable post-processing of the profile logs, to gain more insight into which execs take the longest and what speedup is seen between CPU and GPU runs. That post-processing needs to map each WholeStageCodegen into its node types and try to determine how much time each node takes, so that it can be compared with the GPU nodes.

New tables look like:

Job Information:
+--------+-----+---------+-----+-------------+-------------+
|appIndex|jobID|stageIds |sqlID|startTime    |endTime      |
+--------+-----+---------+-----+-------------+-------------+
|1       |0    |[0]      |null |1622846402778|1622846410240|
|1       |1    |[1,2,3,4]|0    |1622846431114|1622846441591|
+--------+-----+---------+-----+-------------+-------------+
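With both timestamps present, a job's wall-clock duration can be derived directly. The following is a minimal sketch, assuming the values are epoch milliseconds (their magnitude suggests so, but the PR does not state it) and that the rows have already been loaded as dictionaries. The `job_duration_ms` helper is hypothetical and not part of the profiling tool.

```python
# Rows copied from the Job Information table above.
jobs = [
    {"appIndex": 1, "jobID": 0, "stageIds": [0], "sqlID": None,
     "startTime": 1622846402778, "endTime": 1622846410240},
    {"appIndex": 1, "jobID": 1, "stageIds": [1, 2, 3, 4], "sqlID": 0,
     "startTime": 1622846431114, "endTime": 1622846441591},
]


def job_duration_ms(job):
    """Return endTime - startTime, or None if no end time was recorded."""
    if job["endTime"] is None:
        return None
    return job["endTime"] - job["startTime"]


# Map each job ID to its duration (assumed to be in milliseconds).
durations = {job["jobID"]: job_duration_ms(job) for job in jobs}
print(durations)
```

Returning `None` for a missing end time is a design choice for this sketch, so that jobs without a recorded end are not reported with a misleading duration.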

SQL to Stage Information:
+--------+-----+-----+-------+--------------+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|appIndex|sqlID|jobID|stageId|stageAttemptId|Stage Duration|SQL Nodes(IDs)                                                                                                                                                     |
+--------+-----+-----+-------+--------------+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|1       |0    |1    |1      |0             |8174          |Exchange(9),WholeStageCodegen (1)(10),Scan(13)                                                                                                                     |
|1       |0    |1    |2      |0             |8154          |Exchange(16),WholeStageCodegen (3)(17),Scan(20)
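For post-processing, the `SQL Nodes(IDs)` column usually needs to be split back into individual nodes. The sketch below assumes each entry has the form `name(id)`, that entries are comma-separated, and that node names themselves contain no commas; none of this is guaranteed by the PR. The `parse_sql_nodes` helper is hypothetical.

```python
import re

# Matches "<node name>(<node id>)". The greedy name group keeps an embedded
# codegen id such as "WholeStageCodegen (1)" as part of the name.
NODE_RE = re.compile(r"^(?P<name>.*)\((?P<id>\d+)\)$")


def parse_sql_nodes(cell):
    """Split a 'SQL Nodes(IDs)' cell into (node name, node id) pairs."""
    pairs = []
    for token in cell.split(","):
        match = NODE_RE.match(token.strip())
        if match is None:
            raise ValueError(f"unrecognized node entry: {token!r}")
        pairs.append((match.group("name"), int(match.group("id"))))
    return pairs


nodes = parse_sql_nodes("Exchange(9),WholeStageCodegen (1)(10),Scan(13)")
print(nodes)
```

The resulting node IDs can then be matched against the `nodeID` column of the other tables.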

SQL Plan Metrics for Application:
+--------+-----+------+---------------------+-------------+-------------------------+-----------+----------+--------+
|appIndex|sqlID|nodeID|nodeName             |accumulatorId|name                     |max_value  |metricType|stageIds|
+--------+-----+------+---------------------+-------------+-------------------------+-----------+----------+--------+
|1       |0    |0     |WholeStageCodegen (6)|91           |duration                 |3          |timing    |4       |
|1       |0    |1     |HashAggregate        |92           |number of output rows    |1          |sum       |4       |
|1       |0    |1     |HashAggregate        |95           |time in aggregation build|3          |timing    |4       |
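If only the printed table is available, it can be turned back into records before filtering on the new `stageIds` column. This is a rough sketch that assumes the text looks exactly like the table above and that no cell contains a `|` character. The `parse_table` helper is hypothetical, and all values are left as strings.

```python
def parse_table(text):
    """Parse a '+---+' bordered table into a list of dicts keyed by header."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines and blank lines
        line = line[1:]
        if line.endswith("|"):
            line = line[:-1]
        rows.append([cell.strip() for cell in line.split("|")])
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# Header and rows copied from the SQL Plan Metrics table above.
table = """
+--------+-----+------+---------------------+-------------+-------------------------+-----------+----------+--------+
|appIndex|sqlID|nodeID|nodeName             |accumulatorId|name                     |max_value  |metricType|stageIds|
+--------+-----+------+---------------------+-------------+-------------------------+-----------+----------+--------+
|1       |0    |0     |WholeStageCodegen (6)|91           |duration                 |3          |timing    |4       |
|1       |0    |1     |HashAggregate        |92           |number of output rows    |1          |sum       |4       |
|1       |0    |1     |HashAggregate        |95           |time in aggregation build|3          |timing    |4       |
"""

metrics = parse_table(table)
# Keep only the timing metrics recorded for stage 4.
timing = [m for m in metrics
          if m["metricType"] == "timing" and m["stageIds"] == "4"]
print([(m["nodeName"], m["name"], m["max_value"]) for m in timing])
```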

WholeStageCodeGen Mapping:
+--------+-----+------+---------------------+-------------------+------------+
|appIndex|sqlID|nodeID|SQL Node             |Child Node         |Child NodeID|
+--------+-----+------+---------------------+-------------------+------------+
|1       |0    |0     |WholeStageCodegen (6)|HashAggregate      |1           |
|1       |0    |3     |WholeStageCodegen (5)|HashAggregate      |4           |
|1       |0    |3     |WholeStageCodegen (5)|Project            |5           |
|1       |0    |3     |WholeStageCodegen (5)|SortMergeJoin      |6           |
|1       |0    |7     |WholeStageCodegen (2)|Sort               |8           |
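To see which execs each WholeStageCodegen node contains, the mapping rows can be grouped by their parent node. A minimal sketch, assuming the rows are already available as tuples in the column order shown above; the variable names are illustrative only.

```python
from collections import defaultdict

# Rows copied from the WholeStageCodeGen Mapping table above:
# (appIndex, sqlID, nodeID, SQL Node, Child Node, Child NodeID)
rows = [
    (1, 0, 0, "WholeStageCodegen (6)", "HashAggregate", 1),
    (1, 0, 3, "WholeStageCodegen (5)", "HashAggregate", 4),
    (1, 0, 3, "WholeStageCodegen (5)", "Project", 5),
    (1, 0, 3, "WholeStageCodegen (5)", "SortMergeJoin", 6),
    (1, 0, 7, "WholeStageCodegen (2)", "Sort", 8),
]

# Group child execs under their (appIndex, sqlID, nodeID) codegen node.
children = defaultdict(list)
for app_index, sql_id, node_id, _sql_node, child_name, child_id in rows:
    children[(app_index, sql_id, node_id)].append((child_name, child_id))

print(children[(1, 0, 3)])
```

Keying on `(appIndex, sqlID, nodeID)` rather than on the node name avoids collisions if the same codegen label appears in more than one query or application.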

@tgravescs tgravescs added this to the Jun 6 - Jun 17 milestone Jun 17, 2022
@tgravescs tgravescs self-assigned this Jun 17, 2022
@tgravescs
Collaborator Author

build
