Query execution tracing and replay tool #9668

xiaoxmeng · 2024-04-30T06:19:37Z

Description

Add a query execution tracing and replay tool to facilitate query analysis. The tool shall allow us to replay part of a query execution on a local computer instead of replaying the whole query in a production environment or in a real Prestissimo cluster. The tool consists of two parts:

(1) trace collection: run a query with trace collection enabled through query configs (and the corresponding session properties in the Prestissimo context). During execution, the query collects the trace by dumping the input vectors of a specified set of operators (data) and the corresponding query plan info (metadata) into a specified storage location;

(2) trace replay: construct a sub-query plan from the dumped query plan metadata, then load the dumped input vectors into memory and feed them into the constructed sub-query plan for replay. If the input is too large, we can build a special source operator to read the dumped input vectors from storage in batches.
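To make the two parts concrete, here is a minimal sketch in Python rather than Velox C++. Every name in it (`trace_operator_input`, `read_traced_batches`, `replay`, the `plan_meta.json` and `input.data` file names, the pickle encoding) is a hypothetical stand-in and not the actual Velox API or on-disk format. It only illustrates the idea: dump metadata plus input batches during execution, then rebuild the operator and feed it the dumped batches one at a time.

```python
import json
import pickle
import tempfile
from pathlib import Path


def trace_operator_input(trace_dir, plan_meta, batches):
    """Trace collection: dump plan metadata and each input batch of one operator."""
    trace_dir = Path(trace_dir)
    trace_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical file names; the real layout is defined by the tool itself.
    (trace_dir / "plan_meta.json").write_text(json.dumps(plan_meta))
    with open(trace_dir / "input.data", "wb") as out:
        for batch in batches:
            pickle.dump(batch, out)  # one record per input batch


def read_traced_batches(trace_dir):
    """Source for replay: yield batches one by one instead of loading them all."""
    with open(Path(trace_dir) / "input.data", "rb") as src:
        while True:
            try:
                yield pickle.load(src)
            except EOFError:
                return


def replay(trace_dir, operators):
    """Trace replay: rebuild the operator from metadata and feed it the traced input."""
    plan_meta = json.loads((Path(trace_dir) / "plan_meta.json").read_text())
    operator = operators[plan_meta["operator"]]
    return [operator(batch) for batch in read_traced_batches(trace_dir)]


# Toy usage: trace a "double" operator, then replay it from the dumped files.
with tempfile.TemporaryDirectory() as demo_dir:
    trace_operator_input(demo_dir, {"operator": "double"}, [[1, 2], [3]])
    demo_result = replay(demo_dir, {"double": lambda batch: [2 * v for v in batch]})
```

The generator in `read_traced_batches` plays the role of the special source operator mentioned above: replay memory stays bounded by one batch regardless of the total trace size.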

The replay can be done at different levels: operator level, pipeline level and task level. We can start with the operator level and extend to the pipeline and task levels next.

cc @mbasmanova @duanmeng @huamn

mbasmanova · 2024-04-30T06:50:49Z

CC: @aditi-pandit

aditi-pandit · 2024-04-30T18:08:26Z

+1. This would be really useful.

FelixYBW · 2024-04-30T18:09:24Z

Similar to Gluten's microbenchmark reproduce tool. It will be super useful for debugging and performance analysis. Good feature!

Summary: Add a query tracer to log the input data, and metadata (including query configurations, connector properties, and query plans). This logged data and metadata can be used to replay the operations of a specific operator or pipeline. Part of #9668 Pull Request resolved: #10774 Reviewed By: Yuhta Differential Revision: D61514971 Pulled By: xiaoxmeng fbshipit-source-id: 9a0b901ee1475a6c35169fe77eb19e797e31e210

Summary: Create a directory named `$QueryTraceBaseDir/$taskId` when a task is initiated, if query tracing is enabled. This directory will store metadata related to the task, including the query plan node tree, query configurations, and connector properties. Part of #9668 Pull Request resolved: #10815 Reviewed By: Yuhta Differential Revision: D61808438 Pulled By: xiaoxmeng fbshipit-source-id: 57eff8f4b70405ba5c60fcd8315b025b22c2317b
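As a rough illustration of the per-task layout described in this summary, the following Python sketch creates `<base_dir>/<task_id>` and writes the three kinds of metadata into it. The function name, the `task_meta.json` file name, and the JSON keys are assumptions made for the example; the summary does not state how Velox names or encodes the metadata file.

```python
import json
import tempfile
from pathlib import Path


def create_task_trace_dir(base_dir, task_id, plan, query_configs, connector_properties):
    """Create <base_dir>/<task_id> and store the task metadata in it."""
    task_dir = Path(base_dir) / task_id
    # exist_ok=False: fail loudly if a trace directory for this task already exists.
    task_dir.mkdir(parents=True, exist_ok=False)
    meta = {
        "plan": plan,  # query plan node tree
        "query_configs": query_configs,
        "connector_properties": connector_properties,
    }
    # Hypothetical file name and format, chosen only for this sketch.
    (task_dir / "task_meta.json").write_text(json.dumps(meta, indent=2))
    return task_dir
```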

xiaodouchen · 2024-09-13T04:23:36Z

This tool looks great, but I have two concerns:

1. Currently, the trace tool saves the full input data. If the data is too large, it may cause the node to crash in test or production environments.
2. There's no mechanism to delete trace data. If trace data is generated on multiple nodes, it needs to be manually deleted from each node.

I suggest adding a config, query_trace_input_count, to control how many inputs are saved. When set to n (n > 0), n inputs are saved; when set to -1, the trace file is deleted. Regarding deletion, since each query has a different taskId, the query_trace_dir would need to be deleted asynchronously during QueryTrace initialization. Moreover, for the same query_trace_dir, trace files should not be generated and deleted concurrently, to avoid conflicts.

@duanmeng @xiaoxmeng What are your thoughts on this suggestion? I look forward to hearing your suggestions.

duanmeng · 2024-09-13T04:53:40Z

> This tool looks great, but I have two concerns:

@xiaodouchen Thanks for your review.

> Currently, the trace tool saves the full input data. If the data is too large, it may cause the node to crash in test or production environments.

The data can be logged on remote storage.
We can extend it later to support only trace from a few nodes or a few tasks.

> There's no mechanism to delete trace data. If trace data is generated on multiple nodes, it needs to be manually deleted from each node.

Garbage collection is a different thing.

cc @xiaoxmeng

duanmeng · 2024-09-14T03:26:05Z

Add query trace writer and reader.
Trace metadata during task creation.
Add query replayer.
Query trace source and sink nodes/operators.
Support table writer replay.
Support query trace splits.
Support table scan replay.
Support aggregate replay.

Summary: Velox can record the query metadata (query plan and configs) during task creation, along with the input vectors of the traced operator; see #10774 and #10815. This PR adds a query replayer that can replay a query locally using the metadata and input vectors from the production environment. At present it only supports showing the summary of a query; replay support for more traced operators will be added in the future. This PR also adds two query configs, `query_trace_max_bytes` and `query_trace_task_reg_exp`, which constrain the recorded input data size and the set of traced tasks respectively, to keep the production cluster stable. Part of #9668 Pull Request resolved: #10897 Reviewed By: tanjialiang Differential Revision: D62336733 Pulled By: xiaoxmeng fbshipit-source-id: d196738dfa92c29fe5de67a944f652a328903814
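The two limits can be pictured with the Python sketch below. It is not the Velox implementation: `TraceBudget` and `should_trace_task` are hypothetical names, matching the regular expression against the whole task ID is an assumption, and so is the choice to simply stop recording once the byte limit would be exceeded (the summary does not say what Velox does at that point).

```python
import re


class TraceBudget:
    """Tracks recorded bytes against a limit such as query_trace_max_bytes."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.recorded_bytes = 0

    def try_record(self, batch_bytes):
        """Return True and account for the batch only if it fits in the budget."""
        # Assumption: a batch that would exceed the limit is skipped.
        if self.recorded_bytes + batch_bytes > self.max_bytes:
            return False
        self.recorded_bytes += batch_bytes
        return True


def should_trace_task(task_id, task_reg_exp):
    """Decide whether a task is traced, in the spirit of query_trace_task_reg_exp."""
    # Assumption: the expression must match the entire task ID.
    return re.fullmatch(task_reg_exp, task_id) is not None
```

With a pattern like `task_\d+` (an invented task ID shape), only tasks whose full ID matches would be traced, which narrows tracing to a few tasks instead of the whole cluster.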

Summary: Create a `QueryDataWriter` in `exec::TableWriter` if the query trace is enabled, recording the input vectors batch by batch. Each operator writes its data to the directory `$rootDir/$pipelineId/$driverId/data`. The recorded data will be used to replay the execution of `exec::TableWriter`, which will be supported in the follow-up. Design notes: https://docs.google.com/document/d/1crIIeVz4tWKYQnBoHoxrv2i-4zAML9HSYLps8h5SDrc/edit#heading=h.y6j2ojtr7hm9 Part of #9668 Pull Request resolved: #10910 Reviewed By: pedroerp Differential Revision: D63444416 Pulled By: xiaoxmeng fbshipit-source-id: ddd74ff6dd56de7bce31ec536035b32211453364
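The layout `$rootDir/$pipelineId/$driverId/data` can be expressed as a small path helper. This is only a Python sketch with a hypothetical function name; it shows that the data file of a traced operator is identified by the pipeline ID and driver ID under the trace root.

```python
from pathlib import PurePosixPath


def operator_data_path(root_dir, pipeline_id, driver_id):
    """Build <root_dir>/<pipeline_id>/<driver_id>/data for a traced operator."""
    return PurePosixPath(root_dir) / str(pipeline_id) / str(driver_id) / "data"
```

For example, `operator_data_path("/tmp/trace/task_1", 0, 3)` yields `/tmp/trace/task_1/0/3/data`; the root directory used here is made up for the example.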

Summary: Adds `TableWriterReplayer` to facilitate replaying the `TableWriter` operator. It uses the given plan node ID to find the traced `TableWriteNode` in the traced plan, creates a new `TableWriterNode`, rebuilds a query plan with a `QueryTraceScanNode`, then applies the traced configurations and reruns. `QueryTraceScanNode` holds the traced data type and directory for a given plan node ID. This information is used to build the `QueryTraceScan` operator, which creates a `QueryDataReader` from the traced data type and input data file. Finding the right input data file for replay requires both the pipeline ID and the driver ID, which are only known during operator creation, so the traced input data file and the output type have to be resolved dynamically. Part of #9668 Pull Request resolved: #11100 Reviewed By: tanjialiang Differential Revision: D63774083 Pulled By: xiaoxmeng fbshipit-source-id: 912bef3cb20d9b1a1685af625ba2f319e2dc7509

Summary: Records input in `HashAggregation` and adds `AggregationReplayer` to support replay. Part of #9668 Pull Request resolved: #11176 Reviewed By: tanjialiang Differential Revision: D64017836 Pulled By: xiaoxmeng fbshipit-source-id: 4392e511fb889dfc232eaf64c7228655c50d623f

Summary: Add HiveConnectorSplit Serde, a prerequisite for supporting `TableScan` tracing. part of #9668 Pull Request resolved: #11184 Reviewed By: tanjialiang Differential Revision: D64019689 Pulled By: xiaoxmeng fbshipit-source-id: 22b89d8415d218f9bd3d4f7e589a0be276406fec

Summary: Add partitioned output trace replayer to facilitate debugging for partitioned output operator with complex input. part of #9668 Pull Request resolved: #11175 Reviewed By: xiaoxmeng Differential Revision: D63959956 Pulled By: tanjialiang fbshipit-source-id: a1519cd1191222316ec03f7e5c219d03c5e6a5be

xiaoxmeng added the enhancement New feature or request label Apr 30, 2024

duanmeng self-assigned this Apr 30, 2024

duanmeng mentioned this issue Aug 18, 2024

Add Query Trace Writers and Readers #10774

Closed

duanmeng mentioned this issue Aug 25, 2024

Trace metadata during task creation #10815

Closed

This was referenced Aug 31, 2024

Add query replayer #10897

Closed

Add trace support for TableWriter #10910

Closed

xiaodouchen mentioned this issue Sep 18, 2024

Add Query Trace Deleter #11026

Open

duanmeng mentioned this issue Oct 3, 2024

Add TableWriterRepalyer #11100

Closed

This was referenced Oct 7, 2024

Add partitioned output trace replayer #11175

Closed

Add HashAggregation Replayer #11176

Closed

Add HiveConnectorSplit Serde #11184

Closed

duanmeng mentioned this issue Oct 9, 2024

Add QuerySplitTracer to records and rebuild splits #11205

Open

duanmeng mentioned this issue Oct 10, 2024

Ignore tracing of auxiliary operator #11220

Open
