
Rework Profile tool to not require Spark to run and process files faster #3161

Merged: 246 commits on Aug 9, 2021

Conversation

tgravescs
Collaborator

fixes #3046

Redesign the profile tool to not require Spark and SQL, similar to the Qualification tool changes. It now runs much faster and with less memory. We could probably improve memory usage further if we run into issues, but I was able to run on a file with all 105 NDS queries without changing the heap size.

It now takes 9.7 seconds to run collect mode on all 105 NDS query event logs. It took a minute to profile all 105 NDS queries that were in a single event log (using the default 1G heap size), and 16.7 seconds to compare all 105 NDS queries using the default heap size. By comparison, the spark-submit approach took 10.5 minutes to run collect mode on the 105 NDS queries and required much more memory; for compare mode we previously suggested limiting to 10 applications, and that limit has been removed now.

This does change a few things:

  1. Runs with java instead of spark-submit/spark-shell.
  2. In collect mode you get one file per application, where the file is -profile.log.
  3. Comparison mode: the only differences were 2 tables to compare SQL and job IDs; everything else goes through the normal path, and applications are just listed with a different app index in the same table. The comparison file is now called rapids_4_spark_tools_compare.log.
  4. Added user to application info.
  5. All sections, when empty, just print a message to keep things consistent; it used to have some empty tables and some messages.
  6. Plan descriptions file name changed to be consistent: {app.appId}-planDescriptions.log.
  7. A threadpool is used to process applications. In compare mode the threadpool reads each application's event log file; in collect mode the threadpool reads the file and processes each application separately.
  8. I changed the way a bunch of things are stored in the application; some are now mutable to be more like the Spark history server, which helps consolidate some things.
  9. If event logs fail to parse or have an error, we now simply skip them instead of erroring out.
  10. Since we were using Spark to create formatted strings, I created a class ProfileOutputWriter that borrows much of Spark's showString functionality to keep the output very similar. In general we create a *ProfileResult case class from the functions, and that class provides the table headers and converts the data to a Seq for output. This should be easy to extend to write to something like CSV.

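The threadpool approach in item 7, combined with the skip-on-failure behavior in item 9, could look roughly like the following sketch. The names EventLogProcessor, EventLogInfo, and processLog are illustrative placeholders, not the tool's actual API:

```scala
import java.util.concurrent.atomic.AtomicInteger
import java.util.concurrent.{Executors, TimeUnit}

object EventLogProcessor {
  // hypothetical stand-in for the real event log descriptor
  case class EventLogInfo(path: String, broken: Boolean = false)

  // stand-in for the real per-application parsing work
  private def processLog(log: EventLogInfo): Unit =
    if (log.broken) throw new RuntimeException(s"cannot parse ${log.path}")

  // submit one task per event log; logs that fail to parse are skipped
  def processAll(logs: Seq[EventLogInfo], numThreads: Int = 4): Int = {
    val processed = new AtomicInteger(0)
    val pool = Executors.newFixedThreadPool(numThreads)
    logs.foreach { log =>
      pool.submit(new Runnable {
        override def run(): Unit =
          try {
            processLog(log)
            processed.incrementAndGet()
          } catch {
            case e: Exception =>
              System.err.println(s"skipping ${log.path}: ${e.getMessage}")
          }
      })
    }
    pool.shutdown()
    pool.awaitTermination(10, TimeUnit.MINUTES)
    processed.get()
  }
}
```

A fixed-size pool also bounds peak memory, since at most numThreads event logs are being parsed at any one time.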
All tests pass. I spot checked our other integration tests; I still need to run all of them and update their expected results. I will do that in parallel with this being reviewed.
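The *ProfileResult pattern from item 10 can be sketched as follows. The trait and case class here are simplified stand-ins, and the table rendering is a bare-bones approximation of Spark's showString, not a copy of the actual ProfileOutputWriter:

```scala
// each result type knows its own headers and how to render itself as rows,
// so the same data can drive text tables now and CSV output later
trait ProfileResult {
  def outputHeaders: Seq[String]
  def convertToSeq: Seq[String]
}

// hypothetical example result carrying the new user field (item 4)
case class AppInfoProfileResult(appIndex: Int, appId: String, user: String)
    extends ProfileResult {
  override def outputHeaders: Seq[String] = Seq("appIndex", "appId", "user")
  override def convertToSeq: Seq[String] = Seq(appIndex.toString, appId, user)
}

object ProfileOutputWriter {
  // minimal text-table formatter loosely modeled on Spark's showString
  def showString(headers: Seq[String], rows: Seq[Seq[String]]): String = {
    val widths = headers.indices.map { i =>
      (headers(i) +: rows.map(_(i))).map(_.length).max
    }
    def fmt(cells: Seq[String]): String =
      cells.zip(widths).map { case (c, w) => c.padTo(w, ' ') }
        .mkString("|", "|", "|")
    val sep = widths.map("-" * _).mkString("+", "+", "+")
    (Seq(sep, fmt(headers), sep) ++ rows.map(fmt) :+ sep).mkString("\n")
  }
}
```

Because every result is just headers plus a Seq of cells, a CSV writer only needs to join the same cells with commas instead of padding them into columns.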

@tgravescs tgravescs added the tools label Aug 6, 2021
@tgravescs tgravescs added this to the Aug 2 - Aug 13 milestone Aug 6, 2021
@tgravescs tgravescs self-assigned this Aug 6, 2021
@tgravescs
Collaborator Author

build

Collaborator

@revans2 revans2 left a comment


I did a quick pass through it, but I didn't look deeply enough to truly feel confident enough to approve it yet. My biggest concern is methods like Analysis.jobAndStageMetricsAggregation that write things out but also produce a result. It was odd before when it would print things out and update state, but now it is kind of a hybrid, and it makes me a bit nervous that it could get called too frequently.

@tgravescs
Collaborator Author

Almost all of the functions do this and did it before, with the caveat that the writer has to be defined. The main reason they return things is just for testing purposes. We could separate out fetching it and writing it.

@revans2
Collaborator

revans2 commented Aug 6, 2021

If it is just for testing then a comment or two about that would be fine. It just confused me.

Collaborator

@revans2 revans2 left a comment


I have had more time to go through it. I don't know if I got everything, but it does look good to me.

@tgravescs
Collaborator Author

Merging this one and I am working on adding writing to CSV and I'll look at either commenting or splitting things into get functions and print functions in that PR.

@tgravescs tgravescs merged commit f59de4f into NVIDIA:branch-21.10 Aug 9, 2021
Successfully merging this pull request may close these issues.

[FEA] Profiling tool: Scale to run large number of event logs.