Profiling tool: Print UCX and GDS parameters #2785

Merged
merged 6 commits into from
Jun 30, 2021

@nartal1 nartal1 commented Jun 23, 2021

This is a small PR that prints UCX- and GDS-related parameters in the output file of the profiling tool.
Added an event log that has both UCX and GDS parameters for unit testing.
This fixes #2693

+---------------------------------------------------------------+-----------------------------------------------------+
|spark.executorEnv.UCX_ERROR_SIGNALS                            |                                                     |
|spark.executorEnv.UCX_MAX_RNDV_RAILS                           |1                                                    |
|spark.executorEnv.UCX_MEMTYPE_CACHE                            |n                                                    |
|spark.executorEnv.UCX_RC_RX_QUEUE_LEN                          |1024                                                 |
|spark.executorEnv.UCX_RNDV_SCHEME                              |put_zcopy                                            |
|spark.executorEnv.UCX_TLS                                      |cuda_copy,cuda_ipc,rc,tcp                            |
|spark.executorEnv.UCX_UD_RX_QUEUE_LEN                          |1024                                                 |
|spark.rapids.cudfVersionOverride                               |true                                                 |
|spark.rapids.memory.gpu.direct.storage.spill.alignedIO         |false                                                |
|spark.rapids.memory.gpu.direct.storage.spill.alignmentThreshold|8m                                                   |
|spark.rapids.memory.gpu.direct.storage.spill.enabled           |true                                                 |
|spark.rapids.memory.gpu.direct.storage.spill.useHostMemory     |false                                                |
|spark.rapids.memory.gpu.unspill.enabled                        |false                                                |
|spark.rapids.memory.host.spillStorageSize                      |32G                                                  |
|spark.rapids.memory.pinnedPool.size                            |8G                                                   |
|spark.rapids.shuffle.maxMetadataSize                           |512K                                                 |
|spark.rapids.shuffle.transport.enabled                         |true                                                 |
|spark.rapids.shuffle.ucx.bounceBuffers.size                    |8M                                                   |
|spark.rapids.sql.batchSizeBytes                                |1g                                                   |
|spark.rapids.sql.concurrentGpuTasks                            |1                                                    |
|spark.shuffle.manager                                          |com.nvidia.spark.rapids.spark311.RapidsShuffleManager|
|spark.shuffle.service.enabled                                  |false                                                |
+---------------------------------------------------------------+-----------------------------------------------------+
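The selection behind a table like the one above can be sketched with plain Scala collections. This is only an illustration, not the profiling tool's code: the object name `NvidiaPropsSketch`, the function `selectNvidiaProperties`, and the prefix list are assumptions, with the prefixes inferred from the keys in the sample output.

```scala
object NvidiaPropsSketch {
  // Assumed prefixes, inferred from the sample output; the real tool may use a different list.
  val prefixes: Seq[String] = Seq(
    "spark.rapids",
    "spark.executorEnv.UCX",
    "spark.shuffle.manager",
    "spark.shuffle.service.enabled")

  /** Keep only properties whose key starts with one of the prefixes, sorted by key. */
  def selectNvidiaProperties(props: Map[String, String]): Seq[(String, String)] =
    props.toSeq
      .filter { case (key, _) => prefixes.exists(p => key.startsWith(p)) }
      .sortBy { case (key, _) => key }

  def main(args: Array[String]): Unit = {
    // Hypothetical input: a few properties as they might appear in an event log.
    val props = Map(
      "spark.app.name" -> "example",
      "spark.executorEnv.UCX_TLS" -> "cuda_copy,cuda_ipc,rc,tcp",
      "spark.rapids.shuffle.transport.enabled" -> "true",
      "spark.shuffle.manager" -> "com.nvidia.spark.rapids.spark311.RapidsShuffleManager")
    selectNvidiaProperties(props).foreach { case (k, v) => println(s"$k=$v") }
  }
}
```

Sorting by key in the same step keeps the printed rows in a stable order from run to run.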

Signed-off-by: Niranjan Artal <nartal@nvidia.com>
@nartal1 nartal1 added the feature request New feature or request label Jun 23, 2021
@nartal1 nartal1 added this to the June 21 - July 2 milestone Jun 23, 2021
@nartal1 nartal1 requested a review from tgravescs June 23, 2021 03:06
@nartal1 nartal1 self-assigned this Jun 23, 2021
fileWriter = None).collect()
assert(rows.length == 22) // 22 properties captured.
// verify ucx parameters are captured.
assert(rows(1)(0).equals("spark.executorEnv.UCX_MAX_RNDV_RAILS"))
Collaborator

Is order guaranteed here? It would be better not to reference specific rows by index.

Collaborator Author

Order is guaranteed here: the query uses order by propertyName.
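To illustrate both sides of this exchange, here is a minimal sketch using plain Scala collections instead of the tool's DataFrame rows. The object `RowOrderSketch` and the helper `valueOf` are hypothetical names. It shows that sorting by key makes a positional check deterministic, and that a lookup by key avoids depending on position at all.

```scala
object RowOrderSketch {
  // Hypothetical stand-in for collected rows: (propertyName, value) pairs in arbitrary order.
  val unsorted: Seq[(String, String)] = Seq(
    "spark.executorEnv.UCX_TLS" -> "cuda_copy,cuda_ipc,rc,tcp",
    "spark.executorEnv.UCX_MAX_RNDV_RAILS" -> "1",
    "spark.executorEnv.UCX_ERROR_SIGNALS" -> "")

  // Sorting by key fixes the row order, so positional checks are deterministic.
  val sorted: Seq[(String, String)] = unsorted.sortBy(_._1)

  // Order-independent alternative: look the value up by key.
  def valueOf(rows: Seq[(String, String)], key: String): Option[String] =
    rows.collectFirst { case (k, v) if k == key => v }

  def main(args: Array[String]): Unit = {
    assert(sorted(1)._1 == "spark.executorEnv.UCX_MAX_RNDV_RAILS")
    assert(valueOf(unsorted, "spark.executorEnv.UCX_MAX_RNDV_RAILS").contains("1"))
  }
}
```

The positional form is shorter, but it has to be updated whenever a new property sorts ahead of the one being checked. The lookup form does not.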

@tgravescs tgravescs left a comment

The test file is still pretty big here at 1 MB. Can we just add these to an existing file, or have a very basic file that just starts with them set and exits? We don't have to actually run with them.

@@ -98,7 +98,7 @@ class CollectInformation(apps: Seq[ApplicationInfo],
     val messageHeader = "\nSpark Rapids parameters set explicitly:\n"
     for (app <- apps) {
       if (app.allDataFrames.contains(s"propertiesDF_${app.index}")) {
-        app.runQuery(query = app.generateRapidsProperties + " order by key",
+        app.runQuery(query = app.generateRapidsUcxGdsProperties + " order by key",
Contributor

Could we use a more generic name here, such as generateNvidiaProperties?

Collaborator Author

done.

nartal1 commented Jun 24, 2021

build

nartal1 commented Jun 24, 2021

@tgravescs I have updated the event log file. Now it's smaller. PTAL.
@andygrove I think I have addressed the review comments. PTAL.

@tgravescs tgravescs merged commit 95b29b0 into NVIDIA:branch-21.08 Jun 30, 2021
Successfully merging this pull request may close these issues.

[FEA] Profiling Tool: Print GDS + UCX related parameters