Update task memory and disk spill metrics when buffer store spills #848

Merged
jlowe merged 2 commits into NVIDIA:branch-0.3 on Sep 25, 2020

jlowe (Member) commented on Sep 24, 2020

This connects buffer store spills to task metrics. When a spill occurs from the device store to the host memory store, it updates the memory spilled metric for the task corresponding to the current thread, that is, the thread that triggered the spill. Similarly, when the host store spills to the disk store, it updates the disk spilled metric.

This is far from perfect: a task can be blocked on the memory store lock while another task is spilling, so it has nothing reported against it yet is still stalled or slow because of spilling. However, it at least allows a user to see that host and disk spills are occurring in their queries.
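For readers who want a concrete picture of the attribution described above, here is a minimal, self-contained Python sketch. It is not the code in this PR (the plugin itself is written in Scala and uses Spark's task metrics). Every name in it (`TaskMetrics`, `BufferStore`, `set_task_metrics`, `build_stores`, and so on) is hypothetical and chosen only for illustration. It assumes a thread-local "current task" and a simple oldest-first spill policy, neither of which is confirmed by the description.

```python
import threading
from dataclasses import dataclass


@dataclass
class TaskMetrics:
    """Hypothetical per-task counters, loosely modeled on Spark's spill metrics."""
    memory_bytes_spilled: int = 0
    disk_bytes_spilled: int = 0


# Assumption: each task runs on its own thread, so a thread-local can map
# "the current thread" to "the current task".
_current = threading.local()


def set_task_metrics(metrics):
    _current.metrics = metrics


def current_task_metrics():
    return getattr(_current, "metrics", None)


def add_memory_spill(metrics, nbytes):
    metrics.memory_bytes_spilled += nbytes


def add_disk_spill(metrics, nbytes):
    metrics.disk_bytes_spilled += nbytes


class BufferStore:
    """Toy store that spills its oldest buffers to the next tier when over capacity."""

    def __init__(self, name, capacity, spill_store=None, on_spill=None):
        self.name = name
        self.capacity = capacity
        self.spill_store = spill_store
        self.on_spill = on_spill
        self.lock = threading.Lock()
        self.buffers = {}  # buffer id -> size in bytes, in insertion order

    def used(self):
        return sum(self.buffers.values())

    def add(self, buffer_id, size):
        # Other tasks block here while a spill is in progress; nothing is
        # recorded for them, which is the limitation noted above.
        with self.lock:
            self.buffers[buffer_id] = size
            self._spill_to_fit()

    def _spill_to_fit(self):
        spilled = 0
        while self.spill_store is not None and self.buffers and self.used() > self.capacity:
            oldest_id = next(iter(self.buffers))
            size = self.buffers.pop(oldest_id)
            self.spill_store.add(oldest_id, size)
            spilled += size
        # Charge the spill to whichever task's thread triggered it.
        metrics = current_task_metrics()
        if spilled > 0 and self.on_spill is not None and metrics is not None:
            self.on_spill(metrics, spilled)


def build_stores(device_capacity, host_capacity):
    disk = BufferStore("disk", float("inf"))
    host = BufferStore("host", host_capacity, disk, add_disk_spill)
    device = BufferStore("device", device_capacity, host, add_memory_spill)
    return device, host, disk


def run_task(device, metrics, sizes, prefix):
    set_task_metrics(metrics)
    for i, size in enumerate(sizes):
        device.add(f"{prefix}-{i}", size)


if __name__ == "__main__":
    device, host, disk = build_stores(device_capacity=100, host_capacity=100)
    metrics = TaskMetrics()
    run_task(device, metrics, [60, 60, 60], "task0")
    print(metrics)
```

In this sketch a device-to-host spill is counted as memory spilled and a host-to-disk spill as disk spilled, both against the task whose thread caused the spill, even when the evicted buffers belong to a different task.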

Signed-off-by: Jason Lowe <jlowe@nvidia.com>
jlowe added the "feature request" (New feature or request) label on Sep 24, 2020
jlowe added this to the Sep 14 - Sep 25 milestone on Sep 24, 2020
jlowe self-assigned this on Sep 24, 2020
jlowe (Member, Author) commented on Sep 24, 2020

build

abellina (Collaborator) previously approved these changes on Sep 24, 2020 and left a comment:

I had one comment to see if we wanted a log line from host => disk, LGTM though.

Signed-off-by: Jason Lowe <jlowe@nvidia.com>
jlowe (Member, Author) commented on Sep 24, 2020

build

jlowe merged commit 4354e26 into NVIDIA:branch-0.3 on Sep 25, 2020
sperlingxx pushed a commit to sperlingxx/spark-rapids that referenced this pull request Nov 20, 2020
…VIDIA#848)

* Update task memory and disk spill metrics when buffer store spills

Signed-off-by: Jason Lowe <jlowe@nvidia.com>

* Add log for amount of bytes spilled from host store

Signed-off-by: Jason Lowe <jlowe@nvidia.com>
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
…VIDIA#848)

* Update task memory and disk spill metrics when buffer store spills

Signed-off-by: Jason Lowe <jlowe@nvidia.com>

* Add log for amount of bytes spilled from host store

Signed-off-by: Jason Lowe <jlowe@nvidia.com>
jlowe deleted the task-spill-metrics branch on September 10, 2021 15:41
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
…IDIA#848)

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>
