Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimizing Expand+Aggregate in sqls with many count distinct #10798

Merged
merged 6 commits into from
Sep 24, 2024

Conversation

binmahone
Copy link
Collaborator

@binmahone binmahone commented May 13, 2024

Fixing #10799. This PR tries to optimize the Expand&Aggregate exec in the first stage of a sql with many count distinct measures.

The optimizations in this PR include:

  1. Avoid allocating&initializing large number of null vectors when doing Expand
  2. Try coaleasce expanded column batches before sending them to Aggregate

@binmahone binmahone changed the title optimzing Expand+Aggregate in sqlw with many count distinct optimzing Expand+Aggregate in sqlw with many count distinct [WIP] May 13, 2024
@binmahone
Copy link
Collaborator Author

build

@winningsix winningsix changed the title optimzing Expand+Aggregate in sqlw with many count distinct [WIP] Optimzing Expand+Aggregate in sqlw with many count distinct [WIP] May 13, 2024
@binmahone binmahone changed the title Optimzing Expand+Aggregate in sqlw with many count distinct [WIP] Optimzing Expand+Aggregate in sqls with many count distinct [WIP] May 13, 2024
@sameerz sameerz added the performance A performance related task/issue label May 13, 2024
Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
@binmahone
Copy link
Collaborator Author

build

@binmahone
Copy link
Collaborator Author

@revans2 @abellina @winningsix can you pls take a look of this PR ? we're going to pack a debug build based on this PR

case class NullVecKey(d: DataType, n: Int)

class NullVecCache(private val maxNulls: Int)
extends util.LinkedHashMap[NullVecKey, GpuColumnVector](100, 0.75f, true) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't understand why we are extending a map instead of wrapping it? Or even better using some other cache data structure built for this type of use case.

If we wrapped it, then we could get true LRU functionality and be able to reset the the priority on a read. It would let us not need to override remove so it throws. That API would just not exist.

// This is only for ExpandExec which will generate a lot of null vectors
case class NullVecKey(d: DataType, n: Int)

class NullVecCache(private val maxNulls: Int)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data stored in the cache needs to be spillable in some form. Eventually it would be nice to make it so instead of spilling we can just delete the value from the cache, but in the short term we need to make sure that everything stored in the cache is spillable.

It would also be really nice to have a timeout of some kind. If an entry is unused for a specific amount of time it should be deleted to avoid adding more memory pressure to the system.

@sameerz
Copy link
Collaborator

sameerz commented May 29, 2024

Please retarget to 24.08

@binmahone binmahone changed the base branch from branch-24.06 to branch-24.08 June 3, 2024 02:16
@binmahone
Copy link
Collaborator Author

build

@binmahone
Copy link
Collaborator Author

@wjxiz1992 query perf pass

@binmahone
Copy link
Collaborator Author

binmahone commented Jul 2, 2024

Hi @revans2 @abellina , since we're getting often-contradictory conclusions from customer side, we decide to hold on this PR until things are clearer. I'll turn back to address your comments once we're confident that these optimizations are always benificial.

@binmahone
Copy link
Collaborator Author

Hi @revans2 @abellina , since we're getting often-contradictory conclusions from customer side, we decide to hold on this PR until things are clearer. I'll turn back to address your comments once we're confident that these optimizations are always benificial.

@GaryShen2008 , I suggest to move this PR to 2410 because of the quoted reason

@sameerz
Copy link
Collaborator

sameerz commented Jul 29, 2024

Please retarget to the 24.10 branch.

@binmahone binmahone changed the base branch from branch-24.08 to branch-24.10 August 6, 2024 05:28
Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
…ze_expand

Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
@binmahone
Copy link
Collaborator Author

Hi @revans2 I simplified the code to make it unnecessary to worry about the side effects of global caching for null vectors. The cache reuse ratio would be smaller than previous version, but it would suffice for our customer's use case (a query with a lot of count distincts). Please help to review again

Copy link
Collaborator

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few nits, but I think there are still some comments from @abellina that need to be addressed

val newColumns = boundExprs.safeMap(_.columnarEval(cb)).toArray[ColumnVector]
new ColumnarBatch(newColumns, cb.numRows())
try {
GpuExpressionsUtils.cachedNullVectors.get.clear()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are more ways than just this method to run a expression. I don't trust this to fix it every time. It is probably god enough in practice, but I don't like the precedence it is setting. At a minimum I want to see some comments here explaining what is happening and why. Preferably with a follow on issue to fix this once we have the ability to delete a buffer from the spill framework instead of spilling it.

Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
@binmahone binmahone changed the title Optimzing Expand+Aggregate in sqls with many count distinct [WIP] Optimzing Expand+Aggregate in sqls with many count distinct Sep 23, 2024
@abellina
Copy link
Collaborator

build

2 similar comments
@revans2
Copy link
Collaborator

revans2 commented Sep 23, 2024

build

@binmahone
Copy link
Collaborator Author

build

@binmahone binmahone merged commit 5725e2e into NVIDIA:branch-24.10 Sep 24, 2024
44 of 45 checks passed
@binmahone
Copy link
Collaborator Author

binmahone commented Sep 26, 2024

close #10799

@sameerz sameerz changed the title Optimzing Expand+Aggregate in sqls with many count distinct Optimizing Expand+Aggregate in sqls with many count distinct Sep 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance A performance related task/issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants