Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specialize single column primitive group values #7043

Merged
merged 8 commits into from
Jul 21, 2023

Conversation

tustvold
Copy link
Contributor

@tustvold tustvold commented Jul 20, 2023

Which issue does this PR close?

Part of #6969

Rationale for this change

--------------------
Benchmark tpch_mem.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃     main ┃ specialize-primitive-group-values ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 385.21ms │                          384.94ms │     no change │
│ QQuery 2     │ 102.04ms │                           91.22ms │ +1.12x faster │
│ QQuery 3     │ 104.50ms │                          108.36ms │     no change │
│ QQuery 4     │  69.37ms │                           72.72ms │     no change │
│ QQuery 5     │ 234.38ms │                          248.42ms │  1.06x slower │
│ QQuery 6     │  28.62ms │                           28.20ms │     no change │
│ QQuery 7     │ 550.32ms │                          577.73ms │     no change │
│ QQuery 8     │ 160.42ms │                          153.16ms │     no change │
│ QQuery 9     │ 348.10ms │                          350.41ms │     no change │
│ QQuery 10    │ 201.04ms │                          205.85ms │     no change │
│ QQuery 11    │  95.55ms │                          100.06ms │     no change │
│ QQuery 12    │ 112.20ms │                          112.50ms │     no change │
│ QQuery 13    │ 199.70ms │                          163.78ms │ +1.22x faster │
│ QQuery 14    │  30.56ms │                           31.35ms │     no change │
│ QQuery 15    │  34.63ms │                           30.74ms │ +1.13x faster │
│ QQuery 16    │ 104.27ms │                          107.38ms │     no change │
│ QQuery 17    │ 582.36ms │                          487.69ms │ +1.19x faster │
│ QQuery 18    │ 994.82ms │                          900.53ms │ +1.10x faster │
│ QQuery 19    │ 112.26ms │                          109.53ms │     no change │
│ QQuery 20    │ 204.93ms │                          212.06ms │     no change │
│ QQuery 21    │ 695.60ms │                          690.43ms │     no change │
│ QQuery 22    │  55.44ms │                           55.59ms │     no change │
└──────────────┴──────────┴───────────────────────────────────┴───────────────┘

The slower tests appear to just be noise, at least they don't seem to reproduce consistently

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label Jul 20, 2023
}

/// A [`GroupValues`] making use of [`Rows`]
struct GroupValuesRows {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is moved into a new module

let hash = key.hash(state);
let insert = self.map.find_or_find_insert_slot(
hash,
|g| unsafe { self.values.get_unchecked(*g).is_eq(key) },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is awesome 🚀

Ok(v) => *v.as_ref(),
Err(slot) => {
let g = self.values.len();
self.map.insert_in_slot(hash, slot, g);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should still track the allocated memory (like insert_accounted did?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is accounted in the size method

@alamb
Copy link
Contributor

alamb commented Jul 20, 2023

I just pushed a fix for the CI failure. I am now running benchmarks on this branch to confirm.

I expect this change may make a substantial difference on some of the ClickBench results as well, but I don't have them automated quite yet

@alamb alamb changed the title Specialize primitive group values Specialize single column primitive group values Jul 20, 2023
@alamb
Copy link
Contributor

alamb commented Jul 20, 2023

My benchmark results are similar. I plan to review this PR carefully tomorrow morning

FYI @yahoNanJing

--------------------
Benchmark tpch_mem.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ main_base ┃ specialize-primitive-group-values ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  533.98ms │                          539.33ms │     no change │
│ QQuery 2     │  144.79ms │                          128.73ms │ +1.12x faster │
│ QQuery 3     │  154.18ms │                          151.19ms │     no change │
│ QQuery 4     │  108.38ms │                          113.34ms │     no change │
│ QQuery 5     │  339.26ms │                          345.13ms │     no change │
│ QQuery 6     │   37.56ms │                           38.62ms │     no change │
│ QQuery 7     │  810.72ms │                          762.61ms │ +1.06x faster │
│ QQuery 8     │  222.49ms │                          233.41ms │     no change │
│ QQuery 9     │  505.28ms │                          511.36ms │     no change │
│ QQuery 10    │  306.53ms │                          291.01ms │ +1.05x faster │
│ QQuery 11    │  149.93ms │                          146.22ms │     no change │
│ QQuery 12    │  165.08ms │                          160.02ms │     no change │
│ QQuery 13    │  284.97ms │                          241.23ms │ +1.18x faster │
│ QQuery 14    │   44.42ms │                           43.16ms │     no change │
│ QQuery 15    │   49.69ms │                           41.66ms │ +1.19x faster │
│ QQuery 16    │  155.59ms │                          150.90ms │     no change │
│ QQuery 17    │  752.18ms │                          723.37ms │     no change │
│ QQuery 18    │ 1420.88ms │                         1390.54ms │     no change │
│ QQuery 19    │  161.97ms │                          163.11ms │     no change │
│ QQuery 20    │  299.46ms │                          289.71ms │     no change │
│ QQuery 21    │ 1033.28ms │                          952.11ms │ +1.09x faster │
│ QQuery 22    │   82.75ms │                           83.04ms │     no change │
└──────────────┴───────────┴───────────────────────────────────┴───────────────┘

};
}

// TODO: More primitives
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still relevant?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @tustvold -- this PR is awesome ❤️ I will rerun the clickbench numbers with this PR once it is merged.

I left a bunch of "improve the comment" type suggestions which I think would add value but are optional and can be done as a follow on PR (I will do them if you choose not to).

I do think we should add hash collision testing prior to merge, which I left a comment about

/// Stores the group index based on the hash of its value
map: RawTable<usize>,
/// The group index of the null value if any
null_group: Option<usize>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

}
EmitTo::First(n) => {
// SAFETY: self.map outlives iterator and is not modified concurrently
unsafe {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this code is largely replicated from the row version. I wonder if it could be refactored into a (templated) common function (with appropriate documentation)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isn't an easy way to make this generic, as one stores tuples and one isn't... I at least can't see a way that doesn't just obfuscate the code

use hashbrown::raw::RawTable;
use std::sync::Arc;

/// A trait to allow hashing of floating point numbers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why this doesn't use create_hashes? And perhaps add comments in the code about the rationale?

If it is important not to use create_hashes I recommend

  1. Move this code to hash_utils.rs so it is easier to find
  2. Implement the force_hash_collisions version (that always hashes things to 0) to ensure appropriate coverage

Here is an example of force_hash_collisions
https://github.com/apache/arrow-datafusion/blob/368f6e606a3cfca8e04638b8d5ff0ff116a20b57/datafusion/physical-expr/src/hash_utils.rs#L214-L224

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't use create_hashes as we are generating the hashes from the native values, not an array

// specific language governing permissions and limitations
// under the License.

use crate::physical_plan::aggregates::group_values::GroupValues;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code is just moved, right?


hash_float!(f16, f32, f64);

/// A [`GroupValues`] storing raw primitive values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// A [`GroupValues`] storing raw primitive values
/// A [`GroupValues`] storing a single column of raw primitive values
///
/// This specialization is significantly faster than using the more general
/// purpose `Row`s format

@alamb
Copy link
Contributor

alamb commented Jul 21, 2023

FYI @yahoNanJing -- I think this PR will make it even easier to evaluate the improvements that a fixed width arrow row format would provide (We can make a specialized GroupsValue under the correct circumstances)

@alamb
Copy link
Contributor

alamb commented Jul 21, 2023

I saw some strange results when running with this branch on my clickbench testing. Please standby

@alamb
Copy link
Contributor

alamb commented Jul 21, 2023

TLDR is I think this PR is good to go from a performance perspective. I don't see massive gains but I do see small improvements

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants