Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor PrimitiveGroupValueBuilder to use MaybeNullBufferBuilder #12623

Merged
merged 9 commits into from
Sep 30, 2024

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Sep 25, 2024

Which issue does this PR close?

Follow on to #12269 from @jayzhan211

Part of #12680

Rationale for this change

The initial implementation of the Column-wise group checking code (❤️ ) uses Vec<bool> to track null/validity but Arrow has a variety of optimized structures for doing so (BooleanBufferBuilder, etc)

I would like to get this code into great shape before we fill out support for other data types (like StringView and BinaryView)

I also felt that PrimitiveGroupValueBuilder could largely be a wrapper around PrimitiveBuilder in arrow which would avoid the code duplication as well as take advantage of Arrow's optimizations

Let's use BooleanBufferBuilder to simplify the null handling in PrimitiveGroupValueBuilder as a start.

What changes are included in this PR?

  1. Encapsulate the building of the boolean null buffer in MaybeNullBufferBuilder to track nulls
  2. Update PrimitiveGroupValueBuilder to use MaybeNullBufferBuilder

Planned follow ups:

  • Port the ByteGroupValueBuilder to use MaybeNullBufferBuilder
  • Try and optimize the take_n for the boolean builder

Are these changes tested?

By existing CI and manual benchmarks

Are there any user-facing changes?

No

Performance results show significant performance for TPCH (where there are mostly non null values)

--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ main_base ┃ alamb_improve_boolean_handling ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  252.62ms │                       211.84ms │ +1.19x faster │
│ QQuery 2     │  131.31ms │                       123.35ms │ +1.06x faster │
│ QQuery 3     │  138.74ms │                       126.97ms │ +1.09x faster │
│ QQuery 4     │   92.29ms │                        88.82ms │     no change │
│ QQuery 5     │  180.88ms │                       160.01ms │ +1.13x faster │
│ QQuery 6     │   60.26ms │                        59.77ms │     no change │
│ QQuery 7     │  223.37ms │                       208.52ms │ +1.07x faster │
│ QQuery 8     │  165.38ms │                       172.63ms │     no change │
│ QQuery 9     │  259.49ms │                       240.81ms │ +1.08x faster │
│ QQuery 10    │  244.94ms │                       220.77ms │ +1.11x faster │
│ QQuery 11    │  104.23ms │                        90.69ms │ +1.15x faster │
│ QQuery 12    │  135.08ms │                       130.94ms │     no change │
│ QQuery 13    │  304.56ms │                       216.08ms │ +1.41x faster │
│ QQuery 14    │   97.08ms │                        69.77ms │ +1.39x faster │
│ QQuery 15    │  128.14ms │                       100.23ms │ +1.28x faster │
│ QQuery 16    │   85.69ms │                        81.34ms │ +1.05x faster │
│ QQuery 17    │  246.47ms │                       219.95ms │ +1.12x faster │
│ QQuery 18    │  344.79ms │                       307.37ms │ +1.12x faster │
│ QQuery 19    │  162.78ms │                       124.12ms │ +1.31x faster │
│ QQuery 20    │  145.63ms │                       135.01ms │ +1.08x faster │
│ QQuery 21    │  276.08ms │                       256.52ms │ +1.08x faster │
│ QQuery 22    │   65.63ms │                        65.88ms │     no change │
└──────────────┴───────────┴────────────────────────────────┴───────────────┘

Clickbench also appears to have gotten marginally faster (though it is right on the noise margin)

--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ alamb_improve_boolean_handling ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.28ms │                         2.23ms │     no change │
│ QQuery 1     │    40.39ms │                        40.64ms │     no change │
│ QQuery 2     │    96.32ms │                        98.32ms │     no change │
│ QQuery 3     │   102.14ms │                       101.14ms │     no change │
│ QQuery 4     │   927.03ms │                       924.21ms │     no change │
│ QQuery 5     │   942.38ms │                       960.34ms │     no change │
│ QQuery 6     │    34.60ms │                        34.06ms │     no change │
│ QQuery 7     │    45.85ms │                        44.43ms │     no change │
│ QQuery 8     │  1413.37ms │                      1328.62ms │ +1.06x faster │
│ QQuery 9     │  1369.64ms │                      1382.53ms │     no change │
│ QQuery 10    │   347.63ms │                       339.53ms │     no change │
│ QQuery 11    │   405.67ms │                       390.20ms │     no change │
│ QQuery 12    │  1090.61ms │                      1085.49ms │     no change │
│ QQuery 13    │  1612.01ms │                      1718.58ms │  1.07x slower │
│ QQuery 14    │  1244.87ms │                      1245.09ms │     no change │
│ QQuery 15    │  1081.28ms │                      1068.81ms │     no change │
│ QQuery 16    │  2563.57ms │                      2498.34ms │     no change │
│ QQuery 17    │  2354.45ms │                      2339.04ms │     no change │
│ QQuery 18    │  4990.47ms │                      4971.63ms │     no change │
│ QQuery 19    │    93.29ms │                        90.59ms │     no change │
│ QQuery 20    │  1723.58ms │                      1735.44ms │     no change │
│ QQuery 21    │  1986.68ms │                      2037.86ms │     no change │
│ QQuery 22    │  5204.60ms │                      5213.07ms │     no change │
│ QQuery 23    │ 10477.35ms │                     10570.85ms │     no change │
│ QQuery 24    │   593.91ms │                       585.68ms │     no change │
│ QQuery 25    │   489.43ms │                       497.24ms │     no change │
│ QQuery 26    │   659.59ms │                       658.08ms │     no change │
│ QQuery 27    │  2580.85ms │                      2594.42ms │     no change │
│ QQuery 28    │ 15817.30ms │                     15764.83ms │     no change │
│ QQuery 29    │   529.29ms │                       512.72ms │     no change │
│ QQuery 30    │  1078.06ms │                      1041.45ms │     no change │
│ QQuery 31    │  1102.82ms │                      1104.23ms │     no change │
│ QQuery 32    │  4283.70ms │                      4301.69ms │     no change │
│ QQuery 33    │  5154.16ms │                      5104.87ms │     no change │
│ QQuery 34    │  5071.89ms │                      5110.66ms │     no change │
│ QQuery 35    │  2105.38ms │                      1884.75ms │ +1.12x faster │
│ QQuery 36    │   268.51ms │                       264.62ms │     no change │
│ QQuery 37    │   122.03ms │                       119.07ms │     no change │
│ QQuery 38    │   143.31ms │                       142.64ms │     no change │
│ QQuery 39    │   763.00ms │                       764.96ms │     no change │
│ QQuery 40    │    57.22ms │                        59.47ms │     no change │
│ QQuery 41    │    51.43ms │                        48.44ms │ +1.06x faster │
│ QQuery 42    │    62.74ms │                        63.16ms │     no change │
└──────────────┴────────────┴────────────────────────────────┴───────────────┘
--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ alamb_improve_boolean_handling ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     0.67ms │                         0.67ms │     no change │
│ QQuery 1     │    72.82ms │                        73.77ms │     no change │
│ QQuery 2     │   131.07ms │                       129.55ms │     no change │
│ QQuery 3     │   132.31ms │                       134.45ms │     no change │
│ QQuery 4     │   972.74ms │                       987.26ms │     no change │
│ QQuery 5     │  1104.25ms │                      1097.30ms │     no change │
│ QQuery 6     │    65.67ms │                        66.37ms │     no change │
│ QQuery 7     │    81.87ms │                        83.45ms │     no change │
│ QQuery 8     │  1402.99ms │                      1385.74ms │     no change │
│ QQuery 9     │  1392.54ms │                      1427.63ms │     no change │
│ QQuery 10    │   462.30ms │                       441.91ms │     no change │
│ QQuery 11    │   508.61ms │                       504.67ms │     no change │
│ QQuery 12    │  1255.45ms │                      1267.01ms │     no change │
│ QQuery 13    │  2023.82ms │                      1909.52ms │ +1.06x faster │
│ QQuery 14    │  1389.35ms │                      1350.86ms │     no change │
│ QQuery 15    │  1138.28ms │                      1154.94ms │     no change │
│ QQuery 16    │  2609.64ms │                      2593.93ms │     no change │
│ QQuery 17    │  2446.64ms │                      2440.62ms │     no change │
│ QQuery 18    │  5050.54ms │                      5006.47ms │     no change │
│ QQuery 19    │   121.91ms │                       122.05ms │     no change │
│ QQuery 20    │  1634.43ms │                      1651.28ms │     no change │
│ QQuery 21    │  2082.30ms │                      2067.81ms │     no change │
│ QQuery 22    │  5053.39ms │                      5010.62ms │     no change │
│ QQuery 23    │ 12037.54ms │                     12037.03ms │     no change │
│ QQuery 24    │   799.01ms │                       779.69ms │     no change │
│ QQuery 25    │   712.87ms │                       709.63ms │     no change │
│ QQuery 26    │   861.62ms │                       858.29ms │     no change │
│ QQuery 27    │  2669.33ms │                      2672.52ms │     no change │
│ QQuery 28    │ 16408.36ms │                     16497.57ms │     no change │
│ QQuery 29    │   572.19ms │                       556.11ms │     no change │
│ QQuery 30    │  1239.32ms │                      1241.32ms │     no change │
│ QQuery 31    │  1336.85ms │                      1293.90ms │     no change │
│ QQuery 32    │  4336.67ms │                      4243.56ms │     no change │
│ QQuery 33    │  5335.19ms │                      5263.30ms │     no change │
│ QQuery 34    │  5240.27ms │                      5254.52ms │     no change │
│ QQuery 35    │  1885.80ms │                      1844.29ms │     no change │
│ QQuery 36    │   321.64ms │                       324.34ms │     no change │
│ QQuery 37    │   219.89ms │                       221.32ms │     no change │
│ QQuery 38    │   193.99ms │                       198.06ms │     no change │
│ QQuery 39    │   792.12ms │                       819.39ms │     no change │
│ QQuery 40    │    87.24ms │                        88.30ms │     no change │
│ QQuery 41    │    78.68ms │                        78.41ms │     no change │
│ QQuery 42    │    97.02ms │                        96.19ms │     no change │
└──────────────┴────────────┴────────────────────────────────┴───────────────┘

@github-actions github-actions bot added the physical-expr Physical Expressions label Sep 25, 2024
} else {
let elem = array.as_primitive::<T>().value(row);
self.group_values.push(elem);
self.nulls.push(true);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems like this previous code manages self.nulls even when there are and have been no nulls at all in the input

@alamb
Copy link
Contributor Author

alamb commented Sep 25, 2024

Marking draft until I have benchmark numbers

@alamb
Copy link
Contributor Author

alamb commented Sep 26, 2024

Marking draft until I have benchmark numbers

The benchmark numbers look good -- now I just need to debug one test and I will put this up for review

@alamb alamb marked this pull request as ready for review September 28, 2024 16:02
has_null: bool,
nullable: bool,
/// Null state (when None, input is guaranteed not to have nulls)
nulls: Option<MaybeNullBufferBuilder>,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I collapsed nullable and has_nulls into an Option and an Enum which I think makes the intent much clearer (and harder to mess up the nullable optimization

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My benchmark shows a slowdown compared to the main branch. I experienced a similar slowdown before due to the use of Option, and I suspect that might be the case again with this change

Copy link
Contributor Author

@alamb alamb Sep 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bencmarks seem to show a speedup, but I did not compare to just a bool. I'll make that change and rerun

@alamb
Copy link
Contributor Author

alamb commented Sep 29, 2024

I hacked up a version that checks the flag once per plan rather than once per batch with two separate column implementations in 36a2003. That way there is no has_null flag at all

Re-running benchmark to see if it makes a difference

@alamb
Copy link
Contributor Author

alamb commented Sep 30, 2024

Avoiding the null check at all seems to have made things slightly faster

--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ main_base ┃ alamb_improve_boolean_handling ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  252.62ms │                       190.94ms │ +1.32x faster │
│ QQuery 2     │  131.31ms │                       117.97ms │ +1.11x faster │
│ QQuery 3     │  138.74ms │                       125.45ms │ +1.11x faster │
│ QQuery 4     │   92.29ms │                        86.13ms │ +1.07x faster │
│ QQuery 5     │  180.88ms │                       158.81ms │ +1.14x faster │
│ QQuery 6     │   60.26ms │                        41.35ms │ +1.46x faster │
│ QQuery 7     │  223.37ms │                       198.96ms │ +1.12x faster │
│ QQuery 8     │  165.38ms │                       152.53ms │ +1.08x faster │
│ QQuery 9     │  259.49ms │                       243.64ms │ +1.07x faster │
│ QQuery 10    │  244.94ms │                       237.13ms │     no change │
│ QQuery 11    │  104.23ms │                        89.80ms │ +1.16x faster │
│ QQuery 12    │  135.08ms │                       131.17ms │     no change │
│ QQuery 13    │  304.56ms │                       211.31ms │ +1.44x faster │
│ QQuery 14    │   97.08ms │                        73.39ms │ +1.32x faster │
│ QQuery 15    │  128.14ms │                       108.14ms │ +1.18x faster │
│ QQuery 16    │   85.69ms │                        78.46ms │ +1.09x faster │
│ QQuery 17    │  246.47ms │                       216.93ms │ +1.14x faster │
│ QQuery 18    │  344.79ms │                       320.72ms │ +1.08x faster │
│ QQuery 19    │  162.78ms │                       133.12ms │ +1.22x faster │
│ QQuery 20    │  145.63ms │                       125.68ms │ +1.16x faster │
│ QQuery 21    │  276.08ms │                       273.39ms │     no change │
│ QQuery 22    │   65.63ms │                        64.07ms │     no change │
└──────────────┴───────────┴────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                             ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (main_base)                        │ 3845.43ms │
│ Total Time (alamb_improve_boolean_handling)   │ 3379.07ms │
│ Average Time (main_base)                      │  174.79ms │
│ Average Time (alamb_improve_boolean_handling) │  153.59ms │
│ Queries Faster                                │        18 │
│ Queries Slower                                │         0 │
│ Queries with No Change                        │         4 │
└───────────────────────────────────────────────┴───────────┘

I think this PR is now ready for final review @jayzhan211

@alamb alamb changed the title Refactor PrimitiveGroupValueBuilder to use BooleanBuilder Refactor PrimitiveGroupValueBuilder to use MaybeNullBufferBuilder Sep 30, 2024
Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb
Copy link
Contributor Author

alamb commented Sep 30, 2024

Thank you @jayzhan211

@alamb alamb merged commit ddb4fac into apache:main Sep 30, 2024
24 checks passed
@alamb
Copy link
Contributor Author

alamb commented Sep 30, 2024

🚀

@alamb alamb deleted the alamb/improve_boolean_handling branch September 30, 2024 14:58
@Dandandan
Copy link
Contributor

Nice 🚀

@@ -62,62 +62,96 @@ pub trait GroupColumn: Send + Sync {
fn take_n(&mut self, n: usize) -> ArrayRef;
}

/// Stores a collection of primitive group values which are known to have no nulls
#[derive(Debug)]
pub struct NonNullPrimitiveGroupValueBuilder<T: ArrowPrimitiveType> {
Copy link
Contributor

@Dandandan Dandandan Sep 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think NonNull can be a (boolean) generic parameter (const NonNull: bool) as well to avoid code duplication...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will look into it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added as a subtask of #12680

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants