Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split count_distinct.rs into separate modules #9087

Merged
merged 5 commits into from
Feb 1, 2024

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Jan 31, 2024

Which issue does this PR close?

Part of #5472

Rationale for this change

This is something I noticed while working on #8849

The count distinct module was getting large so splitting it into smaller modules would makes things easier to navigate

What changes are included in this PR?

  1. Move code to native.rs module (to mirror strings.rs)
  2. rename native to NativeDistinctCountAccumulator to PrimitiveDistinctCountAccumulator to align with ArrowPrimitiveType
  3. add some more docs
  4. Remove an unecessary typedef

Are these changes tested?

By existing tests

Are there any user-facing changes?

No, this is internal code reorganization. None of this code is accessible outside of the crate

@github-actions github-actions bot added the physical-expr Physical Expressions label Jan 31, 2024
UInt16 => Box::new(NativeDistinctCountAccumulator::<UInt16Type>::new()),
UInt32 => Box::new(NativeDistinctCountAccumulator::<UInt32Type>::new()),
UInt64 => Box::new(NativeDistinctCountAccumulator::<UInt64Type>::new()),
Int8 => Box::new(PrimitiveDistinctCountAccumulator::<Int8Type>::new()),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed

#[derive(Debug)]
struct DistinctCountAccumulator {
values: HashSet<DistinctScalarValues, RandomState>,
values: HashSet<ScalarValue, RandomState>,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was typedefd to

type DistinctScalarValues = ScalarValue;

Which serves no purpose that I can tell other than obscuring what this code is doing

@@ -260,182 +261,6 @@ impl Accumulator for DistinctCountAccumulator {
}
}

#[derive(Debug)]
struct NativeDistinctCountAccumulator<T>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to native.rs

@alamb alamb marked this pull request as ready for review January 31, 2024 17:48
@alamb alamb changed the title Split count_distinct.rs into separate modules @alamb Split count_distinct.rs into separate modules Jan 31, 2024
@alamb alamb requested a review from crepererum January 31, 2024 21:10
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here could be a module-level comment that quickly says what "native" is. "Native" I think revers to "primitive" scalars like ints and floats, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is correct. Good call

Added in 1edf4f0

@alamb
Copy link
Contributor Author

alamb commented Feb 1, 2024

Thanks again for the review @crepererum

@alamb alamb merged commit 8b50774 into apache:main Feb 1, 2024
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants