Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: improve make_date performance #9112

Merged
merged 1 commit into from
Feb 5, 2024
Merged

Conversation

r3stl355
Copy link
Contributor

@r3stl355 r3stl355 commented Feb 2, 2024

Which issue does this PR close?

#9089. It may not close it but could be a stepping stone towards more optimal version.

Rationale for this change

The change shows significant bench performance improvement

  • Baseline/before
make_date_col_col_col_1000
                        time:   [12.829 µs 12.866 µs 12.907 µs]
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) low mild
  3 (3.00%) high mild
  5 (5.00%) high severe

make_date_scalar_col_col_1000
                        time:   [11.800 µs 11.830 µs 11.866 µs]
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  6 (6.00%) high mild
  4 (4.00%) high severe

make_date_scalar_scalar_col_1000
                        time:   [10.730 µs 10.755 µs 10.781 µs]
Found 10 outliers among 100 measurements (10.00%)
  9 (9.00%) high mild
  1 (1.00%) high severe

make_date_scalar_scalar_scalar
                        time:   [1.0802 µs 1.0936 µs 1.1060 µs]
  • After
make_date_col_col_col_1000
                        time:   [5.9822 µs 6.0034 µs 6.0267 µs]
                        change: [-53.494% -53.287% -53.100%] (p = 0.00 < 0.05)
                        Performance has improved.

make_date_scalar_col_col_1000
                        time:   [6.2536 µs 6.2686 µs 6.2840 µs]
                        change: [-47.052% -46.878% -46.700%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild

make_date_scalar_scalar_col_1000
                        time:   [6.5579 µs 6.5728 µs 6.5881 µs]
                        change: [-39.400% -39.096% -38.810%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

make_date_scalar_scalar_scalar
                        time:   [1.1023 µs 1.1104 µs 1.1181 µs]
                        change: [-0.3478% +0.9142% +2.2206%] (p = 0.16 > 0.05)
                        No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild

What changes are included in this PR?

Changed the way the value is looked up in the array - instead of looking up each value by converting to PrimitiveArray, pre-created the instance of PrimitiveArray and used that in the loop

Are these changes tested?

Yes, all tests and example runs are passing

Are there any user-facing changes?

No user and API changes, tried to match the behavior to what it was (including returning errors)

Signed-off-by: Nikolay Ulmasov <ulmasov@hotmail.com>
@github-actions github-actions bot added the physical-expr Physical Expressions label Feb 2, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really nice to me -- well done @r3stl355 . This is a really nice first PR

cc @Omega359 for your comments

@alamb
Copy link
Contributor

alamb commented Feb 2, 2024

I think this PR would be good to go once CI passes

@alamb alamb changed the title refactor: improve make_date benchmark feat: improve make_date performance Feb 2, 2024
@alamb
Copy link
Contributor

alamb commented Feb 2, 2024

I also ran the benchmarks:

git checkout a6ef1bec480872f15f83628a7fb8c9bb2722cd49 # merge-base

results are basically the same as yours:

alamb@aal-dev:~/arrow-datafusion3$ cargo bench --bench make_date
   Compiling datafusion-physical-expr v35.0.0 (/home/alamb/arrow-datafusion3/datafusion/physical-expr)
    Finished bench [optimized] target(s) in 2m 46s
     Running benches/make_date.rs (target/release/deps/make_date-4622d805a0644623)
Gnuplot not found, using plotters backend
make_date_col_col_col_1000
                        time:   [10.392 µs 10.399 µs 10.410 µs]
                        change: [-57.725% -57.673% -57.624%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

make_date_scalar_col_col_1000
                        time:   [11.036 µs 11.048 µs 11.060 µs]
                        change: [-48.876% -48.746% -48.575%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high severe

make_date_scalar_scalar_col_1000
                        time:   [11.386 µs 11.400 µs 11.415 µs]
                        change: [-37.743% -37.655% -37.562%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

make_date_scalar_scalar_scalar
                        time:   [1.2737 µs 1.2759 µs 1.2783 µs]
                        change: [+2.3366% +2.7260% +3.2257%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 17 outliers among 100 measurements (17.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  4 (4.00%) high mild
  9 (9.00%) high severe

🚀

@r3stl355
Copy link
Contributor Author

r3stl355 commented Feb 2, 2024

Yes, I had some regression on all scalar test (make_date_scalar_scalar_scalar) but I managed to improve it with if/else - it's minimal now compared to improvements you get when at least one is an array

@Omega359
Copy link
Contributor

Omega359 commented Feb 3, 2024

Nice - that's a pretty big performance improvement!

@Omega359
Copy link
Contributor

Omega359 commented Feb 3, 2024

The

let array_size = if is_scalar { 1 } else { len.unwrap() };

variable could now be removed and the for loop could just use len.unwrap() I think.

@r3stl355
Copy link
Contributor Author

r3stl355 commented Feb 3, 2024

The

let array_size = if is_scalar { 1 } else { len.unwrap() };

variable could now be removed and the for loop could just use len.unwrap() I think.

Sure, I can do that but array_size is also used in the 3 calls to to_primitive_array. Alternatively, I could just move the variable into else as let array_size = len.unwrap();. What would be your preference @Omega359 ?

@Omega359
Copy link
Contributor

Omega359 commented Feb 3, 2024

Sure, I can do that but array_size is also used in the 3 calls to to_primitive_array. Alternatively, I could just move the variable into else as let array_size = len.unwrap();. What would be your preference @Omega359 ?

Hmm, I think I missed that - the downside of looking at code outside of an IDE without proper highlighting. Please ignore my comment then :)

I have to see if there is a way to look at and comment on these pull requests easily in rust rover. Hopefully they've brought that functionality over from IntelliJ IDEA.

@r3stl355
Copy link
Contributor Author

r3stl355 commented Feb 4, 2024

I've not tried RustRover, mainly work with VSCode. I don't have JetBrains licenses and though RR is free for now, I don't want to put time into learning something I won't be able to use once it becomes paid-for 🤷

@alamb
Copy link
Contributor

alamb commented Feb 4, 2024

I have to see if there is a way to look at and comment on these pull requests easily in rust rover. Hopefully they've brought that functionality over from IntelliJ IDEA.

I believe this feature is present (though I normally just use the github UI or else check out the PR directly via gh pr checkout <url>

@alamb
Copy link
Contributor

alamb commented Feb 4, 2024

I plan to merge this in tomorrow unless there are other comments

@alamb alamb merged commit afb5d5b into apache:main Feb 5, 2024
23 checks passed
@alamb
Copy link
Contributor

alamb commented Feb 5, 2024

Thanks again @r3stl355 and @Omega359

@r3stl355 r3stl355 deleted the issue-9089 branch February 5, 2024 14:34
@Omega359 Omega359 mentioned this pull request Mar 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants