Add "Extended" clickbench queries #8861

Merged 1 commit into apache:main on Jan 16, 2024

alamb commented Jan 14, 2024

Which issue does this PR close?

Closes #8860

Rationale for this change

I would like benchmarks that let us show that improvements such as #8827 and #8849 are significant.

What changes are included in this PR?

Add new "Extended", DataFusion-specific ClickBench queries.

To run:

./benchmarks/bench.sh run clickbench_extended

Example output:

***************************
DataFusion Benchmark Script
COMMAND: run
BENCHMARK: clickbench_extended
DATAFUSION_DIR: /Users/andrewlamb/Software/arrow-datafusion/benchmarks/..
BRACH_NAME: alamb_clickbench_extended
DATA_DIR: /Users/andrewlamb/Software/arrow-datafusion/benchmarks/data
RESULTS_DIR: /Users/andrewlamb/Software/arrow-datafusion/benchmarks/results/alamb_clickbench_extended
CARGO_COMMAND: cargo run --profile release-nonlto
***************************
RESULTS_FILE: /Users/andrewlamb/Software/arrow-datafusion/benchmarks/results/alamb_clickbench_extended/clickbench_extended.json
Running clickbench (1 file) extended benchmark...
   Compiling datafusion-benchmarks v34.0.0 (/Users/andrewlamb/Software/arrow-datafusion/benchmarks)
     Running `/Users/andrewlamb/Software/arrow-datafusion/target/release-nonlto/dfbench clickbench --iterations 5 --path /Users/andrewlamb/Software/arrow-datafusion/benchmarks/data/hits.parquet --queries-path /Users/andrewlamb/Software/arrow-datafusion/benchmarks/queries/clickbench/extended.sql -o /Users/andrewlamb/Software/arrow-datafusion/benchmarks/results/alamb_clickbench_extended/clickbench_extended.json`
Running benchmarks with the following options: RunOpt { query: None, common: CommonOpt { iterations: 5, partitions: None, batch_size: 8192, debug: false }, path: "/Users/andrewlamb/Software/arrow-datafusion/benchmarks/data/hits.parquet", queries_path: "/Users/andrewlamb/Software/arrow-datafusion/benchmarks/queries/clickbench/extended.sql", output_path: Some("/Users/andrewlamb/Software/arrow-datafusion/benchmarks/results/alamb_clickbench_extended/clickbench_extended.json") }
Q0: SELECT COUNT(DISTINCT "SearchPhrase"), COUNT(DISTINCT "MobilePhone"), COUNT(DISTINCT "MobilePhoneModel") FROM hits;
Query 0 iteration 0 took 5614.0 ms and returned 1 rows
Query 0 iteration 1 took 5652.6 ms and returned 1 rows
Query 0 iteration 2 took 5554.3 ms and returned 1 rows
Query 0 iteration 3 took 5511.4 ms and returned 1 rows
Query 0 iteration 4 took 5554.3 ms and returned 1 rows
Done
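
To compare runs before and after a change, the per-iteration lines in the output above can be reduced to a minimum and a mean per query. The following is a minimal sketch in Python, not part of this PR: the `summarize` helper and the regular expression are hypothetical, and they assume the log keeps the `Query N iteration M took X ms and returned R rows` shape shown above.

```python
import re
from statistics import mean

# Timing lines copied verbatim from the example output above.
LOG = """\
Query 0 iteration 0 took 5614.0 ms and returned 1 rows
Query 0 iteration 1 took 5652.6 ms and returned 1 rows
Query 0 iteration 2 took 5554.3 ms and returned 1 rows
Query 0 iteration 3 took 5511.4 ms and returned 1 rows
Query 0 iteration 4 took 5554.3 ms and returned 1 rows
"""

# Assumed line format; adjust if the benchmark output changes.
PATTERN = re.compile(
    r"Query (\d+) iteration (\d+) took ([\d.]+) ms and returned (\d+) rows"
)


def summarize(log: str) -> dict[int, dict[str, float]]:
    """Group timings by query number and return count, min, and mean in ms."""
    timings: dict[int, list[float]] = {}
    for match in PATTERN.finditer(log):
        query = int(match.group(1))
        elapsed_ms = float(match.group(3))
        timings.setdefault(query, []).append(elapsed_ms)
    return {
        query: {
            "iterations": len(values),
            "min_ms": min(values),
            "mean_ms": mean(values),
        }
        for query, values in timings.items()
    }


if __name__ == "__main__":
    for query, stats in summarize(LOG).items():
        print(
            f"Q{query}: {stats['iterations']} iterations, "
            f"min {stats['min_ms']:.1f} ms, mean {stats['mean_ms']:.1f} ms"
        )
```

Reporting the minimum alongside the mean is a deliberate choice here: the minimum is less sensitive to noise from other processes on the machine, which helps when judging whether a small improvement is real.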

Are these changes tested?

I tested this (and clickbench_1) manually.

Are there any user-facing changes?

No, this is a development tool only.

@alamb alamb added the development-process Related to development process of DataFusion label Jan 14, 2024
@github-actions github-actions bot removed the development-process Related to development process of DataFusion label Jan 14, 2024
alamb commented Jan 14, 2024

Thank you for the review @andygrove

@Dandandan Dandandan merged commit 08de64d into apache:main Jan 16, 2024
22 checks passed
Dandandan commented

Thank you @alamb

@alamb alamb deleted the alamb/clickbench_extended branch January 16, 2024 11:44
Development

Successfully merging this pull request may close these issues.

Add 'clickbench_extended' benchmark
4 participants