Add IMDB(JOB) Benchmark [2/N] (imdb queries) #12529

austin362667 · 2024-09-18T21:58:24Z

Which issue does this PR close?

Partially closes #12311.

After generating the IMDB dataset (.csv, .parquet):
```
./benchmarks/bench.sh data imdb
```

Users can benchmark the IMDB queries against the IMDB Parquet files with the following script:

./benchmarks/bench.sh run imdb

Or benchmark a single query. For example, query_id 5 corresponds to query 2a:

cargo run --bin imdb benchmark datafusion --iterations 1 --path ./arrow-datafusion/benchmarks/data/imdb --prefer_hash_join true --format csv -o ./arrow-datafusion/benchmarks/results/heads_doupache_imdb-data/imdb.json --query 5 --debug

This returns:

 === Logical plan ===
 Projection: min(t.title) AS movie_title
   Aggregate: groupBy=[[]], aggr=[[min(t.title)]]
     Filter: cn.country_code = Utf8("[de]") AND k.keyword = Utf8("character-name-in-title") AND cn.id = mc.company_id AND mc.movie_id = t.id AND t.id = mk.movie_id AND mk.keyword_id = k.id AND mc.movie_id = mk.movie_id
 
 (...)

 +-------------+
 | movie_title |
 +-------------+
 | 'Doc'       |
 +-------------+
 Query 5 iteration 0 took 3222.3 ms and returned 1 rows
 Query 5 avg time: 3222.30 ms
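For readers wondering how the last line relates to the per-iteration line, here is a minimal sketch of the averaging, assuming the runner simply takes the arithmetic mean of the iteration timings. The function names `avg_ms` and `summary_line` are hypothetical and are not taken from the benchmark code; only the output format is copied from the log above.

```rust
/// Arithmetic mean of per-iteration timings in milliseconds.
/// Returns None for an empty slice so callers never divide by zero.
/// (Hypothetical helper, not part of the benchmark runner.)
fn avg_ms(timings_ms: &[f64]) -> Option<f64> {
    if timings_ms.is_empty() {
        return None;
    }
    Some(timings_ms.iter().sum::<f64>() / timings_ms.len() as f64)
}

/// Formats a summary line in the same shape as the log output above.
fn summary_line(query_id: usize, timings_ms: &[f64]) -> Option<String> {
    avg_ms(timings_ms).map(|avg| format!("Query {query_id} avg time: {avg:.2} ms"))
}
```

With `--iterations 1` there is a single timing, so the average equals that one measurement, which is why both lines report the same value.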

The SQL results can then be verified against the IMDB CSV files via SQL Logic Test:
```
INCLUDE_IMDB=true cargo test --test sqllogictests -- imdb
```

Rationale for this change

We download the IMDB queries from https://db.in.tum.de/~leis/qo/job.tgz and benchmark them with the help of "Add JOB benchmark dataset [1/N] (imdb dataset)" #12497.
Correctness is ensured by imdb.slt, just as we did with tpch.slt.

Unlike TPC-H, the IMDB dataset is not generated and has a fixed size, so there is no scale factor and we don't need another Docker container to generate data and answers.
I have also cross-checked the answers against the CSV files from https://github.com/duckdb/duckdb/tree/main/benchmark/imdb/answers .

What changes are included in this PR?

IMDB(JOB) queries don't have incremental query IDs, so I hard-coded a mapping from the benchmark runner's query_id (integers 1, 2, 3, 4, ... 113) to the actual IMDB query name (strings 1a, 1b, 1c, 1d, 2a, ... 33c; there is no pattern) via a long chain of if statements.
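As an illustration of the mapping only (this is not the code in the PR, which uses an if chain), a table lookup could look like the sketch below. `QUERY_NAMES` and `query_name` are hypothetical names, and the table is deliberately an excerpt: only the first five entries are confirmed by the description above.

```rust
/// Excerpt of the JOB query names in benchmark order. Only these first five
/// entries come from the PR description; the full table would continue up
/// to "33c" for 113 entries in total.
const QUERY_NAMES: &[&str] = &["1a", "1b", "1c", "1d", "2a"];

/// Maps a 1-based benchmark query_id to its JOB query name.
/// Returns None for 0 or for ids past the end of the table.
fn query_name(query_id: usize) -> Option<&'static str> {
    query_id
        .checked_sub(1)
        .and_then(|index| QUERY_NAMES.get(index).copied())
}
```

Here `query_name(5)` yields `Some("2a")`, matching the example in the description, and an out-of-range id yields `None` instead of panicking.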

Currently, I've only added SLT for:

Are these changes tested?

Yes, please check test_files/imdb for details.

Are there any user-facing changes?

No.

benchmarks/src/imdb/convert.rs

benchmarks/src/imdb/mod.rs

austin362667

I'll rebase on @doupache's PR #12497 once it's merged.

doupache · 2024-09-24T06:02:30Z

I have tested all tables in the IMDB dataset; there is no "negative" id in any of them.

SELECT * FROM aka_name 
WHERE id < 0 LIMIT 5

SELECT * FROM aka_title 
WHERE  id < 0 LIMIT 5

SELECT * FROM cast_info  
WHERE   id < 0 LIMIT 5

SELECT * FROM char_name  
WHERE   id < 0 LIMIT 5

SELECT * FROM comp_cast_type  
WHERE   id < 0 LIMIT 5

SELECT * FROM company_name  
WHERE   id < 0 LIMIT 5

SELECT * FROM company_type  
WHERE   id < 0 LIMIT 5

SELECT * FROM complete_cast  
WHERE   id < 0 LIMIT 5

SELECT * FROM info_type  
WHERE   id < 0 LIMIT 5

SELECT * FROM keyword  
WHERE   id < 0 LIMIT 5

SELECT * FROM kind_type  
WHERE   id < 0 LIMIT 5

SELECT * FROM link_type  
WHERE   id < 0 LIMIT 5

SELECT * FROM movie_companies  
WHERE   id < 0 LIMIT 5

SELECT * FROM movie_info  
WHERE   id < 0 LIMIT 5

SELECT * FROM movie_info_idx  
WHERE   id < 0 LIMIT 5

SELECT * FROM movie_keyword  
WHERE   id < 0 LIMIT 5

SELECT * FROM movie_link  
WHERE   id < 0 LIMIT 5

SELECT * FROM name  
WHERE id < 0 LIMIT 5

SELECT * FROM person_info  
WHERE   id < 0 LIMIT 5

SELECT * FROM role_type  
WHERE   id < 0 LIMIT 5

SELECT * FROM title  
WHERE   id < 0 LIMIT 5
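The same check can be generated rather than written out by hand. This is a small sketch under the assumption that every table exposes an `id` column, as the queries above imply; `IMDB_TABLES` and `negative_id_checks` are hypothetical names, and the table list is copied from the queries in this comment.

```rust
/// The 21 IMDB tables queried in the comment above.
const IMDB_TABLES: [&str; 21] = [
    "aka_name", "aka_title", "cast_info", "char_name", "comp_cast_type",
    "company_name", "company_type", "complete_cast", "info_type", "keyword",
    "kind_type", "link_type", "movie_companies", "movie_info", "movie_info_idx",
    "movie_keyword", "movie_link", "name", "person_info", "role_type", "title",
];

/// Builds one "negative id" probe per table. An empty result for every
/// probe suggests an unsigned integer type is safe for the id columns.
fn negative_id_checks() -> Vec<String> {
    IMDB_TABLES
        .iter()
        .map(|table| format!("SELECT * FROM {table} WHERE id < 0 LIMIT 5"))
        .collect()
}
```

Each generated string can then be run against the registered tables; the sketch only builds the SQL text and does not execute anything.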


jayzhan211

👍

alamb

Thank you @austin362667

My only concern is that the newly added "verify results" test takes 20 minutes

I don't think it is practical to run such queries on each CI run 🤔 it seems quite wasteful and we can get the same coverage using more targeted testing

You can see we only use a tiny amount of data for the tpch benchmark tests. Maybe we can split the addition to bench.sh from the slt files so we can discuss how to test these queries in a different PR?

austin362667 · 2024-10-03T16:15:36Z

Sure, that makes sense to me. I'll remove that part first. Thank you @alamb
This also avoids unethically sending too many requests to Peter Boncz's personal page. 😅


alamb

Thanks @austin362667 -- let's get this one in and iterate on it

alamb · 2024-10-03T19:47:19Z

benchmarks/src/imdb/run.rs

+#[structopt(verbatim_doc_comment)]
+pub struct RunOpt {
+    /// Query number. If not specified, runs all queries
+    #[structopt(short, long)]


Given that JOB seems to identify queries by alphanumeric names (like 1a instead of 1), it might make sense to simply make this an Option<String> and avoid the mapping required below.
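A rough sketch of what that suggestion could look like, written without structopt so it stands alone. The `RunOpt` here is a simplified, hypothetical stand-in for the struct in the diff, and `queries_to_run` is an assumed helper name, not something in the PR.

```rust
/// Simplified, dependency-free stand-in for the `RunOpt` in the diff above.
/// The real struct derives its command-line parser via structopt.
pub struct RunOpt {
    /// JOB query name such as "1a". If None, all queries run.
    pub query: Option<String>,
}

/// Resolves which queries to run, given the list of known query names.
/// An unknown name is reported as an error instead of being ignored.
pub fn queries_to_run(opt: &RunOpt, known: &[&str]) -> Result<Vec<String>, String> {
    match &opt.query {
        None => Ok(known.iter().map(|name| name.to_string()).collect()),
        Some(name) if known.contains(&name.as_str()) => Ok(vec![name.clone()]),
        Some(name) => Err(format!("unknown JOB query: {name}")),
    }
}
```

With this shape the user would pass the JOB name directly (for example `--query 2a`), and no integer-to-name mapping is needed.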


github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Sep 18, 2024

austin362667 force-pushed the imdb-benchmark branch 2 times, most recently from c1ccd0b to c3b4b8c Compare September 19, 2024 00:11

github-actions bot added the development-process Related to development process of DataFusion label Sep 19, 2024

austin362667 force-pushed the imdb-benchmark branch 2 times, most recently from ef99ebf to 0d89553 Compare September 20, 2024 07:13

austin362667 marked this pull request as ready for review September 20, 2024 08:08

austin362667 mentioned this pull request Sep 21, 2024

Fix and Improve Sort Pushdown for Nested Loop and Hash Join #12559

Merged

andygrove reviewed Sep 21, 2024

benchmarks/src/imdb/convert.rs (outdated)

andygrove reviewed Sep 21, 2024

benchmarks/src/imdb/mod.rs (outdated)

austin362667 commented Sep 23, 2024

doupache mentioned this pull request Sep 24, 2024

use uint as id type and reuse session ctx austin362667/arrow-datafusion#1

Merged

doupache and others added 15 commits September 24, 2024 19:39

imdb dataset

4a164b6

cargo fmt

199272c

Add 113 queries for IMDB(JOB)

bc2a7d5

Signed-off-by: Austin Liu <austin362667@gmail.com>

Add get_query_sql from query_id string

884850d

Signed-off-by: Austin Liu <austin362667@gmail.com>

Fix CSV reader & Remove Parquet partition

6cff605

Signed-off-by: Austin Liu <austin362667@gmail.com>

Add benchmark IMDB runner

d916d8b

Signed-off-by: Austin Liu <austin362667@gmail.com>

Add run_imdb script

d7df6f8

Signed-off-by: Austin Liu <austin362667@gmail.com>

Add checker for imdb option

08097bf

Signed-off-by: Austin Liu <austin362667@gmail.com>

Add SLT for IMDB

358eeb9

Signed-off-by: Austin Liu <austin362667@gmail.com>

Fix get_query_sql() for CI roundtrip test

1a657d7

Signed-off-by: Austin Liu <austin362667@gmail.com>

Clean up

a9fb5c2

Signed-off-by: Austin Liu <austin362667@gmail.com>

Add missing license

b4ce6fd

Signed-off-by: Austin Liu <austin362667@gmail.com>

Add IMDB(JOB) queries 2b to 5c

01147a6

Signed-off-by: Austin Liu <austin362667@gmail.com>

Add INCLUDE_IMDB in CI verify-benchmark-results

1a09524

Signed-off-by: Austin Liu <austin362667@gmail.com>

Prepare IMDB dataset

5c41915

Signed-off-by: Austin Liu <austin362667@gmail.com>

austin362667 force-pushed the imdb-benchmark branch from 0d89553 to 5c41915 Compare September 24, 2024 11:40

use uint as id type

e3c325b

doupache and others added 2 commits September 28, 2024 22:32

format

76c3810

Merge pull request #1 from doupache/PR-12529

d35c7b6

use uint as id type and reuse session ctx

austin362667 force-pushed the imdb-benchmark branch from 06522e7 to ffe001d Compare September 29, 2024 12:47

austin362667 force-pushed the imdb-benchmark branch from ffe001d to 6ee3e89 Compare September 29, 2024 15:59

jayzhan211 approved these changes Oct 2, 2024

alamb reviewed Oct 3, 2024

github-actions bot removed the sqllogictest SQL Logic Tests (.slt) label Oct 3, 2024

Remove IMDB(JOB) slt in CI

dbe2781

Signed-off-by: Austin Liu <austin362667@gmail.com>

austin362667 force-pushed the imdb-benchmark branch from 6e7390f to dbe2781 Compare October 3, 2024 16:18

github-actions bot removed the development-process Related to development process of DataFusion label Oct 3, 2024

alamb approved these changes Oct 3, 2024

alamb merged commit 77f330c into apache:main Oct 3, 2024
24 checks passed
