Core: Add benchmark for adding files #11029

Merged: 2 commits into apache:main on Aug 28, 2024

Conversation

@aokolnychyi (Contributor) commented Aug 28, 2024

This PR adds a benchmark for appending data. As shown below, Iceberg is currently very slow when an operation contains many new data files. I'll follow up with a fix separately.

Benchmark                    (fast)  (numFiles)  Mode  Cnt   Score   Error  Units
AppendBenchmark.appendFiles    true      500000    ss    5   7.451 ± 0.184   s/op
AppendBenchmark.appendFiles    true     1000000    ss    5  14.646 ± 0.371   s/op
AppendBenchmark.appendFiles    true     2500000    ss    5  36.853 ± 0.798   s/op
AppendBenchmark.appendFiles   false      500000    ss    5   7.556 ± 0.627   s/op
AppendBenchmark.appendFiles   false     1000000    ss    5  14.869 ± 0.286   s/op
AppendBenchmark.appendFiles   false     2500000    ss    5  37.495 ± 1.247   s/op
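For readers who want to try something similar, below is a minimal JMH sketch of an append benchmark with the same parameters (`fast`, `numFiles`) and single-shot mode as the results above. This is a hypothetical outline, not the actual `AppendBenchmark` added by this PR; the table setup and the data file metadata values are assumptions.

```java
import java.util.concurrent.TimeUnit;
import org.apache.iceberg.AppendFiles;
import org.apache.iceberg.DataFile;
import org.apache.iceberg.DataFiles;
import org.apache.iceberg.FileFormat;
import org.apache.iceberg.Table;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

@Fork(1)
@State(Scope.Benchmark)
@Warmup(iterations = 1)
@Measurement(iterations = 5)
@BenchmarkMode(Mode.SingleShotTime)
@OutputTimeUnit(TimeUnit.SECONDS)
public class AppendBenchmarkSketch {

  @Param({"500000", "1000000", "2500000"})
  private int numFiles;

  @Param({"true", "false"})
  private boolean fast;

  private Table table;

  @Setup
  public void setupBenchmark() {
    // table = ... create or load an unpartitioned Iceberg table here
    // (omitted; depends on the catalog, e.g. HadoopTables over a temp dir)
  }

  @Benchmark
  public void appendFiles() {
    // pick the fast-append or merge-append code path based on the parameter
    AppendFiles append = fast ? table.newFastAppend() : table.newAppend();
    for (int i = 0; i < numFiles; i++) {
      DataFile file =
          DataFiles.builder(table.spec())
              .withPath("/tmp/data-" + i + ".parquet") // placeholder path, never read
              .withFormat(FileFormat.PARQUET)
              .withFileSizeInBytes(10 * 1024)
              .withRecordCount(100)
              .build();
      append.appendFile(file);
    }
    append.commit();
  }
}
```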

@github-actions github-actions bot added the core label Aug 28, 2024
@aokolnychyi aokolnychyi changed the title Core: Add benchmark for FastAppend Core: Add benchmark for adding files Aug 28, 2024
@dramaticlly (Contributor) left a comment
Looks like for an unpartitioned table there's not much benchmark difference between fast and merge append. Looking forward to your optimization fix.

@aokolnychyi (Contributor, Author) commented Aug 28, 2024

Correct, we write new metadata differently in the fast and merge append APIs, but the root cause is the same.
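For context on the two code paths being compared, a minimal illustrative snippet follows; `table` and `dataFile` are assumed to exist.

```java
// Fast append: writes a new manifest for the added files without
// rewriting existing manifests.
table.newFastAppend().appendFile(dataFile).commit();

// Merge append: may merge/rewrite existing manifests while committing
// the same new files.
table.newAppend().appendFile(dataFile).commit();
```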

@aokolnychyi aokolnychyi merged commit 6c79640 into apache:main Aug 28, 2024
46 checks passed
@aokolnychyi (Contributor, Author)

Thanks for reviewing, @dramaticlly @danielcweeks!

jenbaldwin pushed a commit to Teradata/iceberg that referenced this pull request Sep 17, 2024
* main: (208 commits)
  Docs: Fix Flink 1.20 support versions (apache#11065)
  Flink: Fix compile warning (apache#11072)
  Docs: Initial committer guidelines and requirements for merging (apache#10780)
  Core: Refactor ZOrderByteUtils (apache#10624)
  API: implement types timestamp_ns and timestamptz_ns (apache#9008)
  Build: Bump com.google.errorprone:error_prone_annotations (apache#11055)
  Build: Bump mkdocs-material from 9.5.33 to 9.5.34 (apache#11062)
  Flink: Backport PR apache#10526 to v1.18 and v1.20 (apache#11018)
  Kafka Connect: Disable publish tasks in runtime project (apache#11032)
  Flink: add unit tests for range distribution on bucket partition column (apache#11033)
  Spark 3.5: Use FileGenerationUtil in PlanningBenchmark (apache#11027)
  Core: Add benchmark for appending files (apache#11029)
  Build: Ignore benchmark output folders across all modules (apache#11030)
  Spec: Add RemovePartitionSpecsUpdate REST update type (apache#10846)
  Docs: bump latest version to 1.6.1 (apache#11036)
  OpenAPI, Build: Apply spotless to testFixtures source code (apache#11024)
  Core: Generate realistic bounds in benchmarks (apache#11022)
  Add REST Compatibility Kit (apache#10908)
  Flink: backport PR apache#10832 of inferring parallelism in FLIP-27 source (apache#11009)
  Docs: Add Druid docs url to sidebar (apache#10997)
  ...