Core: Add benchmark for adding files #11029

Merged: 2 commits into apache:main on Aug 28, 2024

Conversation

@aokolnychyi (Contributor) commented Aug 28, 2024

This PR adds a benchmark for appending data. As shown below, Iceberg is currently very slow when an operation contains many new data files. I'll follow up with a fix separately.

Benchmark                    (fast)  (numFiles)  Mode  Cnt   Score   Error  Units
AppendBenchmark.appendFiles    true      500000    ss    5   7.451 ± 0.184   s/op
AppendBenchmark.appendFiles    true     1000000    ss    5  14.646 ± 0.371   s/op
AppendBenchmark.appendFiles    true     2500000    ss    5  36.853 ± 0.798   s/op
AppendBenchmark.appendFiles   false      500000    ss    5   7.556 ± 0.627   s/op
AppendBenchmark.appendFiles   false     1000000    ss    5  14.869 ± 0.286   s/op
AppendBenchmark.appendFiles   false     2500000    ss    5  37.495 ± 1.247   s/op
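For readers who want to try something similar, below is a minimal JMH sketch of an append benchmark with the same parameters (`fast`, `numFiles`) and single-shot mode as the results above. This is a hypothetical outline, not the actual `AppendBenchmark` added by this PR; the table setup and the data file metadata values are assumptions.

```java
import java.util.concurrent.TimeUnit;
import org.apache.iceberg.AppendFiles;
import org.apache.iceberg.DataFile;
import org.apache.iceberg.DataFiles;
import org.apache.iceberg.FileFormat;
import org.apache.iceberg.Table;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

@Fork(1)
@State(Scope.Benchmark)
@Warmup(iterations = 1)
@Measurement(iterations = 5)
@BenchmarkMode(Mode.SingleShotTime)
@OutputTimeUnit(TimeUnit.SECONDS)
public class AppendBenchmarkSketch {

  @Param({"500000", "1000000", "2500000"})
  private int numFiles;

  @Param({"true", "false"})
  private boolean fast;

  private Table table;

  @Setup
  public void setupBenchmark() {
    // table = ... create or load an unpartitioned Iceberg table here
    // (omitted; depends on the catalog, e.g. HadoopTables over a temp dir)
  }

  @Benchmark
  public void appendFiles() {
    // pick the fast-append or merge-append code path based on the parameter
    AppendFiles append = fast ? table.newFastAppend() : table.newAppend();
    for (int i = 0; i < numFiles; i++) {
      DataFile file =
          DataFiles.builder(table.spec())
              .withPath("/tmp/data-" + i + ".parquet") // placeholder path, never read
              .withFormat(FileFormat.PARQUET)
              .withFileSizeInBytes(10 * 1024)
              .withRecordCount(100)
              .build();
      append.appendFile(file);
    }
    append.commit();
  }
}
```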

@github-actions github-actions bot added the core label Aug 28, 2024
@aokolnychyi aokolnychyi changed the title Core: Add benchmark for FastAppend Core: Add benchmark for adding files Aug 28, 2024
@dramaticlly (Contributor) left a comment
Looks like for an unpartitioned table there's not much benchmark difference between fast and merge append. Looking forward to your optimization fix.

@aokolnychyi (Contributor, Author) commented Aug 28, 2024

Correct, we write new metadata differently in the fast and merge append APIs, but the root cause is the same.
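For context on the two code paths being compared, a minimal illustrative snippet follows; `table` and `dataFile` are assumed to exist.

```java
// Fast append: writes a new manifest for the added files without
// rewriting existing manifests.
table.newFastAppend().appendFile(dataFile).commit();

// Merge append: may merge/rewrite existing manifests while committing
// the same new files.
table.newAppend().appendFile(dataFile).commit();
```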

@aokolnychyi aokolnychyi merged commit 6c79640 into apache:main Aug 28, 2024
46 checks passed
@aokolnychyi (Contributor, Author)

Thanks for reviewing, @dramaticlly @danielcweeks!

jenbaldwin pushed a commit to Teradata/iceberg that referenced this pull request Sep 17, 2024
* main: (208 commits)
  Docs: Fix Flink 1.20 support versions (apache#11065)
  Flink: Fix compile warning (apache#11072)
  Docs: Initial committer guidelines and requirements for merging (apache#10780)
  Core: Refactor ZOrderByteUtils (apache#10624)
  API: implement types timestamp_ns and timestamptz_ns (apache#9008)
  Build: Bump com.google.errorprone:error_prone_annotations (apache#11055)
  Build: Bump mkdocs-material from 9.5.33 to 9.5.34 (apache#11062)
  Flink: Backport PR apache#10526 to v1.18 and v1.20 (apache#11018)
  Kafka Connect: Disable publish tasks in runtime project (apache#11032)
  Flink: add unit tests for range distribution on bucket partition column (apache#11033)
  Spark 3.5: Use FileGenerationUtil in PlanningBenchmark (apache#11027)
  Core: Add benchmark for appending files (apache#11029)
  Build: Ignore benchmark output folders across all modules (apache#11030)
  Spec: Add RemovePartitionSpecsUpdate REST update type (apache#10846)
  Docs: bump latest version to 1.6.1 (apache#11036)
  OpenAPI, Build: Apply spotless to testFixtures source code (apache#11024)
  Core: Generate realistic bounds in benchmarks (apache#11022)
  Add REST Compatibility Kit (apache#10908)
  Flink: backport PR apache#10832 of inferring parallelism in FLIP-27 source (apache#11009)
  Docs: Add Druid docs url to sidebar (apache#10997)
  ...