Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add missing delta-storage dependency and class loader workaround to Delta table ingestion #16648

Merged
merged 3 commits into from
Jun 25, 2024

Conversation

abhishekrb19
Copy link
Contributor

@abhishekrb19 abhishekrb19 commented Jun 24, 2024

With the upgrade to Kernel 3.2.0, the Druid Delta connector extension isn't able to read and ingest Delta tables successfully.

The Delta Kernel now requires the delta-storage dependency, so add this in our connector. However, even after adding the dependency, reading from a Delta table fails with the following error:

java.lang.IllegalArgumentException: Can not instantiate `LogStore` class: io.delta.storage.HDFSLogStore
	at io.delta.kernel.defaults.internal.DefaultEngineErrors.canNotInstantiateLogStore(DefaultEngineErrors.java:26)
	at io.delta.kernel.defaults.internal.logstore.LogStoreProvider.getLogStore(LogStoreProvider.java:88)
	at io.delta.kernel.defaults.engine.DefaultFileSystemClient.listFrom(DefaultFileSystemClient.java:76)
	at io.delta.kernel.internal.snapshot.SnapshotManager.listFrom(SnapshotManager.java:228)
	at io.delta.kernel.internal.snapshot.SnapshotManager.listFromOrNone(SnapshotManager.java:253)
	at io.delta.kernel.internal.snapshot.SnapshotManager.listDeltaAndCheckpointFiles(SnapshotManager.java:294)
	at io.delta.kernel.internal.snapshot.SnapshotManager.getLogSegmentForVersion(SnapshotManager.java:466)
	at io.delta.kernel.internal.snapshot.SnapshotManager.getLogSegmentFrom(SnapshotManager.java:415)
	at io.delta.kernel.internal.snapshot.SnapshotManager.getSnapshotAtInit(SnapshotManager.java:352)
	at io.delta.kernel.internal.snapshot.SnapshotManager.buildLatestSnapshot(SnapshotManager.java:113)
	at io.delta.kernel.internal.TableImpl.getLatestSnapshot(TableImpl.java:55)
	at org.apache.druid.delta.input.DeltaInputSource.createSplits(DeltaInputSource.java:210)
	at org.apache.druid.msq.input.external.ExternalInputSpecSlicer.sliceSplittableInputSource(ExternalInputSpecSlicer.java:119)

Please see the upstream issue delta-io/delta#3299 and fix delta-io/delta#3304. I verified the upstream fix by deploying a custom Kernel jar and reverting this Class loader workaround, so we should be able to remove it once we upgrade the Kernel to a later version.

But as the comments in this patch notes, this is only a temporary workaround until we update to the next upstream Kernel library version. The alternative to this patch would be to revert to 3.1.0 where this issue didn't exist. But we lose some new features and performance improvements made in 3.2.0.

Note that this issue wasn't caught in the unit tests present in our Druid extension because the test setup doesn't fully model an actual setup. For example, the DeltaLakeDruidModule configuration is skipped.
I will follow up with a sanity IT to catch such issues in the future.

This PR has:

  • been self-reviewed.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • been tested in a test Druid cluster.

With the upgrade to Kernel 3.2.0, the Druid Delta connector extension
isn't able to ingest from Delta tables successfully.

Please see delta-io/delta#3299

The underlying problem seems to be coming from
https://github.com/delta-io/delta/blob/master/kernel/kernel-defaults/src/main/java/io/delta/kernel/defaults/internal/logstore/LogStoreProvider.java#L99

This patch is a workaround to setting the thread class loader explictly.
The Kernel community may consider a fix in the next release as it's affected another
connector as well.
@abhishekrb19 abhishekrb19 merged commit e01f155 into apache:master Jun 25, 2024
55 checks passed
@abhishekrb19 abhishekrb19 deleted the delta_lake_workaround branch June 25, 2024 16:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants