Speed up Spark + Flink unit test execution #10581

Open
wants to merge 1 commit into
base: main

snazy (Member) commented Jun 28, 2024

`test` task execution for Spark and Flink is rather slow. Gradle allows forking multiple JVMs to parallelize test execution; test _classes_ are distributed among the available test worker JVMs.

This change allows more than one parallel fork (test worker JVM). The maximum number of workers is calculated as `max(min(Integer.getInteger("iceberg.maxSparkTestParallelism", 2), Runtime.runtime.availableProcessors() / 2), 1)`, i.e. the configured maximum capped at half the available processors, but never fewer than one.

The default max settings for Spark and Flink are configured in `gradle.properties` to `2`, but this can be overridden in `~/.gradle/gradle.properties`.

This change does not affect CI: GitHub's free hosted runners have 2 "CPUs", and 2 / 2 = 1, so CI still runs a single fork.
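
To make the arithmetic concrete, the worker-count formula from the description can be sketched as a small standalone Java program. This is only an illustration, not the build script code from this PR: the class name `TestForkCount` and the helper `maxForks` are hypothetical, and only `iceberg.maxSparkTestParallelism` and the default of 2 come from the description.

```java
// Hypothetical illustration of the formula in the description:
// max(min(configuredMax, availableProcessors / 2), 1)
final class TestForkCount {

    // configuredMax: the value of the parallelism property (default 2 in the description).
    // availableProcessors: what Runtime.availableProcessors() reports.
    static int maxForks(int configuredMax, int availableProcessors) {
        // Integer division: 2 CPUs give 1, 1 CPU gives 0 (then raised to 1 by max).
        int halfTheCpus = availableProcessors / 2;
        return Math.max(Math.min(configuredMax, halfTheCpus), 1);
    }

    public static void main(String[] args) {
        // Reads the JVM system property named in the description, defaulting to 2.
        int configuredMax = Integer.getInteger("iceberg.maxSparkTestParallelism", 2);
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("forks on this machine: " + maxForks(configuredMax, cpus));

        // The CI case from the description: 2 CPUs, default max of 2.
        System.out.println("2 CPUs, max 2 -> " + maxForks(2, 2)); // 1
        // A developer machine with 8 CPUs and the default max.
        System.out.println("8 CPUs, max 2 -> " + maxForks(2, 8)); // 2
        // The same machine with the max overridden to 4.
        System.out.println("8 CPUs, max 4 -> " + maxForks(4, 8)); // 4
    }
}
```

With the default of 2, a machine needs at least 4 reported processors before a second test worker JVM is forked, which is why 2-CPU runners are unaffected.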

snazy (Member, Author) commented Jun 28, 2024

CI failure is unrelated, reported via #10599

snazy (Member, Author) commented Jun 29, 2024

This change might be controversial because, at least for Spark, there is a risk that parallel test workers share the same resources on the file system.
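
One possible way to reduce that risk, sketched here purely as an assumption and not as part of this change, is to give each test worker JVM its own scratch directory. The sketch relies on the `org.gradle.test.worker` system property, which Gradle is documented to set to a unique identifier in each test worker process. The class `WorkerScratchDir` and its methods are hypothetical names, and whether the Spark tests could route their file-system resources through such a directory is not established in this conversation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: derives a per-worker directory so that parallel
// test worker JVMs do not write into the same location.
final class WorkerScratchDir {

    // Returns the Gradle test worker id, or "0" when not running under a
    // Gradle test worker (for example when run from an IDE).
    static String currentWorkerId() {
        return System.getProperty("org.gradle.test.worker", "0");
    }

    // Creates (if needed) and returns base/test-worker-<workerId>.
    static Path scratchDir(Path base, String workerId) throws IOException {
        return Files.createDirectories(base.resolve("test-worker-" + workerId));
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("iceberg-test-scratch");
        Path dir = scratchDir(base, currentWorkerId());
        System.out.println("scratch directory for this worker: " + dir);
    }
}
```

Two workers with different ids resolve to different directories under the same base, so files created by one fork are not visible at the paths used by the other.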
