[BUG] Dataproc 2.0 test_reading_file_rewritten_with_fastparquet tests failing #9545
Comments
tgravescs added the `bug` (Something isn't working) and `? - Needs Triage` (Need team to review and classify) labels on Oct 25, 2023
Right. It's not clear to me from the logs either. I'll try to repro it manually.

OK, I think I understand the problem. This test should be writing to the local file system, and then copying the file over to HDFS for the test.
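A minimal way to picture the mismatch described here, using two temp directories to stand in for the cluster's default filesystem (HDFS on Dataproc) and the node-local filesystem; the file name and directories are hypothetical:

```python
import os
import tempfile

# Two separate directories stand in for the cluster's default filesystem
# (e.g. HDFS on Dataproc) and the node-local filesystem.
default_fs = tempfile.mkdtemp(prefix="hdfs_")
local_fs = tempfile.mkdtemp(prefix="local_")

# The test writes its parquet file relative to the default filesystem...
name = "data.parquet"
with open(os.path.join(default_fs, name), "wb") as f:
    f.write(b"placeholder")

# ...but a local-filesystem reader (like fastparquet) resolves the same
# relative name against the local filesystem, so the read fails with a
# "no such file" error, matching the test failures reported above.
error = None
try:
    open(os.path.join(local_fs, name), "rb")
except FileNotFoundError as e:
    error = e
print("no such file" if error else "read ok")
```

The same relative path resolves to two different locations depending on which filesystem the reader defaults to, which is why the tests only fail on clusters where `fs.default.name` is not the local filesystem.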
mythrocks added a commit to mythrocks/spark-rapids that referenced this issue on Oct 30, 2023:
Fixes NVIDIA#9545.

This commit fixes the `fastparquet` tests to run on Spark clusters where `fs.default.name` does not point to the local filesystem. Before this commit, the `fastparquet` tests assumed that the parquet files generated for the tests were written to the local filesystem, and could be read by both `fastparquet` and Spark from the same location. However, this fails when run against clusters whose default filesystem is HDFS: `fastparquet` can only read from the local filesystem.

This commit changes the tests as follows:
1. For tests where data is generated by Spark, the data is copied to the local filesystem before it is read by `fastparquet`.
2. For tests where data is generated by `fastparquet`, the data is copied to the default Hadoop filesystem before reading through Spark.

Signed-off-by: MithunR <mythrocks@gmail.com>
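A sketch of the two copy directions the commit describes, with plain local copies standing in for the actual Hadoop filesystem transfers (the real tests would use something like `hdfs dfs -get`/`-put` or the Hadoop FileSystem API); all helper names and paths here are hypothetical:

```python
import os
import shutil
import tempfile

def copy_to_local(default_fs_path, local_dir):
    """Direction 1: data generated by Spark on the default filesystem is
    copied down so that fastparquet can read it locally. (A plain file
    copy stands in for the real HDFS download.)"""
    dest = os.path.join(local_dir, os.path.basename(default_fs_path))
    shutil.copy(default_fs_path, dest)
    return dest

def copy_to_default_fs(local_path, default_fs_dir):
    """Direction 2: data generated by fastparquet on the local filesystem
    is copied up to the default Hadoop filesystem before Spark reads it."""
    dest = os.path.join(default_fs_dir, os.path.basename(local_path))
    shutil.copy(local_path, dest)
    return dest

# Simulated filesystems.
default_fs = tempfile.mkdtemp(prefix="hdfs_")
local_fs = tempfile.mkdtemp(prefix="local_")

# Spark "writes" to the default filesystem; copy down for fastparquet.
spark_file = os.path.join(default_fs, "spark_written.parquet")
with open(spark_file, "wb") as f:
    f.write(b"spark")
local_copy = copy_to_local(spark_file, local_fs)

# fastparquet "writes" locally; copy up for Spark.
fp_file = os.path.join(local_fs, "fastparquet_written.parquet")
with open(fp_file, "wb") as f:
    f.write(b"fastparquet")
hdfs_copy = copy_to_default_fs(fp_file, default_fs)
```

After the two copies, each reader sees the data on the filesystem it can actually reach, regardless of which filesystem `fs.default.name` points to.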
mythrocks added a commit to mythrocks/spark-rapids that referenced this issue on Oct 31, 2023:
mythrocks added the `test` (Only impacts tests) label and removed the `? - Needs Triage` (Need team to review and classify) label on Oct 31, 2023
mythrocks added a commit that referenced this issue on Oct 31, 2023:
Describe the bug
In our integration tests on Dataproc 2.0, the `test_reading_file_rewritten_with_fastparquet` tests are failing:

Not sure why it reports no such file; maybe there was another crash or failure that caused these?