[BUG] test_ts_formats_round_trip and test_datetime_roundtrip_with_legacy_rebase fail with DATAGEN_SEED=1699915317 #9701

Closed
abellina opened this issue Nov 14, 2023 · 2 comments · Fixed by #9736
Assignees
Labels
bug Something isn't working

Comments

@abellina
Collaborator

abellina commented Nov 14, 2023

Repro:

SPARK_RAPIDS_TEST_DATAGEN_SEED=1699915317 ./run_pyspark_from_build.sh -k test_ts_formats_round_trip
SPARK_RAPIDS_TEST_DATAGEN_SEED=1699915317 ./run_pyspark_from_build.sh -k test_datetime_roundtrip_with_legacy_rebase

Some of the test_ts_formats_round_trip tests fail with this error:

self = TimestampType, ts = -62135585369000000

    def fromInternal(self, ts):
        if ts is not None:
            # using int to avoid precision loss in float
>           return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
E           ValueError: year 0 is out of range

FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[-yyyy-MM-][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[-yyyy-MM-'T'HH:mm:ss.SSSXXX][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[-yyyy-MM-'T'HH:mm:ss[.SSS][XXX]][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[-yyyy-MM-'T'HH:mm:ss.SSS][DATAGEN_SEED=1699915317, INJECT_OOM] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[-yyyy-MM-'T'HH:mm:ss[.SSS]][DATAGEN_SEED=1699915317, INJECT_OOM] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[-yyyy-MM-'T'HH:mm:ss][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[-yyyy-MM-'T'HH:mm[:ss]][DATAGEN_SEED=1699915317, INJECT_OOM] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[-yyyy-MM-'T'HH:mm][DATAGEN_SEED=1699915317, INJECT_OOM] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[-yyyy/MM-][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[-yyyy/MM-'T'HH:mm:ss.SSSXXX][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[-yyyy/MM-'T'HH:mm:ss[.SSS][XXX]][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[-yyyy/MM-'T'HH:mm:ss.SSS][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[-yyyy/MM-'T'HH:mm:ss[.SSS]][DATAGEN_SEED=1699915317, INJECT_OOM] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[-yyyy/MM-'T'HH:mm:ss][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[-yyyy/MM-'T'HH:mm[:ss]][DATAGEN_SEED=1699915317, INJECT_OOM] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[-yyyy/MM-'T'HH:mm][DATAGEN_SEED=1699915317, INJECT_OOM] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[-MM-yyyy-][DATAGEN_SEED=1699915317, INJECT_OOM] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[-MM-yyyy-'T'HH:mm:ss.SSSXXX][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[-MM-yyyy-'T'HH:mm:ss[.SSS][XXX]][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[-MM-yyyy-'T'HH:mm:ss.SSS][DATAGEN_SEED=1699915317, INJECT_OOM] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[-MM-yyyy-'T'HH:mm:ss[.SSS]][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[-MM-yyyy-'T'HH:mm:ss][DATAGEN_SEED=1699915317, INJECT_OOM] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[-MM-yyyy-'T'HH:mm[:ss]][DATAGEN_SEED=1699915317, INJECT_OOM] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[-MM-yyyy-'T'HH:mm][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[-MM/yyyy-][DATAGEN_SEED=1699915317, INJECT_OOM] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[-MM/yyyy-'T'HH:mm:ss.SSSXXX][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[-MM/yyyy-'T'HH:mm:ss[.SSS][XXX]][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[-MM/yyyy-'T'HH:mm:ss.SSS][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[-MM/yyyy-'T'HH:mm:ss[.SSS]][DATAGEN_SEED=1699915317, INJECT_OOM] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[-MM/yyyy-'T'HH:mm:ss][DATAGEN_SEED=1699915317, INJECT_OOM] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[-MM/yyyy-'T'HH:mm[:ss]][DATAGEN_SEED=1699915317, INJECT_OOM] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[-MM/yyyy-'T'HH:mm][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[csv-yyyy-MM-][DATAGEN_SEED=1699915317, INJECT_OOM] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[csv-yyyy-MM-'T'HH:mm:ss.SSSXXX][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[csv-yyyy-MM-'T'HH:mm:ss[.SSS][XXX]][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[csv-yyyy-MM-'T'HH:mm:ss.SSS][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[csv-yyyy-MM-'T'HH:mm:ss[.SSS]][DATAGEN_SEED=1699915317, INJECT_OOM] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[csv-yyyy-MM-'T'HH:mm:ss][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[csv-yyyy-MM-'T'HH:mm[:ss]][DATAGEN_SEED=1699915317, INJECT_OOM] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[csv-yyyy-MM-'T'HH:mm][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[csv-yyyy/MM-][DATAGEN_SEED=1699915317, INJECT_OOM] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[csv-yyyy/MM-'T'HH:mm:ss.SSSXXX][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[csv-yyyy/MM-'T'HH:mm:ss[.SSS][XXX]][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[csv-yyyy/MM-'T'HH:mm:ss.SSS][DATAGEN_SEED=1699915317, INJECT_OOM] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[csv-yyyy/MM-'T'HH:mm:ss[.SSS]][DATAGEN_SEED=1699915317, INJECT_OOM] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[csv-yyyy/MM-'T'HH:mm:ss][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[csv-yyyy/MM-'T'HH:mm[:ss]][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[csv-yyyy/MM-'T'HH:mm][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[csv-MM-yyyy-][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[csv-MM-yyyy-'T'HH:mm:ss.SSSXXX][DATAGEN_SEED=1699915317, INJECT_OOM] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[csv-MM-yyyy-'T'HH:mm:ss[.SSS][XXX]][DATAGEN_SEED=1699915317, INJECT_OOM] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[csv-MM-yyyy-'T'HH:mm:ss.SSS][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[csv-MM-yyyy-'T'HH:mm:ss[.SSS]][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[csv-MM-yyyy-'T'HH:mm:ss][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[csv-MM-yyyy-'T'HH:mm[:ss]][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[csv-MM-yyyy-'T'HH:mm][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[csv-MM/yyyy-][DATAGEN_SEED=1699915317, INJECT_OOM] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[csv-MM/yyyy-'T'HH:mm:ss.SSSXXX][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[csv-MM/yyyy-'T'HH:mm:ss[.SSS][XXX]][DATAGEN_SEED=1699915317, INJECT_OOM] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[csv-MM/yyyy-'T'HH:mm:ss.SSS][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[csv-MM/yyyy-'T'HH:mm:ss[.SSS]][DATAGEN_SEED=1699915317, INJECT_OOM] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[csv-MM/yyyy-'T'HH:mm:ss][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[csv-MM/yyyy-'T'HH:mm[:ss]][DATAGEN_SEED=1699915317, INJECT_OOM] - ValueError: year 0 is out of range
FAILED ../../src/main/python/csv_test.py::test_ts_formats_round_trip[csv-MM/yyyy-'T'HH:mm][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range

This is only a sample; other cases of this test also failed locally:

FAILED ../../src/main/python/parquet_write_test.py::test_datetime_roundtrip_with_legacy_rebase[CORRECTED-CORRECTED-INT96-Timestamp][DATAGEN_SEED=1699915317] - ValueError: year 0 is out of range
@abellina added the "bug" (Something isn't working) and "? - Needs Triage" (Need team to review and classify) labels on Nov 14, 2023
@abellina changed the title from "[BUG] test_ts_formats_round_trip fails with DATAGEN_SEED=1699915317" to "[BUG] test_ts_formats_round_trip and test_timestamp_roundtrip_no_legacy_rebase fail with DATAGEN_SEED=1699915317" on Nov 14, 2023
@abellina changed the title from "[BUG] test_ts_formats_round_trip and test_timestamp_roundtrip_no_legacy_rebase fail with DATAGEN_SEED=1699915317" to "[BUG] test_ts_formats_round_trip and test_datetime_roundtrip_with_legacy_rebase fail with DATAGEN_SEED=1699915317" on Nov 14, 2023
@mattahrens removed the "? - Needs Triage" label on Nov 14, 2023
@ttnghia
Collaborator

ttnghia commented Nov 15, 2023

We've observed the same issue before, as documented:

            # Spark supports times starting at
            # "0001-01-01 00:00:00.000000"
            # but it has issues if you get really close to that because it tries to do things
            # in a different format which causes roundoff, so we have to add a few days,
            # just to be sure
            start = datetime(1, 1, 3, tzinfo=tzinfo)

However, starting timestamp generation at datetime(1, 1, 3) is not enough: when deserializing from Spark into Python, we still hit the round-off issue. For example, from test_ts_formats_round_trip:

self = TimestampType, ts = -62135596800000000

    def fromInternal(self, ts):
        if ts is not None:
            # using int to avoid precision loss in float
>           return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
E           ValueError: year 0 is out of range

The timestamp that causes the error is 0001-01, as read from CSV. To avoid the issue more reliably, we should start from datetime(1, 2, 1) instead. I tried that, and the tests pass.
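
To make the two failing values concrete, here is a minimal sketch that decodes them with plain timedelta arithmetic instead of datetime.fromtimestamp. It assumes the internal values are microseconds since the Unix epoch in UTC, and it models a year-month-only format as keeping just the year and month. The helper names decode_micros_utc and truncate_to_month are hypothetical and are not part of PySpark or the test suite.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MIN_UTC = datetime.min.replace(tzinfo=timezone.utc)  # 0001-01-01 00:00:00 UTC


def decode_micros_utc(ts):
    """Hypothetical helper: microseconds since the epoch -> aware UTC datetime.

    Uses timedelta arithmetic, so no local-time conversion is involved.
    """
    return EPOCH + timedelta(microseconds=ts)


def truncate_to_month(dt):
    """Hypothetical helper: assume a year-month format drops everything else."""
    return dt.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


# The two ts values shown in the tracebacks in this issue.
first = decode_micros_utc(-62135585369000000)
second = decode_micros_utc(-62135596800000000)
print(first.isoformat(), second.isoformat())

# A start of datetime(1, 1, 3) collapses onto the minimum once only
# year and month survive; datetime(1, 2, 1) keeps a 31-day margin.
old_start = datetime(1, 1, 3, tzinfo=timezone.utc)
new_start = datetime(1, 2, 1, tzinfo=timezone.utc)
print(truncate_to_month(old_start) == MIN_UTC)
print(truncate_to_month(new_start) - MIN_UTC)
```

Both values land on the first day of year 1, within a few hours of the smallest datetime Python can represent. That is probably why fromInternal fails: datetime.fromtimestamp converts to local time, so any adjustment below that minimum would fall into year 0.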

@ttnghia
Collaborator

ttnghia commented Nov 15, 2023

Will post a fix soon.
