[BUG] Cast String to TimeStamp issues #716

Closed
tgravescs opened this issue Sep 10, 2020 · 6 comments · Fixed by #1718
Labels: bug (Something isn't working), P1 (Nice to have for release), performance (A performance related task/issue)

Comments

@tgravescs
Collaborator

Describe the bug

We recently discovered that the config to turn off the string to timestamp cast was not being used. We fixed that with #705.

While investigating that, though, it seemed like a few things might be off. The format we are using, which is also the one used in cuDF, is "%Y-%m-%dT%H:%M:%SZ%f". Based on what we support, it seems like the ms section should be %SZ.%f. The pattern that we didn't match on, but that worked on the CPU, was just: 2017-11-29 20:00:35. Ideally we would support this.
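
As a rough illustration of why a single `T`-separated format misses the space-separated string, here is a minimal sketch that tries several candidate formats in turn. It uses Python's `datetime.strptime` directives, which are an assumption for illustration only and are not the cuDF format strings or the plugin's code; `CANDIDATE_FORMATS` and `parse_timestamp` are hypothetical names.

```python
from datetime import datetime

# Hypothetical candidate formats, tried in order. These are Python strptime
# directives, not the cuDF format strings discussed in this issue.
CANDIDATE_FORMATS = [
    "%Y-%m-%dT%H:%M:%S.%f",  # 'T' separator with fractional seconds
    "%Y-%m-%dT%H:%M:%S",     # 'T' separator, whole seconds
    "%Y-%m-%d %H:%M:%S.%f",  # space separator with fractional seconds
    "%Y-%m-%d %H:%M:%S",     # the space-separated form from this issue
]


def parse_timestamp(text):
    """Return a datetime for the first format that matches, or None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            # This format did not match; try the next one.
            continue
    return None


print(parse_timestamp("2017-11-29 20:00:35"))
print(parse_timestamp("2017-11-29T20:00:35"))
```

With only the `T`-separated formats in the list, the first call would return `None`, which mirrors the mismatch described above.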

I think the regexes in this case add a pretty high performance overhead as well, so perhaps we should figure out a different way to handle this.
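
One possible direction, sketched here only as an assumption and not as what the plugin does, is to replace the regex with fixed-position checks for a fixed-width layout such as `yyyy-MM-dd HH:mm:ss`. The example is plain CPU Python and makes no claim about GPU performance; `looks_like_timestamp` and `SEPARATORS` are hypothetical names, and the regex is included only to show that both checks agree.

```python
import re

# Regex form of the check, kept for comparison.
TS_REGEX = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}")

# Index -> literal separator expected in "yyyy-MM-dd HH:mm:ss".
SEPARATORS = {4: "-", 7: "-", 10: " ", 13: ":", 16: ":"}


def looks_like_timestamp(text):
    """Check the fixed-width layout without using a regex."""
    if len(text) != 19:
        return False
    for i, ch in enumerate(text):
        expected = SEPARATORS.get(i)
        if expected is not None:
            # A separator position must hold exactly that separator.
            if ch != expected:
                return False
        elif not ("0" <= ch <= "9"):
            # Every other position must be an ASCII digit.
            return False
    return True


for sample in ["2017-11-29 20:00:35", "2017-11-29T20:00:35", "2017-11-29 20:00"]:
    print(sample, looks_like_timestamp(sample), bool(TS_REGEX.fullmatch(sample)))
```

This only validates the shape of the string; range checks (for example month 13) would still be needed in either approach.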

tgravescs added the "bug" and "? - Needs Triage" labels on Sep 10, 2020
sameerz added the "P1" and "performance" labels and removed the "? - Needs Triage" label on Sep 15, 2020
@tgravescs
Collaborator Author

Note that I have seen other string formats, such as 2020-09-07T01:05:57.840+0000, that we also don't support.
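
For reference, this string carries milliseconds plus a numeric UTC offset. A minimal sketch of parsing it on the CPU with Python's `datetime.strptime` follows; the format string is a Python one chosen for illustration, not a cuDF or Spark pattern, and `parse_with_offset` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def parse_with_offset(text):
    """Parse 'yyyy-MM-ddTHH:mm:ss.fff+hhmm' into a timezone-aware datetime."""
    # %f accepts 1 to 6 fractional digits; %z accepts offsets such as +0000.
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%f%z")


ts = parse_with_offset("2020-09-07T01:05:57.840+0000")
print(ts.isoformat())
```

The extra pieces compared with the supported formats are the fractional part after the seconds and the trailing offset, both of which a matching pattern would have to account for.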

@sameerz
Collaborator

sameerz commented Oct 20, 2020

This is related to issue #987

andygrove self-assigned this on Feb 12, 2021
@andygrove
Contributor

This is also related to #1117, which I am currently working on.

@andygrove
Contributor

> Note I have seen other formats for strings - such as 2020-09-07T01:05:57.840+0000 that we also don't support.

@tgravescs The main problem raised in this issue (using the wrong cuDF timestamp formats) was resolved by #1718, and there is now a separate issue for reducing the regex overhead (#1738).

We do not support the format 2020-09-07T01:05:57.840+0000 and this is documented. Should we file a separate issue for adding support?

@tgravescs
Collaborator Author

Yes, I think we should file one for supporting more formats.

@andygrove
Contributor

I filed #1748 for supporting additional formats.

tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue Nov 30, 2023
…IDIA#716)

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>
