-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new timestamp formats and simplify timestamp key handling in clp-s: #262
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! Just did the first pass. One thing I'm little bit concerned about is if we mix timestamps in milliseconds with timestamp in nanoseconds, we can get wrong results during search. For example, if we have two log messages {"timestamp": "1707026480000"}
and {"timestamp": "1707026455.123456789"}
, and the query is timestamp > date(1707026480000)
, we should return nothing but actually we will return the second log message.
Right this is a good point. I modified how we parse literals supplied inside of The core of the problem is that our various timestamp formats already implicitly have different levels of precision (second, millisecond, nanosecond) but we haven't really reconciled these different precisions at the storage, search, or archive metadata levels. We sort of deal with this problem halfway in TimestampPattern as timestamps which specify second-level precision get stored with millisecond precision, but the same can't be said for raw integer or floating point timestamps. One way to handle this might be to transform all timestamps into nanosecond precision timestamps no matter how they're parsed -- this would make storage, search, and archive metadata consistent in an easy to reason about way. The downside is that this creates an encoding challenge for raw int and float timestamp columns if we want to decompress in a 1:1 way. The other issue is that "store everything as nanosecond precision" is a bit loaded because it might sometimes be ambiguous/hard to figure out what the precision of some timestamp is supposed to be, so there's still a need to let users disambiguate via configuration for a robust solution (granted for most timestamps it should be pretty obvious). Anyway if we do something like this (which we should at some point) I'm not sure if we want to include it in this PR (even though its slowly becoming the fix-every-timestamp-issue-we-notice PR). |
Co-authored-by: wraymo <37269683+wraymo@users.noreply.github.com>
And can we delete |
Not yet because we still allow raw floating point columns to be timestamps. |
Oh, that's right |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deferring to @wraymo for the validity of the clp-s timestamp-related changes. Added some comments about the new formats added and minor style changes.
return false; | ||
} | ||
auto dot_position = line.find('.'); | ||
auto nanosecond_start = dot_position + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nanoseconds_begin_pos
?
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit msg: Add new timestamp formats and simplify timestamp column compression in clp-s (fixes #261).
?
Although we should include some text about what was the fix for #261.
References
Fixes #261
Description
Validation performed