Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix YAML date parsing #2549

Merged
merged 4 commits into from
Jun 27, 2024
Merged

Fix YAML date parsing #2549

merged 4 commits into from
Jun 27, 2024

Conversation

kytta
Copy link
Contributor

@kytta kytta commented Jun 25, 2024

This PR makes a few changes to how date is parsed in YAML preambles. Fixes #2538.

Description

New regex

Initially, I planned to address #2538 by adding the fraction separator (.) to the optional group so that it also becomes optional. However, I have made a few more changes:

  • named groups: instead of indexed access, one can access date components via their name
  • Parsing is now a bit off-spec
    • when one reads the timestamp working draft, one can see that they use two different parts for dates with and without time and tz
    • the former (just date) does not allow for short months and days (i.e. without leading zero), and is thus compliant with ISO 8601
    • the latter (date, time, and optional offset) allow months, days, and hours without leading zeroes. This is non-compliant, but user-friendly
    • I find it weird that both variants are present and have made the choice to allow months and days without leading zeroes in both 'short' and 'long' forms
  • as a side effect, there is no group duplication any more, and the regex is now shorter
  • I also make use of non-capturing groups to split the regex more cleanly

You can explore the regex on regex101: h5wyK4; and RegExr: 824pi

Fixes offset parsing

As far as I can understand from code, the parsing of the offset was done incorrectly:

  • to convert to nanosecond, seconds were multiplied by 100 million (instead of 1 billion)
  • instead of a floating point number, an integer was multiplied
  • the integer was parsed from the decimal part without the dot

I have revised the code and arrived at this solution:

  • the regex allows up to 9 decimal points in the fraction part
  • the fraction part get parsed as f64 number. It is between 0 and 1 (exclusive)
  • this floating point number represents a fraction of a second
  • this floating point number gets multiplied by one billion (nanoseconds in a second) and applied to the offset

New tests

I have re-organized some tests and have added some more tests. I've taken the new test cases from the TOML test suite, but have only taken those that make sense with the YAML spec

Sanity check:

  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

Code changes

(Delete or ignore this section for documentation changes)

  • Are you doing the PR on the next branch?

If the change is a new feature or adding to/changing an existing one:

  • Have you created/updated the relevant documentation page(s)?

@kytta
Copy link
Contributor Author

kytta commented Jun 25, 2024

Ooops; I've hoped that [skip ci] would work for Azure 😅

Copy link
Contributor Author

@kytta kytta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't mind me, it's easier for me to track what I need to do when I return to this

components/utils/src/de.rs Outdated Show resolved Hide resolved
components/utils/src/de.rs Show resolved Hide resolved
This commit does a few changes:

- Introduce a new regex
  - it is a bit off-spec (it allows one-digit months and days in date-only mode)
  - uses named groups
  - avoids group duplication
- parses offset once

Fixes #2538
@kytta kytta changed the title WIP: Refine YAML date regex [skip ci] Fix YAML date parsing Jun 26, 2024
@kytta kytta marked this pull request as ready for review June 26, 2024 13:53
@kytta
Copy link
Contributor Author

kytta commented Jun 26, 2024

@Keats I've fixed the fraction separator being required and have further optimized the regex. I've added some tests (test cases courtesy of TOML); if you don't mind them overlapping, I think it's a good test suite for this functionality 😄

@Keats Keats merged commit 211db9f into getzola:next Jun 27, 2024
5 checks passed
@Keats
Copy link
Collaborator

Keats commented Jun 27, 2024

Neat, thanks

reujab pushed a commit to reujab/zola that referenced this pull request Aug 14, 2024
* Refine YAML date regex

This commit does a few changes:

- Introduce a new regex
  - it is a bit off-spec (it allows one-digit months and days in date-only mode)
  - uses named groups
  - avoids group duplication
- parses offset once

Fixes getzola#2538

* Fix nanosecond parsing

* Rename variables for brewity

* Add tests
Keats pushed a commit that referenced this pull request Aug 15, 2024
* Refine YAML date regex

This commit does a few changes:

- Introduce a new regex
  - it is a bit off-spec (it allows one-digit months and days in date-only mode)
  - uses named groups
  - avoids group duplication
- parses offset once

Fixes #2538

* Fix nanosecond parsing

* Rename variables for brewity

* Add tests
berdandy pushed a commit to berdandy/azola that referenced this pull request Sep 17, 2024
* Refine YAML date regex

This commit does a few changes:

- Introduce a new regex
  - it is a bit off-spec (it allows one-digit months and days in date-only mode)
  - uses named groups
  - avoids group duplication
- parses offset once

Fixes getzola#2538

* Fix nanosecond parsing

* Rename variables for brewity

* Add tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants