Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix parsing of row_per_reading format with malformed dates #3967

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

ldodds
Copy link
Collaborator

@ldodds ldodds commented Oct 9, 2024

Half-hourly data can be labelled either at the start or the end of the half-hourly period. E.g. the usage between 1.30am and 2.00am might be labelled as "01:30" or "02:00".

For CSV/Spreadsheet formats where there is a single row per day there is generally no ambiguity as we parse the 48 HH columns in the order presented.

But for row_per_reading formats there's some ambiguity. Without more context a column labelled as "8 Oct 2024 00:00" may refer to the usage from 23:30 to midnight on 7th October, or the usage from midnight to 00:30am on the 8th October.

When processing timestamps, our SingleReadConverter has been written with the assumption that fields are labelled at the end of the half hour and that "8 Oct 2024 00:00" refers to the final half-hourly period of the 7th October. Data is then shifted.

Reviewing EDF field shows that they label their periods at the start of the half-hour (which is also our internal default). This means we're incorrectly interpreting the timestamps.

This PR fixes this by:

  • doing a fairly extensive rework of the supporting tests to remove a lot of boilerplate and expose the error (the behaviour was previously being tested as OK)
  • adding a new flag to AmrDataFeedConfig to indicate how the half-hourly periods are labelled. This is only expected to be populated for row_per_reading formats
  • adding some additional constraints to AmrDataFeedConfig to reduce likelihood of incorrect configs being created (although this hasn't happened yet)
  • reworking the SingleReadConverter to use this configuration to fix the issue, along with some refactoring to better expose the different ways in which we derive the half-hourly index (e.g. from an index in the original file, a time column or via a timestamp)
  • updating the EDF config to use the new setting
  • some additional rework of the parsing of time stamps for other formats to simplify code and fix spec that was failing locally but not on github

@ldodds ldodds changed the title Rework specs to expose error Fix parsing of row_per_reading format with malformed dates Oct 9, 2024
@ldodds ldodds requested a review from tbhi October 11, 2024 16:49
@ldodds ldodds marked this pull request as ready for review October 11, 2024 16:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant