Skip annotations for fields that do not exist in record
I don't think we should support adding new fields with record
annotations because it's hard to ensure that the new field gets added
across all records that are streamed through the command.

This is a deviation from the original nextstrain/ingest script, but I
think a necessary one to avoid bugs in the final output metadata.
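
To illustrate the behavior described above, here is a minimal standalone sketch. It is not the augur implementation: `apply_annotations` is a hypothetical helper, and the sample records and annotations are made up for the example. It assumes annotations are already loaded into a dict keyed by record ID.

```python
import sys


def apply_annotations(record, record_annotations):
    """Return a copy of `record` updated only for fields it already has.

    Hypothetical helper for illustration; not part of augur.
    """
    updated = dict(record)
    for field, value in record_annotations.items():
        if field not in record:
            # Annotations for unknown fields are skipped with a warning.
            print(
                f"WARNING: Skipping annotation for field {field!r} "
                "that does not exist in record",
                file=sys.stderr,
            )
            continue
        updated[field] = value
    return updated


# Made-up sample data for the sketch.
records = [
    {"accession": "record_1", "field_1": "value_1"},
    {"accession": "record_2", "field_1": "value_1"},
]
annotations = {"record_2": {"new_field": "annotation_1", "field_1": "value_2"}}

curated = [
    apply_annotations(record, annotations.get(record["accession"], {}))
    for record in records
]

# Every curated record keeps the same set of fields as the input records.
assert all(record.keys() == records[0].keys() for record in curated)
```

Because unknown fields are dropped, all streamed records keep the same set of fields, which is the property the commit message is concerned with for the final output metadata.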
joverlee521 committed Jun 28, 2024
1 parent 5f533eb commit 8ee9d82
Showing 2 changed files with 10 additions and 14 deletions.
8 changes: 7 additions & 1 deletion augur/curate/apply_record_annotations.py
@@ -45,6 +45,12 @@ def run(args, records):
         if record_id is None:
             raise AugurError(f"ID field {args.id_field!r} does not exist in record")
 
-        record.update(annotations.get(record_id, {}))
+        record_annotations = annotations.get(record_id, {})
+        for field in list(record_annotations.keys()):
+            if field not in record:
+                print_err(f"WARNING: Skipping annotation for field {field!r} that does not exist in record")
+                del record_annotations[field]
+
+        record.update(record_annotations)
 
         yield record
@@ -2,7 +2,7 @@ Setup
 
   $ export AUGUR="${AUGUR:-$TESTDIR/../../../../../bin/augur}"
 
-Test that annotations for new fields are added to the record.
+Test that annotations for new fields that are not in the record results in warning message.
 
   $ cat >annotations.tsv <<~~
   > record_2 new_field annotation_1
@@ -16,16 +16,6 @@ Test that annotations for new fields are added to the record.
   $ cat records.ndjson \
   > | ${AUGUR} curate apply-record-annotations \
   > --annotations annotations.tsv
+  WARNING: Skipping annotation for field 'new_field' that does not exist in record
   {"accession": "record_1", "field_1": "value_1"}
-  {"accession": "record_2", "field_1": "value_1", "new_field": "annotation_1"}
-
-
-  $ cat records.ndjson \
-  > | ${AUGUR} curate apply-record-annotations \
-  > --annotations annotations.tsv \
-  > --output-metadata metadata.tsv
-
-  $ cat metadata.tsv
-  accession\tfield_1 (esc)
-  record_1\tvalue_1 (esc)
-  record_2\tvalue_1 (esc)
+  {"accession": "record_2", "field_1": "value_1"}
