Add support for phantom files #3956

Closed
hannes-ucsc opened this issue Mar 18, 2022 · 3 comments
Labels
demo [process] To be demonstrated at the end of the sprint demoed [process] Successfully demonstrated to team enh [type] New feature or request indexer [subject] The indexer part of Azul orange [process] Done by the Azul team

Comments

Member

hannes-ucsc commented Mar 18, 2022

Currently we get

[WARNING] 2022-03-18T00:14:22.172Z 6a94fe05-6c5c-5763-9e88-a4a8bca5fee3 Worker failed to handle message {'catalog': 'lm2-it', 'action': 'add', 'notification': {'source': {'id': 'f9c9ee4f-7e16-411b-b127-781930dc04dd', 'spec': 'tdr:datarepo-df6004c2:snapshot/lungmap_prod_f899709cae2c4bb988f0131142e6c7ec__20220310_20220311_lm2:/0'}, 'transaction_id': '477c950b-6333-4bdf-a7e4-2bc37e1f690d', 'match': {'bundle_uuid': '31244706-60f8-3043-8db7-a39fd7081139', 'bundle_version': '2022-03-11T17:13:52.129998Z'}}}.
Traceback (most recent call last):
  File "/var/task/azul/indexer/index_controller.py", line 174, in contribute
    contributions = self.transform(catalog, notification, delete)
  File "/var/task/azul/indexer/index_controller.py", line 207, in transform
    bundle = service.fetch_bundle(catalog, source, bundle_uuid, bundle_version)
  File "/var/task/azul/indexer/index_service.py", line 182, in fetch_bundle
    return plugin.fetch_bundle(bundle_fqid)
  File "/var/task/azul/plugins/repository/tdr/__init__.py", line 246, in fetch_bundle
    bundle = self._emulate_bundle(bundle_fqid)
  File "/var/task/azul/plugins/repository/tdr/__init__.py", line 331, in _emulate_bundle
    bundle.add_entity(entity_key=f'{entity_type}_{i}.json',
  File "/var/task/azul/plugins/repository/tdr/__init__.py", line 653, in add_entity
    drs_path=self._parse_file_id_column(entity_row['file_id']))
  File "/var/task/azul/plugins/repository/tdr/__init__.py", line 717, in _parse_file_id_column
    reject(file_id is None)
  File "/var/task/azul/__init__.py", line 1292, in reject
    raise exception(*args)
azul.RequirementError

Phantom files are defined as files with no data file object in the staging area and file_descriptor.drs_uri set to null.

https://github.com/HumanCellAtlas/dcp2/pull/55/files
https://github.com/HumanCellAtlas/metadata-schema/pull/1437/files

They will cause the TDR file_id column to be null. We should allow that if, and only if, the file_descriptor column contains JSON whose drs_uri property is also null. If the property is not set, or if it is set to a value other than null, a RequirementError should be raised. An unset property is invalid; support for non-null values will be added in #3631.
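
The rule above could be sketched roughly as follows. This is only an illustration, not the actual Azul code: `check_phantom_file` is a hypothetical helper, `RequirementError` is a local stand-in for `azul.RequirementError`, and the sketch assumes the check applies only when `file_id` is null, passing non-null values through unchanged.

```python
import json
from typing import Optional


class RequirementError(RuntimeError):
    """Local stand-in for azul.RequirementError (assumption for this sketch)."""


def check_phantom_file(file_id: Optional[str], file_descriptor: str) -> Optional[str]:
    """
    Hypothetical helper: accept a null file_id if, and only if, the
    file_descriptor JSON has a drs_uri property that is explicitly null.
    """
    if file_id is not None:
        # Assumption: non-null file IDs are handled elsewhere, unchanged.
        return file_id
    descriptor = json.loads(file_descriptor)
    if 'drs_uri' not in descriptor:
        # An unset property is invalid.
        raise RequirementError('file_id is null but drs_uri is not set')
    if descriptor['drs_uri'] is not None:
        # Non-null values are out of scope here (see #3631).
        raise RequirementError('file_id is null but drs_uri is not null')
    # Phantom file: no data file object, so there is no DRS path.
    return None
```

With this sketch, `check_phantom_file(None, '{"drs_uri": null}')` returns `None`, while a descriptor of `'{}'` or one with a non-null `drs_uri` raises `RequirementError`.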

Downstream from the indexer, the resulting None values in drs_path in our index need to be handled gracefully.
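
One way downstream code might tolerate a missing `drs_path` is sketched below. The function name `drs_uri_for` and the `drs_domain` parameter are hypothetical and not taken from Azul; the only point is that `None` is propagated instead of being formatted into a URI.

```python
from typing import Optional


def drs_uri_for(drs_path: Optional[str], drs_domain: str) -> Optional[str]:
    """
    Hypothetical helper: build a DRS URI for a file, or return None for a
    phantom file whose drs_path is None.
    """
    if drs_path is None:
        # Phantom file: there is nothing to resolve or download.
        return None
    return f'drs://{drs_domain}/{drs_path}'
```

Callers would then need to treat a `None` result as "no data file available" rather than as an error.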

@hannes-ucsc hannes-ucsc added the orange [process] Done by the Azul team label Mar 18, 2022
@melainalegaspi melainalegaspi added enh [type] New feature or request indexer [subject] The indexer part of Azul labels Mar 21, 2022