Support non-null DRS URIs in file descriptors #3631

Open
theathorn opened this issue Nov 9, 2021 · 14 comments
Assignees
Labels
enh [type] New feature or request indexer [subject] The indexer part of Azul orange [process] Done by the Azul team service [subject] The service part of Azul

Comments

@theathorn

theathorn commented Nov 9, 2021

Needed for LungMAP managed access data files stored in BDCat.
Pre-requisite is a metadata schema change to support DRS URIs - to be proposed by Andrew Herbst (Broad).

@hannes-ucsc:

Support for null DRS URIs in file descriptors ("phantom files") was added in #3956. For files with a non-null DRS URI in the file descriptor, the /repository/files endpoint and its /fetch/repository/files cousin should return a 501 Not Implemented status code.

The /index/files response should have hits.*.files.*.url set to null for these files (just as for phantom files). Any manifest columns (fields) that would normally refer to a /repository/files or /fetch/repository/files endpoint must also be empty (null). We should add a hits.*.files.*.drs_uri field to /index/files responses. All of the above applies to /index/projects and hits.*.projects.*.contributedAnalyses/contributorMatrices as well.
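A minimal sketch of the response shaping described above, not Azul's actual code. The function name `file_hit_fields`, its parameters and the `drs://` prefix test for recognizing an external URI are assumptions made for illustration.

```python
from typing import Optional


def file_hit_fields(drs_uri: Optional[str], download_url: str) -> dict:
    """
    Build the `url` and `drs_uri` fields of one file entry in a hit.

    `drs_uri` is the value from the file descriptor: None for a phantom
    file, an absolute `drs://` URI for an externally hosted file
    (assumption), or any other string for a file the service can serve.
    `download_url` is the /repository/files URL the entry would normally
    carry.
    """
    external = drs_uri is not None and drs_uri.startswith('drs://')
    phantom = drs_uri is None
    return {
        'drs_uri': drs_uri,
        # Phantom and external files get no download URL
        'url': None if phantom or external else download_url,
    }
```

The same null value would be written to any manifest column that would otherwise hold the download URL.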

I think we should implement this by renaming the drs_path index field to drs_uri. We can pretend that DRS URI paths are still DRS URIs, just relative ones. A relative DRS URI is converted to an absolute one by the service, just as before, while absolute DRS URIs are passed through.
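The relative-versus-absolute handling proposed here could look roughly like the following. This is only a sketch: `resolve_drs_uri` and the `drs_domain` parameter are hypothetical names, and it assumes that an absolute DRS URI can be recognized by its `drs://` prefix.

```python
from typing import Optional


def resolve_drs_uri(drs_uri: Optional[str], drs_domain: str) -> Optional[str]:
    """
    Turn the value of the (renamed) `drs_uri` index field into an
    absolute DRS URI.

    :param drs_uri: None for a phantom file, a relative DRS URI (what
                    used to be the DRS path), or an absolute DRS URI

    :param drs_domain: the host name to use when absolutizing a relative
                       DRS URI (hypothetical parameter)
    """
    if drs_uri is None:
        # Phantom file, nothing to resolve
        return None
    if drs_uri.startswith('drs://'):
        # Already absolute, pass it through unchanged
        return drs_uri
    # Relative, so absolutize it against the given domain
    return f'drs://{drs_domain}/{drs_uri}'
```

For example, `resolve_drs_uri('some-file-id', 'drs.example.org')` yields `drs://drs.example.org/some-file-id`, while an absolute URI such as `drs://other.example.org/x` comes back as is.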

@github-actions github-actions bot added the orange [process] Done by the Azul team label Nov 9, 2021
@theathorn theathorn added code [subject] Production code enh [type] New feature or request indexer [subject] The indexer part of Azul service [subject] The service part of Azul and removed code [subject] Production code labels Nov 10, 2021
@hannes-ucsc hannes-ucsc changed the title Support DRS URIs in file descriptors Support non-null DRS URIs in file descriptors Mar 18, 2022
nadove-ucsc added a commit that referenced this issue Apr 5, 2022
amarjandu pushed a commit that referenced this issue Apr 14, 2022
@melainalegaspi

@hannes-ucsc to move the PR along.

@hannes-ucsc
Member

Updated list of reviewers on HumanCellAtlas/dcp2#55 and notified straggler on Slack.

@hannes-ucsc hannes-ucsc removed their assignment Oct 14, 2022
@theathorn
Author

@hannes-ucsc to merge PR on 10/20/22 unless Amnon requests changes by then.

@hannes-ucsc
Member

HumanCellAtlas/dcp2#55 was merged.

@hannes-ucsc hannes-ucsc removed their assignment Oct 21, 2022
@theathorn theathorn self-assigned this Oct 25, 2022
@theathorn
Author

Follow up with CCHMC for timeline for new LungMAP snapshots with these DRS URIs.

hannes-ucsc pushed a commit that referenced this issue Nov 17, 2022
@theathorn theathorn assigned bvizzier-ucsc and unassigned theathorn Mar 16, 2023
@bvizzier-ucsc

LungMAP is working on a new release with updates to the existing datasets.

@hannes-ucsc
Member

hannes-ucsc commented Dec 21, 2023

Apparently, lm4 now uses non-null DRS URIs in file descriptors but we weren't informed and didn't implement this ahead of time. We had to back out the addition of lm4 which was slated to be promoted to prod today.

@hannes-ucsc
Member

hannes-ucsc commented Jan 3, 2024

Commit 58556e0 for #5824 disables the null requirement and converts it into a warning. Before we can implement this, we need schema PR HumanCellAtlas/metadata-schema#1437 to be approved and merged.

@hannes-ucsc
Member

For an example of what one of these external DRS URIs looks like see #5769 (comment)

@hannes-ucsc
Member

Assignee to work with the LungMAP team to restart progress on the schema PR.

@bvizzier-ucsc

This should probably be moved to backlog.
At this time, there are no plans to allow data located in BDC to be downloaded via the Data Explorer.

@achave11-ucsc
Member

Assignee to consider next steps.

@hannes-ucsc
Member

The workaround (58556e0) causes these external DRS URIs to be ignored by Azul and omitted from the index. There really isn't a point in putting them into the metadata if Azul simply ignores them. These DRS URIs are compact, e.g. drs://dg.4503:dg.4503/6282d0a2-732a-4949-a35d-e822581a705e. The prefix dg.4503 appears not to be registered in identifiers.org.
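To make the compact form concrete, here is a rough sketch of how such a URI could be split into its prefix and accession. It assumes the layout `drs://<prefix>:<accession>` and ignores any other variants of the syntax. `CompactDrsUri` and `parse_compact_drs_uri` are hypothetical names, not part of Azul.

```python
from typing import NamedTuple


class CompactDrsUri(NamedTuple):
    prefix: str
    accession: str


def parse_compact_drs_uri(uri: str) -> CompactDrsUri:
    """
    Split a compact DRS URI at the first colon following the scheme.
    """
    scheme = 'drs://'
    if not uri.startswith(scheme):
        raise ValueError('Not a DRS URI', uri)
    prefix, colon, accession = uri[len(scheme):].partition(':')
    if not (colon and prefix and accession):
        raise ValueError('Not a compact DRS URI', uri)
    return CompactDrsUri(prefix=prefix, accession=accession)
```

Applied to the example above, this gives the prefix `dg.4503` and the accession `dg.4503/6282d0a2-732a-4949-a35d-e822581a705e`. Resolving the prefix to an actual server is a separate step that this sketch does not attempt.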

In order to determine how much work we put into this, it would be good to know what the expected user journey is for these files, according to the LungMAP project leadership. It's fine to not support direct downloads for them, but surely we should at least include their DRS URIs in the manifest.

@hannes-ucsc hannes-ucsc removed their assignment Jan 12, 2024
@achave11-ucsc
Member

Assignee to discuss with LungMAP leadership.

5 participants