
How to determine expected metadata content in Matrix Service loom files #381

Open
jlchang opened this issue Aug 23, 2019 · 2 comments


jlchang commented Aug 23, 2019

To process Matrix Service loom files programmatically, it would be very helpful to know which metadata can currently be expected in a loom file. In particular, it would help to know:

  1. whether there is a "core set" of metadata that is always present in a Matrix Service loom file (is there a document or other resource that lists the expected metadata?)
  2. whether there are optional, project-specific metadata that may also appear in a Matrix Service loom file
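If such a core list existed, a consumer could split a file's attribute names into missing core fields and extra (optional or project-specific) fields. The sketch below assumes a hypothetical `core` list supplied by the caller; no official Matrix Service list is implied, and the field names in the usage example are placeholders.

```python
from typing import FrozenSet, Iterable, NamedTuple


class MetadataReport(NamedTuple):
    missing_core: FrozenSet[str]  # core fields the file lacks
    optional: FrozenSet[str]      # fields present beyond the core set


def partition_metadata(present: Iterable[str], core: Iterable[str]) -> MetadataReport:
    """Split a file's attribute names into missing core fields and extra fields.

    `core` is a HYPOTHETICAL list of always-expected fields supplied by the caller.
    """
    present_set = frozenset(present)
    core_set = frozenset(core)
    return MetadataReport(
        missing_core=core_set - present_set,
        optional=present_set - core_set,
    )


if __name__ == "__main__":
    # Placeholder field names, for illustration only.
    report = partition_metadata(
        present=["CellID", "core_field_a", "project_field_x"],
        core=["CellID", "core_field_a", "core_field_b"],
    )
    print(sorted(report.missing_core))  # ['core_field_b']
    print(sorted(report.optional))      # ['project_field_x']
```

A non-empty `missing_core` would mean the file cannot be trusted to satisfy the core contract, while `optional` fields could simply be passed through or ignored.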

Additionally, will Matrix Service loom files include information about when the "core set of expected metadata" might change, how to monitor for such changes, and how to determine whether a given change is expected to be breaking?

For instance, suppose each loom file carried a global attribute Matrix_Service_metadata_reference_set: v8.1.1 in a 2018 loom file, v8.2.1 in a Feb 2019 loom file, and v9.2.1 in today's loom file.

I would then know that code written in 2018 could handle both the 2018 and the Feb 2019 loom files (same major version, 8), but not today's loom file (major version 9).
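The reasoning in this example follows semantic versioning: minor and patch bumps (v8.1.1 to v8.2.1) remain compatible, while a major bump (v8 to v9) signals a breaking change. A minimal sketch of that check, assuming the hypothetical `Matrix_Service_metadata_reference_set` attribute holds a string of the form `vMAJOR.MINOR.PATCH`; neither the attribute nor the format is confirmed to exist in real Matrix Service files.

```python
import re
from typing import Tuple

# Accepts "v8.2.1" or "8.2.1" (hypothetical attribute value format).
_VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse_version(value: str) -> Tuple[int, int, int]:
    """Parse 'v8.2.1' into (8, 2, 1); raise ValueError on anything else."""
    match = _VERSION_RE.match(value.strip())
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {value!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def is_compatible(code_version: str, file_version: str) -> bool:
    """Semantic-versioning rule: only a change in MAJOR version is breaking."""
    return parse_version(code_version)[0] == parse_version(file_version)[0]


if __name__ == "__main__":
    written_for = "v8.1.1"  # metadata reference set the 2018 code targeted
    for file_version in ["v8.1.1", "v8.2.1", "v9.2.1"]:
        print(file_version, is_compatible(written_for, file_version))
    # v8.1.1 True
    # v8.2.1 True
    # v9.2.1 False
```

Comparing only the major component matches the expectation stated in the example; a stricter consumer might also reject files whose minor version is newer than the code's, if new minor versions could add fields it must understand.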

What are current options for knowing the set of metadata that can be expected in a given Matrix service loom file?


jlchang commented Sep 18, 2019

@theathorn For portals to handle Matrix Service data programmatically, there needs to be an understanding of which metadata is expected in a Matrix Service loom file. Who should I work with to answer the questions above about "core" and "optional" metadata, and about how to tell when the set of "core" metadata has changed?

Of particular interest are any changes in expected metadata that may result from adopting the RFC "Representing sequencing library preparations in the HCA DCP metadata standard".

@brianraymor

@jlchang - you're welcome to work directly with the Matrix Service team; #the-matrix-reloaded is our HCA Slack channel. In Q3, our priorities are support for mouse data and performance improvements. We're planning to address this issue during Q4.

Currently, the V1 API returns only minimal documentation: the data browser (or a developer) can dynamically request a "README" for a file format. For example:

https://matrix.data.humancellatlas.org/v1/formats/loom

HCA Matrix Service Loom Output

The Loom-formatted output from the matrix service contains a Loom file with the cells and metadata fields specified in the query. The Loom format is documented more fully, along with code samples, here.

Per Loom conventions, columns in the loom-formatted expression matrix represent cells, and rows represent genes. The column and row attributes follow Loom conventions where applicable as well: CellID uniquely identifies a cell, Gene is a gene name, and Accession is an ensembl gene id.

Descriptions of the remaining metadata fields are available at the HCA Data Browser.
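The conventions in this README (columns are cells, rows are genes; `CellID` uniquely identifies a cell; `Gene` and `Accession` are row attributes) can be checked before deeper processing. The sketch below is a hypothetical helper in plain Python, not part of any Matrix Service or loompy API; it assumes the caller has already read the matrix shape and attribute values from the file (with loompy these would typically come from the connection's `shape`, `ca`, and `ra`).

```python
from typing import Dict, List, Sequence, Tuple

# Attribute names taken from the /v1/formats/loom README quoted above.
REQUIRED_COLUMN_ATTRS = ("CellID",)         # one value per cell (column)
REQUIRED_ROW_ATTRS = ("Gene", "Accession")  # one value per gene (row)


def _check_attrs(kind: str, attrs: Dict[str, Sequence], required: Tuple[str, ...],
                 expected_len: int, problems: List[str]) -> None:
    """Append a message for each required attribute that is absent or mis-sized."""
    for name in required:
        if name not in attrs:
            problems.append(f"missing {kind} attribute {name!r}")
        elif len(attrs[name]) != expected_len:
            problems.append(
                f"{kind} attribute {name!r} has {len(attrs[name])} values, "
                f"expected {expected_len}"
            )


def check_loom_conventions(shape: Tuple[int, int], col_attrs: Dict[str, Sequence],
                           row_attrs: Dict[str, Sequence]) -> List[str]:
    """Return a list of problems; an empty list means every check passed.

    `shape` is (n_genes, n_cells), i.e. rows x columns per Loom conventions.
    """
    n_rows, n_cols = shape
    problems: List[str] = []
    _check_attrs("column", col_attrs, REQUIRED_COLUMN_ATTRS, n_cols, problems)
    _check_attrs("row", row_attrs, REQUIRED_ROW_ATTRS, n_rows, problems)
    cell_ids = col_attrs.get("CellID")
    if cell_ids is not None and len(set(cell_ids)) != len(cell_ids):
        problems.append("'CellID' values are not unique")
    return problems


if __name__ == "__main__":
    # Toy values: 2 genes x 3 cells, with one cell ID duplicated on purpose.
    issues = check_loom_conventions(
        shape=(2, 3),
        col_attrs={"CellID": ["c1", "c2", "c2"]},
        row_attrs={"Gene": ["GENE_A", "GENE_B"], "Accession": ["ENSG_A", "ENSG_B"]},
    )
    print(issues)  # ["'CellID' values are not unique"]
```

Returning a list of messages rather than raising on the first failure lets a portal report every convention violation in a file at once.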

2 participants