ElasticSearch fields are opaque to consumers #68

Open
natanlao opened this issue Sep 18, 2019 · 1 comment

Comments

@natanlao
Contributor

Many of the vignettes in this repository make POST /search requests to filter bundles based on their contents. For example, in the old Download SmartSeq Expression Matrix for Scanpy notebook, this ElasticSearch query was used to find recent bundles with .results files:

query = {
    "query": {
        "bool": {
            "must": [
                {
                    "match": {                                                           # It needs to
                        "files.file_json.files.content.file_core.file_format": "results" # have a
                    }                                                                    # results file...
                },
                {
                    "range": {
                        "manifest.version": {
                            "gte": "2018-07-12T100000.000000Z" # ...and preferably not be too old, either.
                        }
                    }
                }
            ]
        }
    }
}
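
For context, here is a minimal sketch of how a query like the one above might be wrapped in a POST /search request. The base URL, the `replica` parameter, and the `es_query` body key are assumptions about the data-store API and should be checked against the deployment you are targeting; `build_search_request` is a hypothetical helper, not part of any client library. The request is only built here, not sent.

```python
import json
import urllib.request

# Assumed endpoint; verify against the data-store deployment you are using.
DSS_SEARCH_URL = "https://dss.data.humancellatlas.org/v1/search"


def build_search_request(es_query, replica="aws", base_url=DSS_SEARCH_URL):
    """Build (but do not send) a POST /search request.

    Assumes the service expects the Elasticsearch query under an
    "es_query" key and a "replica" query-string parameter.
    """
    body = json.dumps({"es_query": es_query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}?replica={replica}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# A trimmed-down version of the query above (only the file_format clause).
example_query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"files.file_json.files.content.file_core.file_format": "results"}}
            ]
        }
    }
}

request = build_search_request(example_query)
# To actually send it: urllib.request.urlopen(request)
```

Even with the request plumbing in place, the researcher still has to know which dotted field names are valid, which is the gap this issue is about.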

The form of these queries is inaccessible to researchers, the target audience of these vignettes. (In our case, two other DCP developers and I spent some thirty minutes trying to write a better-formed query that worked against prod, as the one above didn't. I imagine that, for an unaffiliated researcher, the process would have been much more difficult.)

There is no clear documentation that researchers can consult to determine which fields to query against in a POST /search request when filtering data programmatically.
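
Until such documentation exists, one possible stopgap is to list the dotted field paths present in a bundle's indexed metadata document. The sketch below is only an illustration: `field_paths` is a hypothetical helper, and `sample_doc` is a made-up document shaped after the field names in the query above, not real prod data. Lists do not add a path segment, which matches how Elasticsearch addresses fields inside arrays of objects.

```python
def field_paths(doc, prefix=""):
    """Return the set of dotted field paths in a nested JSON-like document."""
    paths = set()
    if isinstance(doc, dict):
        for key, value in doc.items():
            child = f"{prefix}.{key}" if prefix else key
            paths |= field_paths(value, child)
    elif isinstance(doc, list):
        # Arrays are transparent: their items share the parent's path.
        for item in doc:
            paths |= field_paths(item, prefix)
    elif prefix:
        # Leaf value (string, number, bool, None): record its path.
        paths.add(prefix)
    return paths


# Hypothetical document, shaped after the fields used in the query above.
sample_doc = {
    "manifest": {"version": "2018-07-12T100000.000000Z"},
    "files": {
        "file_json": {
            "files": [
                {"content": {"file_core": {"file_format": "results"}}},
                {"content": {"file_core": {"file_format": "bam"}}},
            ]
        }
    },
}

for path in sorted(field_paths(sample_doc)):
    print(path)
```

Running this over a real bundle's metadata would give a list of candidate fields to try in a `match` or `range` clause, though it says nothing about how each field is mapped or analyzed in the index.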

(To underscore this point, the example query on the HumanCellAtlas/data-store README returns no results against prod.)

@chmreid
Collaborator

chmreid commented Sep 24, 2019

This seems like something that the query service component should be able to help with.
