
Collections Discussion #140

Open
cmheazel opened this issue May 11, 2020 · 194 comments
Labels: Collections (Applicable to Collections; consider to use Part 2 instead), Part 2 (Issues to be resolved prior to TC vote)

Comments

@cmheazel
Contributor

This issue attempts to pull the various /collections discussions into a single issue.

cmheazel added the Collections (Applicable to Collections; consider to use Part 2 instead) label May 11, 2020
@dblodgett-usgs

dblodgett-usgs commented May 21, 2020

(EDIT: 6/4/20) This issue is essentially agreed to. See: #140 (comment)

Actions here are still:

  1. Work through documenting the outcome in these slides
  2. Triage the open issues and make sure we haven't missed anything.
  3. Open new issues that describe how to implement the outcome here.

Both API-Coverages and API-Environmental Data are in an awkward position pending progression of this discussion. Here's my attempt at summarizing to get us moving.

I think there are two issues at play here; let's call them data-resource and items.

The data-resource issue comes down to:

  1. Will the collections end point be a container (catalog) for flexibly typed resources that are ostensibly datasets?
  2. Will the collections end point be reserved for collections of things that are ostensibly features?

See #17, #36, #39, #45, #47, #74, #86, #99, #105, #106, #111, #116, #120, #122, #128, #130

The items issue comes down to:

  1. Will a consistent approach to metadata for sets of items (at whatever path) and an items API path literal be used?
  2. Will an approach to "sets of items" be left to each individual API?

See #45, #80, #82, #87, #83, #107, #110, #128 (and likely others)
Broadly, these issues are discussed in this wiki post by @cmheazel.

The status of addressing these issues for now:

Collections and items have been moved to their own specification part so core can move forward: http://docs.opengeospatial.org/DRAFTS/20-024.pdf

In the current (5-21-20) collections spec, the /collections path literal is used for "A body of resources that belong or are used together. An aggregate, set, or group of related resources."

The items path literal is used for: "the individual member resources that make up the collection".

Narrative:

Some are advocating for a flexible definition of collection that allows flexibly typed spatial data resources to be the resources exposed under the collections path literal. In this approach, some spatial data resource types would use the items path literal, but not all.

Others are advocating for strict typing of the collections path literal. Essentially saying that the things you get from a collections resource should all be the same (e.g. feature collections). I don't think there are issues with using items in this approach as long as a different path literal is used high up in the chain.

A path forward:

In follow-up comments here, please take care to focus on the issues and seek to find ways to communicate the unique characteristics of proposals with clarity. Please make fully-fledged proposals and try to name them so that we can discuss them with clarity.

Referenced OGC API specifications of interest:

There is some discussion of the API-Features approach to collections in the core requirements class.

Coverages discusses their approach in collection access

EDR discusses their approach in environmental resources

Records discusses their approach in collection access

Processes does not use the collection endpoint

There may be other relevant specs (styles?) to consider, but I think these are the core that we need to consider at this juncture.

@dblodgett-usgs

To give the group something to get started, I want to propose that we

  1. do NOT add a collections literal path to API-Common, but rather,
  2. include a /{spatialResource} at the root of an OGC API which API-Features /collections would be conformant with.
  3. Have a reusable set of conformance classes for items that could be used on any set of resources in an OGC API. API-Features items would also be conformant with this common items.

@jerstlouis
Member

jerstlouis commented May 24, 2020

As recently discussed in #11 and #111, I believe the most controversial aspect of the Collections problem is its generic name which sows confusion, while what we are trying to define here is something better described as OGC API - Common (Geospatial data).

As such, and in line with what both @dblodgett-usgs and @jeffharrison are suggesting, let's entertain the idea that we can drop the literal 'collections' from being fixed, and that a compliant client must instead rely on finding the list of geospatial data resources by following "rel" : "data" from the landing page. As far as compatibility with OGC API - Features is concerned, this would either require that any server offering Features (which may be served along with other APIs) stick to "/collections", or that the Features standard is revised with a breaking change where clients can no longer rely on "rel" : "data" pointing to "/collections". Let's assume we are okay with this for now and continue.

Now "OGC API - Common - Part 2: Geospatial data" could say:

  • The "rel" : "data" of the "OGC API - Common - Part 1: Core" links to a URL we will call {dataResourcesURL}
  • A GET request on {dataResourcesURL} will return a list of data resources, with a schema matching what is currently returned by Features as the response to /collections
    (NOTE: This does imply a "collections" property at the top of that JSON response; there's sadly no way I can think of around that.)
  • In that list of data resources, as per the OGC API - Features specifications, and the current draft OGC API - Common (Collections) specifications, two key link relations are defined inside the "collections" property array:
    • "items" -- This is where you actually retrieve your data (In Features, this links to /collections/{collectionID}/items, in Coverage, that currently links to /collections/{collectionID}/coverage/all I believe. For 3D Tiles, this could point to your tileset.json (linking to one or more node of the BVH). For i3s, this could point to your root BVH node.
    • "self" -- In Features, this links to /collections/{collectionID} where you normally retrieve the exact same content of that element for the one collection object of the "collections" array, all by itself. Same here, this links to {dataResourcesURL}/{dataResourceID} with same response.

The Tiles API could also be tied here by having a "tiles" link relation within each element of that array.

I also foresee the need for additional optional conformance classes to be able to arbitrarily retrieve data using bounding boxes and resolution, without the client having to know anything about the type of geospatial data.

@akuckartz

akuckartz commented May 25, 2020

Maybe some ideas from https://www.w3.org/TR/ldp/#ldpc can be used?

@dblodgett-usgs

@akuckartz Please note the "A Path Forward" section above.

Can you please flesh out what you think is of value from the Linked Data Platform Containers list?

@akuckartz

@dblodgett-usgs That same comment ends with "There may be other relevant specs (styles?) to consider, ..." LDP is such a spec - and even a standard.

Can you please flesh out what you think is of value from the Linked Data Platform Containers list?

I will try, but can not guarantee that I find enough time.

@dblodgett-usgs

I see -- that closing comment was meant to close out the list of OGC API Specifications that have work in progress related to the data-resource and items issues. If you think there are elements of the LDP specification that are useful in bringing closure, please present them here, but that comment was not intended to be an open-ended ask for additional concepts from outside the current baseline.

@dblodgett-usgs

@jerstlouis -- why carry rel: data forward from Features at all? Couldn't rel: data be an OGC API Features link relation that gets you to a feature collection view of the data-distribution at / ?

Your response does not address the two aspects of the issue as posed in my top-level summary and seems to be mixing things up.

Re: the "data-resource" issue, which seems to be the one you are addressing, you are clearly interested in collections being a container for flexibly typed resources that are datasets. But your examples just describe a design pattern -- not why it is better than another design pattern.

Your:

(NOTE: This does imply a "collections" : property at the top of that JSON response, there's sadly no way I can think of around that.).

Indicates to me that it is a bit of a hack and is probably not a solution that we want to pursue.

Specifically, what is wrong with:

  • / as a data-resource
  • /collections as a feature-collection spatial-resource
  • /coverages as a coverage spatial-resource
  • /tiles as a tiles spatial-resource
  • etc. etc.

Where API-Common would specify the common aspects of a spatial-resource but not the path semantics -- only that a given spatial-resource view over a data-resource should have a literal path for its API.
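For illustration, a landing page under this layout might carry one link per spatial-resource view; everything below (URLs, rel values, titles) is hypothetical and only meant to make the proposed structure concrete.

```python
# Hypothetical landing-page links for the layout sketched above; the rel
# values are placeholders, not registered link relations.
landing_page = {
    "title": "Example dataset (the data-resource at /)",
    "links": [
        {"rel": "self", "href": "https://example.org/ogcapi/"},
        # one literal path per spatial-resource view over the data-resource
        {"rel": "collections", "href": "https://example.org/ogcapi/collections",
         "title": "Feature-collection view"},
        {"rel": "coverages", "href": "https://example.org/ogcapi/coverages",
         "title": "Coverage view"},
        {"rel": "tiles", "href": "https://example.org/ogcapi/tiles",
         "title": "Tiled view"},
    ],
}
```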

@jeffharrison

I don't think there's anything wrong with it. There should be room in OGC API for straightforward geospatial resources too.

Best Regards,
Jeff

@cportele
Member

In general, I see two approaches for moving forward if the existing Collections resource does not work for the SWGs that specify other spatial data resources.

Let me start with the proposal from @dblodgett-usgs, which I would characterise as follows:

  • Restrict an API that shares spatial data to a single dataset.
    • Note: There will also be APIs with no dataset involved, e.g. to share processing capabilities, styles, etc.
    • Note: A metadata catalogue is a dataset, too.
    • Note: An API that implements Records and that catalogs data APIs is essentially an API that provides access to multiple datasets.
  • Group spatial data resources so that they provide reliable access patterns and do not require that code does a detailed analysis of the resources to determine how to access sub-resources. Use a fixed token for the path element.
  • Each spatial data resource represents one or more distributions of the dataset (one per supported media type).
  • The same dataset may be shared via multiple spatial data resources ("views") with different access patterns. For example, the data may be shared as features, as a coverage or as vector tiles, each with their own access patterns (sub-resource structures, applicable query parameters, etc.).

That is, /collections would be restricted to cases where the data items are accessed at /collections/{collectionId}/items with support for paging and filtering via bbox/datetime. Example: features and records. Neither /collections nor an "items" link relation type would be discussed in Common Part 2. Their specification would remain in Features.
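For reference, a minimal sketch of that access pattern against a hypothetical Features endpoint; limit, bbox and datetime are the Features Core query parameters, and paging is done by following the "next" link.

```python
import requests

# Hypothetical collection; the path literal and query parameters follow OGC API - Features Core.
ITEMS_URL = "https://example.org/ogcapi/collections/buildings/items"

params = {
    "limit": 100,                           # page size
    "bbox": "7.0,50.0,8.0,51.0",            # spatial filter
    "datetime": "2020-01-01T00:00:00Z/..",  # temporal filter (open-ended interval)
}

url, features = ITEMS_URL, []
while url:
    page = requests.get(url, params=params).json()
    features.extend(page.get("features", []))
    # follow the "next" link, if any; it already carries the query parameters
    url = next((l["href"] for l in page.get("links", []) if l.get("rel") == "next"), None)
    params = None
```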

For spatial data resources with other access characteristics, other resource types with other tokens should be defined / used in the respective specifications. For example, Tiles, Coverages, EDR.

Note that I don't see how Common Part 2 could define "a reusable set of conformance classes for items that could be used on any set of resources in an OGC API." The items resource is a specific sub-resource of a collection and other spatial data resources may have different access patterns (I assume, for example, coverages).

One aspect that would need more thought is the link relation types. In Features, the "data" link relation type references the Collections resource at /collections. The link relation type is defined as "refers to the root resource of a dataset in an API." Since we cannot register something like "data" with IANA, we need a fresh start in Common anyhow. I see two options:

  • define a link relation type for each spatial data resource type (e.g., Features could define "ogc:collections"), or
  • define something general like "ogc:spatial-data-resource" or "ogc:distributions" in Common Part 2.

The second option works, if/since we use fixed tokens like "collections" or "tiles" for the spatial data resource types. However, for clients navigating the API by following links, the first option should be easier to use.
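To make the two options concrete, here is roughly how the landing-page links would differ; the "ogc:" tokens follow the examples in the comment above (with "ogc:tiles" extrapolated from the same pattern), none of them are registered relations, and the URLs are hypothetical.

```python
# Option 1: one link relation per spatial data resource type.
option_1_links = [
    {"rel": "ogc:collections", "href": "https://example.org/api/collections"},
    {"rel": "ogc:tiles",       "href": "https://example.org/api/tiles"},
]

# Option 2: a single generic relation; the fixed path tokens ("collections",
# "tiles", ...) are what tell the resource types apart.
option_2_links = [
    {"rel": "ogc:distributions", "href": "https://example.org/api/collections"},
    {"rel": "ogc:distributions", "href": "https://example.org/api/tiles"},
]
```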

The second approach that I see is to also move away from fixed paths like /collections in Common Part 2, but define a flexible architecture for other resource types that represent dataset distributions. This should work, too, and is maybe a cleaner architecture, but it is also more complex to develop client code against.

For simplicity, let's assume that we still restrict an API that shares spatial data to a single dataset. If this changes, the solution would become more complex.

Without fixed paths we need to rely on other mechanisms so that humans and software can understand an API, both from the API definition and from navigating the resources:

  1. For a general solution we would need a richer set of link relation types to distinguish links to different types of data resources - individual data items and aggregations of them with datasets and distributions having a special role. The downside is that we would need to develop a sound resource model and I doubt that we have enough experience yet to standardize one.

  2. Since the starting point is that not all spatial data resources are under /collections, we need to be able to link to other spatial data resources and be able to understand what they are. We probably would need to a) register a JSON media type for each resource type representation in the API and b) include a "type" member in each JSON object (in the current Collection(s) resource types these members would have to be optional for backwards compatibility). The structure of these resources would be out-of-scope for Common Part 2, which to me also would exclude any discussions about "items".

A potential risk with all that is that the flexibility on the server side comes also at a cost for client developers. Developing a generic client that works out-of-the-box would require good knowledge about all these concepts. (NB: there is also more work for document editors / OGC as more and more IANA registrations would be the likely result.)

One of the key drivers behind the WFS 3.0 / OGC API Features activity, and I hope also behind the OGC API idea in general, was/is to reduce the learning curve and the complexity for developers compared to many of the standards from the OWS/XML stack. Yes, we also want to improve the overall architecture in the OGC baseline in this process, but we should avoid approaches that add complexity/flexibility that is not needed by the majority of the deployed APIs.

If we go down a path with a very flexible resource structure, there should be agreement that OGC API standards (e.g., Features) can remove flexibility for "their" resource types. In Features we ended up with the current structure after implementation feedback and intensive discussions (see, e.g., issues 90, 64 and others) and that approach has proven to work well for Features.

@dblodgett-usgs

Thanks for this @cportele.

I am admittedly out on a limb with the items idea. I was thinking that the link relation for items and at least limit/paging could be reused in other places that are not under a collections API path literal?

The reason I'm leaning toward an approach where each API access pattern gets its own literal path is largely what you point to as a key driver for OGC API.

... to reduce the learning curve and the complexity for developers compared to many of the standards from the OWS/XML stack.

Your notes are really important to what I'm seeing as a path forward:

Note: A metadata catalogue is a dataset, too.
Note: An API that implements Records and that catalogs data APIs is essentially an API that provides access to multiple datasets.

In this world view, an OGC API can be cataloged in an OGC API Records and referenced as a dataset in its own right. I've used ISO19139 (services metadata) to integrate dataset services into processing workflows very successfully and see this as putting the complexity in the right place but keeping it "in band".

Where are others at on this? @cmheazel @joanma747 What would you suggest as a path forward? I am pushing here because of how much work is bound up in EDR and Coverages pending this discussion. Coverages and EDR folks: @Schpidi @pebau @chris-little @m-burgoyne where do you stand on this?

@jerstlouis
Member

jerstlouis commented May 26, 2020

@cportele @dblodgett-usgs @cmheazel @joanma747 What I was hoping to see in OGC API - Common Part 2: Geospatial data is the following...

  1. A common mechanism (regardless of the type of geospatial data or available views) by which to list all data layers within a dataset (starting from its landing page), including common information useful to a client, using a Common schema (though it can be extended for the specific module). This includes:
  • Identifier
  • Title
  • Spatial & Temporal extent
  • Intended scale / resolution

/collections/{collectionID} in both OGC API - Features and the current draft of OGC API - Coverage satisfies this for the most part.

  2. A common mechanism by which links are provided within this schema for each of these data layers, which return the data in one or more forms in which it is being distributed. Based on the link properties ("rel", "type" and/or other properties), a client knows what it will get when it follows that link.

Currently, the "rel" : "items" of OGC API - Features, also used in the current draft of OGC API - Coverage, linking respectively to /collections/{collectionID}/items and /collections/{collectionID}/coverage/all also satisfies this.

Relations could be changed as needed, additional properties for the links could be added, but this is the functionality I hope ends up in this Common approach to Geospatial data.

  3. Then I also hope for conformance classes supporting a simple retrieval mechanism from BBOX+resolution to retrieve the data, either from that same link relation, or from a separate "rel" if necessary.
    /collections/{collectionID}/items?bbox=30,40,50,60 -- Returns me a GeoJSON for my bbox for a vector layer
    /collections/{collectionID}/coverage/all?bbox=30,40,50,60 -- Returns a CoverageJSON for my bbox for my raster layer
    /collections/{collectionID}/coverage/all?bbox=30,40,50,60&f=geotiff -- Returns a GeoTIFF for my bbox for my raster layer
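A sketch of what a type-agnostic client could do with such a conformance class, assuming only the hypothetical URL below and the bbox parameter; the media type of the response, rather than collection-specific knowledge, tells the client what it received.

```python
import requests

COLLECTION_URL = "https://example.org/ogcapi/collections/temperature"  # hypothetical
BBOX = "30,40,50,60"

desc = requests.get(COLLECTION_URL).json()
# Follow whichever data link the collection advertises ("items" in Features,
# and in the current Coverages draft per the discussion above).
data_url = next((l["href"] for l in desc.get("links", []) if l.get("rel") == "items"), None)

resp = requests.get(data_url, params={"bbox": BBOX})
content_type = resp.headers.get("Content-Type", "")
if "geo+json" in content_type:
    payload = resp.json()    # vector data as GeoJSON
elif "json" in content_type:
    payload = resp.json()    # e.g. CoverageJSON
else:
    payload = resp.content   # e.g. a GeoTIFF byte stream
```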

Similarly, rather than BBOX+resolution, one might use the Tiles API instead in a consistent manner, for retrieving the data either as vector and/or raster. And one might use the Maps API to render that data in a consistent way, and one might refer to this data layer the same way as an input to a Process.

These are the use cases I care the most about, and so far the newer proposals seem to move away from this and I see it as a major setback in terms of having a common approach to geospatial data.

If we can resolve this, then we could discuss how one might represent a hierarchical structure both within a single dataset and as a way to organize multiple datasets, and whether that capability could be one and the same, or implemented in a similar manner, but that is largely a separate issue.

My rationale for wishing to have this functionality is also entirely based on reducing the learning curve and the complexity for developers. By implementing these simple capabilities once, clients automatically handle the generic aspects of working with any type of geospatial data, and can gradually implement additional support for the special handling or capabilities specific to a particular data type or retrieval mechanism.

As a practical example of the value of this, based on the current draft Coverage specifications, the only thing keeping our Features & Tiles API client from supporting Coverages is parsing CoverageJSON, because the current generic common geospatial data approach already allows it to follow the links all the way to /collections/{collectionID}/coverage/all, which returns the data as CoverageJSON. Without writing any special code for Coverage, it could already see the coverages' titles and their geospatial and temporal extents.

@cportele
Member

@jerstlouis - You have lost me now. I thought you wanted to get rid of /collections as a root resource for dataset distributions, but now you seem to say that every data API should use /collections?

@jerstlouis
Member

jerstlouis commented May 27, 2020

@cportele I never wanted to get rid of /collections, but because I thought the name collection in the path was the source of all this controversy, I suggested in a previous post that if we could figure out a way for the published Features standard to relax that 'collections' literal in /collections, and understand /collections to be wherever the landing page "rel" : "data" point to, then it might be easier to move forward. However you seem to indicate that Features would like to remain restrictive in this regard, which would at least imply that the literal 'collections' must remain if a dataset contains at least one Features data layer.

In my last post, you could substitute /roses for /collections, with the exception that OGC API - Features, Coverage and the Common (Collections) draft all prescribe /collections at the moment.

The dataset / hierarchy discussion is separate and I was trying to avoid it until the most fundamental aspects are settled (i.e. points 1 & 2, which currently work with the current draft specs). In an ideal world, I would combine the datasets/collections landing page, collections, and 'collection resource' into a single schema. Then such a resource could have links to data representations/views at the current level, links to sub-datasets and/or links to sub-collections, and you would have an indicator saying whether a particular resource constitutes a dataset per the DCAT definition. A service could have a higher-up service landing page with service info, not representing any specific datasets, linking to "data" (the root hierarchy for datasets and collections) and "processes". There could be links to api and conformance at whichever level(s) it makes sense. That root data resource being "/collections" would have been the easiest way to be compatible with Features as it is currently specified.

@cportele
Member

@jerstlouis - I don't see Features moving away from /collections, as the current approach works and changing it would be a breaking change.

I also don't think it is the name; if the resource definition (contents, sub-resources, parameters, etc) would work for other data items, the name shouldn't be a real issue.

Also note that in the current drafts we already have dataset distributions that are not under /collections like /tiles, which has been the idea from early on.

To move forward on this issue, I think we need broader input, e.g. from those mentioned by @dblodgett-usgs.

@jerstlouis
Member

jerstlouis commented May 27, 2020

@cportele /tiles at the root of the dataset works for tiles containing all data layers, but we also have /tiles inside each {collectionID} to retrieve tiled layers individually. Also, a service may serve both raw data tiles and rendered map tiles, which should be distinguished.

I agree that the name shouldn't be a real issue, but I believe for some it is the main issue (e.g. see #11 (comment), and contrast that with Jeff's previous comments).

@jeffharrison

Uhh, what I said was -> OGC shouldn't mandate the use of the term 'collections' as the identifier for all geospatial resources. But at this point in the OGC API development process it's reasonable for OGC to say the identifier of a {geospatialResource} could be "/collections/{collectionId}" or a coverage or another geospatialResource.

Best Regards,
Jeff

@jerstlouis
Member

@jeffharrison is it the term 'collection' that you have an issue with, or the idea of a common approach to geospatial data consistent across different APIs (common way to get from a landing page to your data layers, which has e.g. a spatiotemporal extent / volume, and links to resources to access that data, e.g. features items, coverage, bounding volume hierarchy tileset for 3D data)?

In that comment I linked on issue 11 you seemed to welcome that proposal without the term 'collection'.

@dblodgett-usgs

Thanks @cportele. We really do need input from others here. So far, most of the discussion between @jerstlouis and others has involved talking past one another, without shared use cases and assumptions to root the discussion in.

I attempted to provide some focus in my opening comment: #140 (comment) and we need to focus this and iterate toward consensus rather than continue to air old arguments.

@chris-little
Contributor

@dblodgett-usgs @cportele @jeffharrison @jerstlouis @joanma747
To be honest, I am getting lost in all this. In EDR, we support a few queries 'sampling' against a single geospatial resource. We would like a common OGC API mechanism to identify the resources that fall within the client's query's spatio-temporal bounds of interest.

I think that the OGC API Common Part 1 can do this, as can Part 2 Collections, and probably Records.

Grouping of several resources quite tightly is desirable (e.g. all the Météo-France forecasts for today at a certain resolution, both upper air and surface), as are more loosely coupled groups (e.g. all forecasts and observation datasets for NW Europe, at differing resolutions, from Latvia to Portugal, issued on 13 October 1987)

There are some use cases for compatibility with OGC API - Features collections/collectionId/items.

"Layers" do not make sense to EDR, as a single datastore resource may have 10 million "layers", each of which could be MBs or even a GB in size.

I am not sure that this gives you a clear direction.

@dblodgett-usgs

I'm doing my best to remain neutral but also push people on the issues and try to focus this discussion. I want to bring some comments from opengeospatial/ogcapi-coverages#65 over here.

Thus far, the discussion has focused heavily on the nature of the /collections end point and is not really concerned with the less contentious issue of a consistent approach to items.


@jerstlouis offers a helpful set of benefits for treating the /collections end point as a dataset catalog in opengeospatial/ogcapi-coverages#65 (comment) providing justification for answering yes to the question I posed at the outset of this issue:

Will the collections end point be a container (catalog) for flexibly typed resources that are ostensibly datasets?

excerpting @jerstlouis:

  • We are trying to represent a specific "leaf" (most granular) data entity, regardless of its data type, at a single end-point.
    ...
  • We are also enabling to list all such leaf data entities at the same level, e.g. to list all of them part of a single dataset.
  • We are providing a generic manner by which to query the description of such an entity, e.g. its spatio-temporal extent, or to retrieve a list of these entity descriptions.

I find this very helpful for the following reason:

  1. API-Features specifies that an API is for one and only one dataset, but
  2. the only place to get spatial metadata is at the collection level.
  3. So, while Features may say it is for one dataset, it is set up to represent 1:n spatial data entities (feature-collections).
  4. If we are going to have parity in "leaf" spatial data entities (collections), then adding additional access methods for a collection is totally logical.

OK, so running with this a bit, what is a collection?

I think @pvretano offers some good words over in opengeospatial/ogcapi-coverages#65 (comment).

In my simple-minded view of coverages, I see them as a collection of measurements (samples) taken with reference to some subdivision/tessellation of some object space that is somehow geo-located.

@pvretano, your attempt at self deprecation isn't working on me. I know you are way ahead of us. ;)

I find this idea of "a collection of measurements (samples)" to be the profound bit.

In API Features, API Coverages, and API Environmental Data, we are all circling around this notion of accessing a digital representation of the world, potentially bounded to some spatial domain. In Features, the representation is entities we have identified and want to share for whatever reason. In Coverages, the representation is a tessellation that, in an ideal world, approaches the continuum it is sampling. EDR accepts (cynically?) that people don't really care about features and coverages, and just want to ask what the dataset's estimate of the value of the real-world is for a location, point, area, trajectory, etc.

So is that what a collection is? A spatially bounded collection of samples of a real-world phenomenon that (depending on the nature of the samples) can be accessed via a variety of APIs?


One other interesting comment before I call out some others and look for a way forward.

@tomkralidis says:

  • whether we go with /collections, or /coverages, /tiles etc., have the respective collectionInfo.yaml inherit from a generic collection content model (from what would become Common Part 2). Or maybe even an OGC API - Records record model?

I want to call attention to "Or maybe even an OGC API - Records record model?" because, if we are going to go down this road, we must define the relationship (it can be flexible) between collections and the datasets that are going to show up in API Records. Elsewhere in @tomkralidis' comment, he points out that "this would also help servers provide 'on board' catalogues of the data they serve pretty easily". The question that might get people thinking is: "Is there a crosswalk between collection metadata and DCAT?!?"
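To make the crosswalk question concrete, an indicative field mapping might look like the following; this is an illustration only, not an agreed or proposed mapping.

```python
# Indicative crosswalk between OGC API collection metadata and DCAT terms.
collection_to_dcat = {
    "id":              "dct:identifier",
    "title":           "dct:title",
    "description":     "dct:description",
    "extent.spatial":  "dct:spatial",        # spatial coverage
    "extent.temporal": "dct:temporal",       # temporal coverage
    "links":           "dcat:distribution",  # access/download links
}
```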


Now -- let's assume we go with /collections and use rel: data to get you there.

How do we fix the issue that you have to parse a bunch of garbage you might not care about to find the stuff you do care about / have client code to deal with? @jyutzler described it over here: #47 (comment). Some have suggested a "collectionType" enum, but that has gotten quick push-back with a counter-suggestion of an "accessTypes" array; I don't think that goes quite far enough.
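For concreteness, here is roughly what the suggested filtering would look like from a client's perspective; neither "collectionType" nor "accessTypes" is standardized, and the property name and values below are hypothetical.

```python
# Hypothetical collection descriptions carrying an "accessTypes" array.
collections = [
    {"id": "buildings", "accessTypes": ["items"]},
    {"id": "elevation", "accessTypes": ["coverage", "tiles"]},
]

supported = {"items"}  # what this particular client knows how to handle
usable = [c for c in collections if supported & set(c.get("accessTypes", []))]
# -> the client keeps "buildings" and skips everything it has no code for
```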

There is a strong desire to minimize the diversity of functionality that exists at a given API path. Is there a middle-way here? Can we define common collection info that sets us up to allow diversity without introducing undue complexity when implementing general client code?

How do we bridge the gap between the advanced geospatial perspective where we have these abstract hierarchical datasets made up of collections with varied access patterns and a non-geospatial web developer who just needs to get their client or server code to work and be conformant?

I want to suggest that the path forward

  1. must be based on simple, use-case-oriented, building blocks that happen to fit together into a coherent (and complex) whole,
  2. must include tight and simple definitions of things like collections that provide clarity rather than convey generality.
  3. must have a clear architecture for dataset-catalogs, data-distribution, processing, and integrated representations (maps etc.) (or whatever this taxonomy should actually be).

If we can define the initial building blocks in the APIs that are in motion (including Common), get our shared definitions right, and define this architecture in Common, I think we can move forward. But we must stop talking past each other, start seeking to understand each other's requirements, and find common ground.

At this point, I'm curious where @jyutzler and @cmheazel are at on the issues.

@KathiSchleidt

Hi Dave,

many thanks for having done the painful work of collecting all these insights into the collection conundrum!!!

what I'm seeing is two worlds colliding:

  1. the spatial world, focus is the geometry, a few attributes added for context. Also seems to have a fairly clear dataset concept, aligned with a layer, aligned with a vellum overlay as a map layer

  2. the data world, focus on the data, a bit of geometry added for context. No clear dataset concept as this is in the eye of the beholder, whatever bits make sense in a thematic context.

Trying to force data from the 2nd world into the simple, clean concepts stemming from the first doesn't seem to be working; this is the reason we have our SensorThings (STA) and, to my understanding, the background of EDR.

  • In STA, we at least have a real-world-object to ground our observations (I'm just missing how to link our observational concepts to these items).
  • In EDR, the spatial object the resulting dataset pertains to doesn't exist until the EDR query has been sent.

Taking this a step further, I see many cases where the provision of the spatial (1) vs. data (2) aspects are performed by different organizations or institutions, thus firming up the requirements on being able to link data on a spatial object (or area to also support EDR) from one source with spatial information from a different source.
Related to this is the requirement to 'represent a specific "leaf" (most granular) data entity, regardless of its data type, at a single end-point.' While this sounds very good, to my view (S)ELFIE has put up clear requirements to the contrary, at least when it gets to a real-world-object.
We're also encountering issues when using multiple OGC Standards, what is the 'single end-point' for a data object being provided by both OAF and STA?

Sorry, no solutions, just the concern that by ignoring the dichotomies engendered by the 2 worlds described above, we will continue to come up short of real world requirements.
The modern spatial world requires more information on spatial objects than can be provided by an integrated set of attributes!

My 2 cents

:)

Kathi

@jerstlouis
Member

jerstlouis commented May 31, 2020

@KathiSchleidt @dblodgett-usgs

In an attempt to bridge these two worlds, I would like to clarify what I meant by this "leaf (most granular) data entity" concept.

Leaf / most-granular might have been an overstatement, as e.g. you could split a FeatureCollection into individual features, polygons, points. Similarly, you could split a coverage into its individual grid cells or samples.
In the context of IoT / sensors, each individual sensor may provide one or more measurements/observations, the sensor itself is positioned at a point in space, and the measurement/observation is captured at a given point in time.

So what I was picturing as the "leaf data layer" in the case of sensor data is not the individual sensor or its measurements, but a collection of multiple sensors, along with their geospatial and temporal aspects.

Potentially, a single SensorThings API could be the source of one or more such data layers, or multiple SensorThings APIs could be sourced to provide one or more integrated "leaf data layer(s)" (e.g. based on the thematic context). Each of these data layers could then additionally be offered as Feature Collections and/or Coverages, to facilitate the use of this information in GIS tools without built-in support for the SensorThings API. When one SensorThings API maps directly to one such data layer, or when describing the SensorThings API itself, the spatio-temporal extent for it would be the overall extent of the temporal and geospatial coordinates for all measurements provided by that API.

@KathiSchleidt

@jerstlouis thanks for this clarification!
Following up - how would you see the various classes of STA? All as one collection (so a wild mix of Things, Sensors, ObservableProperties...) or as a collection per class type? (This is where I always get lost.)
Adding a Sensor Collection, especially if this already includes "their geospatial and temporal aspects", starts out seeming pretty straightforward. The tricky bits show up a bit later:

  • How do you deal with moving sensors?
  • Are ObservableProperties a collection on their own, or do they get denormalized into their sensors?
  • ???

I'd much appreciate a simple sketch of how to bring this fairly simple STA world into OAF

@jerstlouis
Member

jerstlouis commented May 31, 2020

@KathiSchleidt
I believe an overall SensorThings API would best map to a single collection, at least based on my limited familiarity with STA so far and your brief description (and a glance at https://docs.opengeospatial.org/is/15-078r6/15-078r6.html#24). It would also be possible for a SensorThings API to map to multiple collections, but each of these collections would likely map to some thematic regrouping of sensors along with their associated Things, ObservableProperties, etc., rather than each of those aspects of the SensorThings APIs being separate collections. But each of these collections could also stand on their own as individual SensorThings APIs.

By class type, am I correct in understanding that you were referring to SensorThings conformance classes? In other OGC API specifications, such as Features and Coverage, conformance classes describe different capabilities of the API, which applies to the multiple available collections.

Moving sensors -- each set of observations is taken at a certain time, and the difference from non-moving sensors is that the geospatial coordinates change along with the time. A collection of sensor measurements/observations still has an overall spatio-temporal extent.
The Moving Features standard should also be considered, and it would be interesting to see how this can integrate with the Features API.

I don't think Observable properties would be a collection on their own.
e.g. if one were to create a feature collection out of information coming from a SensorThings API,
observable properties become the associated data attributes (properties), while the geospatial coordinates of the sensor become geometry points, and the time of the observation becomes the temporal aspect (which may also be stored as a property of the point feature).

If one creates a coverage out of information coming from a SensorThings API, then again the sensor position becomes the coordinates of the coverage sample, the measurement/observation is the value (sample) at that position, while time is an additional dimension of the coverage, and separate types of measurements can either be represented on separate planes (extra dimension?) or by splitting them into separate coverages.

So the idea of how to regroup this sensor information would be to have the possibility to present this dataset of observations/measurements, which could potentially be retrieved using one or more SensorThings APIs, as one or more feature collections and/or as one or more coverages.
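A toy sketch of the feature-collection mapping described above; the observation records are made up for illustration and are not an actual SensorThings API response.

```python
# Each observation becomes a point feature: the sensor position becomes the
# geometry, the observed property becomes a feature property, and the
# phenomenon time is kept as a property as well.
observations = [
    {"sensor": "s1", "lon": 7.1, "lat": 50.7, "time": "2020-05-31T10:00:00Z",
     "observed_property": "air_temperature", "result": 18.4},
    {"sensor": "s2", "lon": 7.3, "lat": 50.8, "time": "2020-05-31T10:00:00Z",
     "observed_property": "air_temperature", "result": 17.9},
]

feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [o["lon"], o["lat"]]},
            "properties": {
                "sensor": o["sensor"],
                "phenomenonTime": o["time"],
                o["observed_property"]: o["result"],
            },
        }
        for o in observations
    ],
}
```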

I don't really believe that these worlds are that far apart, because people have been building GIS vector and raster datasets from measurements and observations for a long time. The only difference with the SensorThings API and the IoT is that a lot more information is available and it is real-time. But I don't think this prevents the representation of the information as classic Feature Collections and/or Coverages. It does present some additional challenges due to that greater quantity and flow of information, and I think space partitioning mechanisms and dynamic distributed processing are key tools to solve those challenges.

@jerstlouis
Member

It would be good to hear @liangsteve's and @sarasaeedi's perspectives on the above :)

@cportele
Member

@jerstlouis - I use "OGC API" exclusively for item 1, everything else is not "an OGC API" (unless we change the OGC API concept). I agree, it would help, if the language would be used more precisely, but I think that ship has to some extent sailed...

My terminology comment was mainly about the interpretation of the term "sub-resource distribution", which for me is a subset of a distribution (which by definition is always related to a dataset), i.e. the part of a distribution that is related to a sub-resource of a dataset, but this is not what the Collection.adoc page currently says.

@dblodgett-usgs

@cportele "sub-resource" is almost certainly the wrong word. I used it without a thorough understanding of how it is used in various contexts.

What I meant was along the lines of:

We have the resource /collections/{collectionID}/ and it has a "sub-resource" /collections/{collectionID}/[****].

It appears that "nested resource" is also used commonly. I'm sure we could make that aspect of the definition more precise -- help?

@cmheazel
Contributor Author

Resolved through PR 149

@cmheazel
Contributor Author

Motion: The SWG moves that this issue has been resolved by Pull Request 149 and can be closed.
Motion: @cportele
Second: @joanma747
NOTUC:

@cmheazel
Contributor Author

Definition used for Collection in overview (section 7.1) - replace with definition 'A geospatial resource that may be available as one or more sub-resource distributions that conform to one or more OGC API standards.'

cmheazel reopened this Sep 27, 2021
@jerstlouis
Member

jerstlouis commented Sep 27, 2021

@cmheazel PR #149 that closed this issue had actually added that definition:

https://github.com/opengeospatial/ogcapi-common/pull/149/files

I cannot find it anywhere in the latest draft however:

http://docs.opengeospatial.org/DRAFTS/20-024.html

Those were actually changes to best_practice/Collections.adoc, which ended up being moved to the User's Guide rather than the Part 2 specification.

That addition was also unambiguously clear about an OGC API collection being a collection of data:

An OGC API Collection may only contain data. It does not encompass non-data resources (but the Collection resource may still have related auxiliary resources, such as schemas, or processes, as sub-resources).

Therefore I would suggest that the SWG considers updating the definition to something that includes the term data like:

A resource consisting of geospatial data that may be available as one or more sub-resource distributions that conform to one or more OGC API standards.

@cmheazel
Contributor Author

Updated definition of collection in section 7.1 to 'A geospatial resource that may be available as one or more sub-resource distributions that conform to one or more OGC API standards.' I'm reluctant to restrict this definition any more than is absolutely necessary.
