Doc values support for geo shapes. #37206

Closed
imotov opened this issue Jan 7, 2019 · 8 comments
Labels
:Analytics/Geo Indexing, search aggregations of geo points and shapes >enhancement Meta release highlight Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v7.8.0 v8.0.0-alpha1

Comments

@imotov
Contributor

imotov commented Jan 7, 2019

There are several features that we would like to add that require fast retrieval of geo shapes and can greatly benefit from having docvalues support for geo shapes.

This issue has gone through a lot of evolution, so this description has been edited to reflect the latest reality.

The work for adding geo_shape support for geo_centroid, geo_bounds, geotile_grid, geohash_grid aggregations was accomplished in x-pack by way of the following PRs:

This was achieved by migrating the geo_shape mapper registration
to a module (#53562) and then defining the geo_shape with doc values in the spatial plugin (#55037).

@imotov imotov added >enhancement :Analytics/Geo Indexing, search aggregations of geo points and shapes labels Jan 7, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo

@babadofar

What is the status on this? I really need this for my maps. Is there anything I could help you out with? The @elastic/es-analytics-geo repo is not public, I guess.

@polyfractal
Contributor

No concrete status update right now. We're still discussing how to go about encoding shapes into doc values (there are several limitations/considerations that we need to work around). When we have some news moving forward -- a plan or something similar -- we'll make an update here :)

> the elastic/es-analytics-geo repo is not public, I guess.

The @elastic/es-analytics-geo team is just an alias that we use to ping internal email lists, so that different sub-teams can get notified about new issues in their area. There's no additional repo or anything like that :)

@thomasneirynck
Contributor

thomasneirynck commented Mar 19, 2019

This would be a prerequisite for enabling layer-fitting in the Kibana Maps-app. This is now only possible for document-layers backed by a geo_point field. cf. elastic/kibana#33509

@talevy talevy added the Meta label Apr 11, 2019
@colings86 colings86 added the 7x label Apr 12, 2019
imotov added a commit that referenced this issue May 14, 2019
Adds an initial limited implementation of geo features to SQL. This implementation is based on the [OpenGIS® Implementation Standard for Geographic information - Simple feature access](http://www.opengeospatial.org/standards/sfs), which is the current standard for GIS system implementation. This effort concentrates on the SQL option, also known as ISO 19125-2.

## Queries that are supported as a result of this initial implementation

###  Metadata commands

- `DESCRIBE table`  - returns the correct column types `GEOMETRY` for geo shapes and geo points.
- `SHOW FUNCTIONS` - returns a list that includes supported `ST_` functions
- `SYS TYPES` and `SYS COLUMNS` display the correct types `GEO_SHAPE` and `GEO_POINT` for geo shapes and geo points respectively.

### Returning geoshapes and geopoints from elasticsearch

- `SELECT geom FROM table` - returns the geoshapes and geo_points as libs/geo objects in JDBC or as WKT strings in console.
- `SELECT ST_AsWKT(geom) FROM table;` and `SELECT ST_AsText(geom) FROM table;` - return the geoshapes and geopoints in their WKT representation.

### Using geopoints in elasticsearch

- The following functions will be supported for geopoints in queries, sorting and aggregations: `ST_GeomFromText`, `ST_X`, `ST_Y`, `ST_Z`, `ST_GeometryType`, and `ST_Distance`. In most cases, when used in queries, sorting and aggregations, these functions are translated into scripts. These functions can be used in the SELECT clause for both geopoints and geoshapes.
- `SELECT * FROM table WHERE ST_Distance(ST_GeomFromText('POINT(1 2)'), point) < 10;` - returns all records for which `point` is located within 10m of `POINT(1 2)`. In this case the WHERE clause is translated into a range query.
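To make the meaning of the `ST_Distance(...) < 10` filter concrete, here is a minimal in-memory sketch in Python. It assumes that WKT `POINT(x y)` means longitude then latitude, and it uses the haversine formula on a spherical Earth. The function names and the radius constant are illustrative only and are not part of Elasticsearch SQL, whose actual distance computation may differ.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters; an assumption for this sketch


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within(points, origin, max_m):
    """Keep the (lon, lat) points closer than max_m meters to origin."""
    return [p for p in points if haversine_m(*origin, *p) < max_m]


# POINT(1 2) is read here as lon=1, lat=2.
candidates = [(1.0, 2.00005), (1.0, 2.001)]
print(within(candidates, (1.0, 2.0), 10))
```

The first candidate is roughly 5.6 m from the origin and is kept; the second is roughly 111 m away and is filtered out.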

## Limitations:

Geoshapes cannot be used in queries, sorting and aggregations as part of this initial effort. In order to fully take advantage of geoshapes we would need to have access to geoshape doc values, which is coming in #37206. `ST_Z` cannot be used on geopoints in queries, sorting and aggregations since we don't store altitude in geo_point doc values.

Relates to #29872
imotov added a commit to imotov/elasticsearch that referenced this issue May 14, 2019
imotov added a commit that referenced this issue May 14, 2019
Backport of #42031
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this issue May 27, 2019
@jpountz
Contributor

jpountz commented Jul 9, 2019

I had a look at the in-progress branch, this looks great. I also have some thoughts, which I'm logging here for the record:

  • EdgeTree only indexes the minY values, should it also index the maxY, minX and maxX values by alternatively splitting on each of these fields like we do for multi-points?
  • Serialization is a bit wasteful in terms of space: we don't need to serialize separately minY, maxY, y1 and y2: we could just recompute minY and maxY at deserialization time? Furthermore we could move from a fixed-length encoding to variable-length. My understanding is that we only leverage fixed length encoding in order to know how many bytes serializing the left child requires before actually serializing it. We could do it by serializing into a BytesStreamOutput first. It makes indexing a bit more memory-intensive, but this sounds worth the space savings to me.
  • GeometryTreeReader#intersects iterates over sub geometries linearly and returns true as soon as one matches. Should we build a tree there as well to avoid this linear scan?
  • This is the kind of functionality that it would be nice to have microbenchmarks for (the benchmarks sub folder in the root directory).
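The suggestion about serializing the left child into a buffer first can be sketched as follows. This is a hypothetical layout, not the format used on the branch: a plain binary search tree on integer keys stands in for the edge tree, `io.BytesIO` stands in for `BytesStreamOutput`, and all function names are made up for illustration. Each node writes its key and the byte lengths of both subtrees as variable-length integers, so a reader can skip a subtree without decoding it.

```python
import io


def write_vint(out, value):
    """Write a non-negative int as a variable-length integer (7 bits per byte)."""
    while value >= 0x80:
        out.write(bytes([(value & 0x7F) | 0x80]))
        value >>= 7
    out.write(bytes([value]))


def read_vint(buf, pos):
    """Read a variable-length int from buf at pos; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        value |= (b & 0x7F) << shift
        if b < 0x80:
            return value, pos
        shift += 7


def write_node(out, node):
    """node is (key, left, right) or None.

    Both subtrees are serialized into temporary buffers first, so their
    byte lengths are known before anything is written to out.
    """
    if node is None:
        return
    key, left, right = node
    left_buf, right_buf = io.BytesIO(), io.BytesIO()
    write_node(left_buf, left)
    write_node(right_buf, right)
    write_vint(out, key)
    write_vint(out, len(left_buf.getvalue()))
    write_vint(out, len(right_buf.getvalue()))
    out.write(left_buf.getvalue())
    out.write(right_buf.getvalue())


def contains(buf, target, pos=0, end=None):
    """Search the serialized tree, skipping subtrees by their byte length."""
    end = len(buf) if end is None else end
    if pos >= end:  # empty subtree
        return False
    key, pos = read_vint(buf, pos)
    left_len, pos = read_vint(buf, pos)
    right_len, pos = read_vint(buf, pos)
    if target == key:
        return True
    if target < key:
        return contains(buf, target, pos, pos + left_len)
    return contains(buf, target, pos + left_len, pos + left_len + right_len)
```

The temporary buffers are where the extra indexing memory mentioned above goes; in exchange, small keys and lengths take one byte each instead of a fixed width.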

@talevy
Contributor

talevy commented Jul 9, 2019

Thank you for taking a look, Adrien! I generally agree with all of your bullet points.
Responses inline...

> I had a look at the in-progress branch, this looks great. I also have some thoughts, which I'm logging here for the record:
>
> • EdgeTree only indexes the minY values, should it also index the maxY, minX and maxX values by alternatively splitting on each of these fields like we do for multi-points?

I will discuss this with Ignacio and Nick. It follows the existing EdgeTree implementation in Lucene, which does not do this, most likely to allow a more efficient point-in-polygon calculation using the infinite ray algorithm.

> • Serialization is a bit wasteful in terms of space: we don't need to serialize separately minY, maxY, y1 and y2: we could just recompute minY and maxY at deserialization time? Furthermore we could move from a fixed-length encoding to variable-length. My understanding is that we only leverage fixed length encoding in order to know how many bytes serializing the left child requires before actually serializing it. We could do it by serializing into a BytesStreamOutput first. It makes indexing a bit more memory-intensive, but this sounds worth the space savings to me.

I will explore this! I wanted to first get something to benchmark, and then we can iterate to see the tradeoffs in more quantitative terms. I do agree, though, that it would be great to reduce the storage size.

> • GeometryTreeReader#intersects iterates over sub geometries linearly and returns true as soon as one matches. Should we build a tree there as well to avoid this linear scan?

I forgot to mark this as a TODO in the code, but turning this into a tree is very much a TODO item. It was left as a linear scan for now to focus on the sub-trees first.

> • This is the kind of functionality that it would be nice to have microbenchmarks for (the benchmarks sub folder in the root directory).

I wasn't too familiar with this module. I will add a TODO in the meta-issue to add micro-benchmarks here! Thank you for the suggestion.
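For readers unfamiliar with the "infinite ray" point-in-polygon test mentioned in the reply above, here is a minimal sketch of the textbook even-odd version. It is not the Lucene or Elasticsearch implementation; the function name and the list-of-vertices representation are assumptions made for illustration.

```python
def point_in_polygon(x, y, ring):
    """Even-odd ray casting: cast a ray from (x, y) toward +x and count crossings.

    ring is a list of (x, y) vertices; the closing edge back to the first
    vertex is implied.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross the ray.
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon(2, 2, square), point_in_polygon(5, 2, square))
```

Because only edges whose y-range contains the query's y can contribute a crossing, an index over the edges' y-extents lets most edges be skipped, which is consistent with the reasoning given for indexing on y alone.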

imotov added a commit to imotov/elasticsearch that referenced this issue Nov 14, 2019
The test is disabled at the moment since it fails regularly.

Relates elastic#37206
talevy added a commit that referenced this issue Nov 22, 2019
This PR modifies the EdgeTree in the [geoshape-doc-values initiative](#37206) to encode the
points with a variable-length encoding. It also adds caching to reduce the number of new Edge
objects created and the number of deserializations needed when an aggregation
queries the shape multiple times, as it does in geogrid aggregations.

The modifications include:

- delta encoding of each edge's coordinates relative to the maxX and maxY of the Extent
- removing Edge object construction and inlining all deserialization of the edge contents within each method

After these changes, two aspects of the GeometryTree feel like TODOs:

- reduce the serialized size of Extent and simplify the `checkExtent` logic
- compress the Point2D tree
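The delta-encoding idea can be illustrated with a small sketch. It assumes integer-encoded coordinates that fit in 32 bits and a non-empty edge list, and its byte layout and function names are hypothetical rather than the format used in the PR. Because every coordinate is at most the extent's maximum, each delta `max - coord` is non-negative and can be written as an unsigned variable-length integer.

```python
import struct


def write_vint(out, value):
    """Append a non-negative int to a bytearray as a variable-length integer."""
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)


def read_vint(data, pos):
    """Read a variable-length int from data at pos; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        b = data[pos]
        pos += 1
        value |= (b & 0x7F) << shift
        if b < 0x80:
            return value, pos
        shift += 7


def encode_edges(edges):
    """edges: non-empty list of (x1, y1, x2, y2) integer tuples."""
    max_x = max(max(e[0], e[2]) for e in edges)
    max_y = max(max(e[1], e[3]) for e in edges)
    out = bytearray(struct.pack(">ii", max_x, max_y))  # fixed-width header
    write_vint(out, len(edges))
    for x1, y1, x2, y2 in edges:
        for coord, top in ((x1, max_x), (y1, max_y), (x2, max_x), (y2, max_y)):
            write_vint(out, top - coord)  # always >= 0
    return bytes(out)


def decode_edges(data):
    max_x, max_y = struct.unpack_from(">ii", data, 0)
    count, pos = read_vint(data, 8)
    edges = []
    for _ in range(count):
        coords = []
        for top in (max_x, max_y, max_x, max_y):
            delta, pos = read_vint(data, pos)
            coords.append(top - delta)
        edges.append(tuple(coords))
    return edges
```

For shapes whose edges cluster near each other, most deltas fit in one or two bytes instead of the four bytes a fixed-width integer would need.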
imotov added a commit to imotov/elasticsearch that referenced this issue Nov 26, 2019
Add support for multi shapes and geometry collections to
GeometryTreeReader and GeometryTreeWriter.

Relates elastic#37206
imotov added a commit that referenced this issue Nov 27, 2019
…49608)

Adds support for geometry collections to GeometryTreeReader
and GeometryTreeWriter.

Relates #37206
talevy added a commit to talevy/elasticsearch that referenced this issue Dec 11, 2019
This commit serializes the ShapeType of the indexed
geometry. The ShapeType can be useful for other future
features. For one thing: elastic#49887 depends on the ability
to determine what the highest dimensional shape is
for centroid calculations.

GeometryCollection is reduced to the sub-shape of the
highest dimension

relates elastic#37206.
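As an illustration of reducing a GeometryCollection to its highest-dimensional sub-shape type, here is a minimal sketch. The dictionary-based geometry representation, the type names, and the function name are assumptions for this example and do not mirror the actual `ShapeType` serialization code.

```python
# Dimension of each base type: points are 0-D, lines 1-D, polygons 2-D.
RANK = {"POINT": 0, "LINESTRING": 1, "POLYGON": 2}

# Multi-geometries share the dimension of their base type.
BASE = {
    "POINT": "POINT", "MULTIPOINT": "POINT",
    "LINESTRING": "LINESTRING", "MULTILINESTRING": "LINESTRING",
    "POLYGON": "POLYGON", "MULTIPOLYGON": "POLYGON",
}


def dimensional_type(geometry):
    """Return 'POINT', 'LINESTRING' or 'POLYGON' for the highest dimension found.

    geometry is a dict with a "type" key; a GEOMETRYCOLLECTION also has a
    non-empty "geometries" list, which may contain nested collections.
    """
    if geometry["type"] == "GEOMETRYCOLLECTION":
        sub_types = (dimensional_type(g) for g in geometry["geometries"])
        return max(sub_types, key=RANK.__getitem__)
    return BASE[geometry["type"]]


collection = {
    "type": "GEOMETRYCOLLECTION",
    "geometries": [{"type": "POINT"}, {"type": "MULTILINESTRING"}],
}
print(dimensional_type(collection))
```

A centroid calculation could then use only the sub-shapes of that dimension, for example weighting by area when a polygon is present and ignoring lower-dimensional parts.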
@polyfractal polyfractal removed the 7x label Dec 12, 2019
talevy added a commit that referenced this issue Dec 18, 2019
talevy added a commit that referenced this issue Dec 19, 2019
talevy added a commit that referenced this issue Feb 24, 2020
This PR adds support for the `doc_values` field mapping parameter.

Both `true` and `false` are supported by the GeoShapeFieldMapper;
only `false` is supported by the LegacyGeoShapeFieldMapper.

relates #37206
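A mapping that sets this parameter might look like the sketch below, written as a Python dictionary that serializes to the JSON body of a create-index request. The field name `geometry` and the helper function are placeholders, and the exact behavior of `doc_values` on `geo_shape` should be checked against the documentation for the version in use.

```python
import json


def geo_shape_mapping(field, doc_values=True):
    """Build a create-index body with a single geo_shape field."""
    return {
        "mappings": {
            "properties": {
                field: {"type": "geo_shape", "doc_values": doc_values}
            }
        }
    }


# Body for an index whose shapes are searchable but carry no doc values.
print(json.dumps(geo_shape_mapping("geometry", doc_values=False), indent=2))
```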
@jpountz jpountz mentioned this issue Feb 28, 2020
22 tasks
@talevy
Contributor

talevy commented Apr 29, 2020

This issue has gone through a lot of evolution, so this comment is meant to give an update on where things landed.

The work for adding geo_shape support for geo_centroid, geo_bounds, geotile_grid, geohash_grid aggregations was accomplished in x-pack by way of the following PRs:

This was achieved by migrating the geo_shape mapper registration
to a module (#53562) and then defining the geo_shape with doc values in the spatial plugin (#55037).

@rjernst rjernst added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 4, 2020
@talevy
Contributor

talevy commented May 5, 2020

Update from the last comment above:

geo_bounds, geo_centroid, geotile_grid, geohash_grid support for geo_shape
is now merged and supported in 7.8 and master (8.0).

Closing this issue to reflect that. At the time of closing, documentation and geo_distance agg support are still TODOs. Documentation is happening shortly, while geo_distance agg support will require more discussion and future development.
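As a rough illustration of what this enables, the sketch below builds a search request body that runs the four aggregations named above against a geo_shape field. The field name, the aggregation labels, and the precision values are placeholders chosen for this example; per the earlier comment, geo_shape support for these aggregations lives in the x-pack spatial plugin.

```python
import json


def shape_aggs_request(field):
    """Build a search body running the four supported aggregations on one field."""
    return {
        "size": 0,  # return only aggregation results, no hits
        "aggs": {
            "bounds": {"geo_bounds": {"field": field}},
            "centroid": {"geo_centroid": {"field": field}},
            "tiles": {"geotile_grid": {"field": field, "precision": 6}},
            "hashes": {"geohash_grid": {"field": field, "precision": 4}},
        },
    }


print(json.dumps(shape_aggs_request("geometry"), indent=2))
```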
