Envelope GeoShape search across dateline doesn't find correct documents. #22564

Closed
dbstovall opened this issue Jan 11, 2017 · 9 comments
Labels
:Analytics/Geo Indexing, search aggregations of geo points and shapes >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

dbstovall commented Jan 11, 2017

Elasticsearch version: 2.2 - 5.1

JVM version:

openjdk version "1.8.0_72-internal"
OpenJDK Runtime Environment (build 1.8.0_72-internal-b15)
OpenJDK 64-Bit Server VM (build 25.72-b15, mixed mode)

OS version:
Alpine Linux v3.4 in Docker 1.12.5 Stable on Mac OS 10.12.2

Description of the problem including expected versus actual behavior:
Envelopes create a bounding box that won't cross longitude 180 (or -180, roughly the international dateline).

Steps to reproduce:

  1. Store geo shape polygon at coordinates [[179,1], [179, -1], [-179, -1], [-179, 1], [179, 1]] in an index.
  2. Run a geo_shape query against the index with an envelope of [[170, 10], [-170, -10]]
  3. The stored polygon should be returned, but it is not.

ShapeBuilder, in parseEnvelope, modifies the upper left and lower right coordinates. It sets the upper left to the minimum longitude and maximum latitude of the coordinates provided. The lower right gets the maximum longitude and minimum latitude of the coordinates provided. This means that an envelope of [[170, 10], [10, -10]] will not find a polygon with a coordinate of (179,1) in a geo_shape query, because the bounds of the envelope will be modified to [[10, 10], [170, -10]]. It's fine to modify the latitude coordinates, setting the maximum on the upper left and the minimum on the lower right, but the longitude coordinates should not be changed.
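
To make the reordering concrete, here is a minimal Python sketch of the two behaviors. It is not the Java code in ShapeBuilder; the function names are hypothetical and only mirror the description above. With the envelope from the steps to reproduce, [[170, 10], [-170, -10]], the min/max version yields [[-170, 10], [170, -10]], which covers the prime meridian side instead of the dateline side, while the latitude-only version leaves the longitudes as given.

```python
# Illustrative only: hypothetical helpers, not the Java code in ShapeBuilder.

def normalize_min_max(first, second):
    """Reorder both axes, as the report describes parseEnvelope doing."""
    (lon1, lat1), (lon2, lat2) = first, second
    upper_left = [min(lon1, lon2), max(lat1, lat2)]
    lower_right = [max(lon1, lon2), min(lat1, lat2)]
    return [upper_left, lower_right]


def normalize_latitude_only(first, second):
    """Suggested behavior: reorder latitudes, keep longitudes as given."""
    (lon1, lat1), (lon2, lat2) = first, second
    return [[lon1, max(lat1, lat2)], [lon2, min(lat1, lat2)]]


if __name__ == "__main__":
    # Envelope from the steps to reproduce.
    print(normalize_min_max([170, 10], [-170, -10]))
    print(normalize_latitude_only([170, 10], [-170, -10]))
```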

Here is a simple Sense script that demonstrates the problem. It creates two polygons on the equator, one on the prime meridian and one on the international dateline. It then searches for the polygon on the international dateline twice, once with a polygon query and once with an envelope query covering the same area. The polygon search returns the correct document; the envelope search returns the wrong document.

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "location": {
          "type": "geo_shape"
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "text": "Geo-shape as an polygon on dateline and equator",
  "location": { 
    "type":"polygon",
    "coordinates":[[
      [179,1],
      [179,-1],
      [-179,-1],
      [-179,1],
      [179,1]
    ]]
  }
}

PUT my_index/my_type/2
{
  "text": "Geo-shape as an polygon on prime meridian and equator",
  "location": { 
    "type":"polygon",
    "coordinates":[[
      [1,1],
      [1,-1],
      [-1,-1],
      [-1,1],
      [1,1]
    ]]
  }
}

GET my_index/_search
{
    "query":{
        "bool": {
            "must": {
                "match_all": {}
            },
            "filter": {
                "geo_shape": {
                    "location": {
                        "shape": {
                            "type": "polygon",
                            "coordinates" : [[
                              [170, 10],
                              [170, -10],
                              [-170, -10],
                              [-170, 10],
                              [170, 10]
                            ]]
                        },
                        "relation": "within"
                    }
                }
            }
        }
    }
}

GET my_index/_search
{
    "query":{
        "bool": {
            "must": {
                "match_all": {}
            },
            "filter": {
                "geo_shape": {
                    "location": {
                        "shape": {
                            "type": "envelope",
                            "coordinates" : [
                              [170, 10],
                              [-170, -10]
                            ]
                        },
                        "relation": "within"
                    }
                }
            }
        }
    }
}
@clintongormley clintongormley added :Analytics/Geo Indexing, search aggregations of geo points and shapes >bug labels Jan 12, 2017
clintongormley commented

@nknize please could you take a look

imotov commented Dec 11, 2018

Still reproduces in master, but it might make sense to revisit this after #32039 since it changes most of this code.

@imotov imotov added the stalled label Dec 11, 2018

imotov commented Dec 20, 2018

That still seems to be an issue after #35320.

uschindler commented Sep 12, 2019

Hi,
the problem here exists since this change: #9091

The problem is that an "envelope" has a different meaning than a "bounding box". The above fix is there to exchange the coordinates if the envelope is not in "cartesian order". If Elasticsearch supported "bounding boxes" (which have a special meaning in the GIS world), then the order of coordinates would be defined (topLeft first, then bottomRight, or better, northWest, then southEast). In that case a box crossing the date line would have an x coordinate (longitude) on the eastern bound of the box that is smaller than the west bound longitude (e.g., west bound is 179, east bound is -179, so it's a box that crosses the date line and spans 2 degrees). In that case it's clear that it crosses the date line.

With envelopes, it's just a box and Elasticsearch just corrects it to be a box in cartesian form, which is broken for spherical coordinates.

The workaround we intend to use at PANGAEA is a geometryCollection, both while indexing and while searching, that has 2 separate envelopes (one on each side of the date line). This would also solve the user's original request. Of course using polygons is also fine, but that's even more complicated to handle correctly if your own APIs only deal with bboxes.
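
A rough Python sketch of that workaround, under stated assumptions: the helper names are made up for illustration, the input is a west/south/east/north box where west > east means "crosses the date line", and the output is a dict in the geometrycollection/envelope shape format that geo_shape accepts, with each envelope written as [[minLon, maxLat], [maxLon, minLat]]. This is not PANGAEA's actual code.

```python
# Hypothetical helpers sketching the two-envelope workaround.

def split_at_dateline(west, south, east, north):
    """Return one or two envelopes as [[minLon, maxLat], [maxLon, minLat]]."""
    if west <= east:
        # Box does not cross the date line: a single envelope is enough.
        return [[[west, north], [east, south]]]
    # Box crosses the date line: one envelope per side.
    return [
        [[west, north], [180.0, south]],
        [[-180.0, north], [east, south]],
    ]


def to_geometry_collection(west, south, east, north):
    """Wrap the envelope(s) in a geometrycollection shape definition."""
    return {
        "type": "geometrycollection",
        "geometries": [
            {"type": "envelope", "coordinates": env}
            for env in split_at_dateline(west, south, east, north)
        ],
    }


if __name__ == "__main__":
    # The search box from the original report: 170 E to 170 W, 10 S to 10 N.
    print(to_geometry_collection(170, -10, -170, 10))
```

Both envelopes are already in cartesian order, so the coordinate swapping described above leaves them untouched.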

IMHO, Elasticsearch should add another GeoShape type as "bbox" that has the common bounding box semantics used in WGS84.

nknize commented Sep 16, 2019

> IMHO, Elasticsearch should add another GeoShape type as "bbox" that has the common bounding box semantics used in WGS84.

👍 I agree.

BBox should follow subclause 10.2.5 and D.13 of the OGC Web Service Common Implementation Specification; specifically D.13:

The bounding box contents defined in Subclause 10.2 will not always specify the
MINIMUM rectangular BOUNDING region, if the referenced CRS uses an Ellipsoidal,
Spherical, Polar, or Cylindrical coordinate system. 
.
.
.

b.) ... (The LowerCorner would no longer always use the minimum value, and
the UpperCorner would no longer always use the maximum value. The value at the
LowerCorner can be greater than at the UpperCorner when this bounding box crosses
the value discontinuity.)
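
As an illustration of what those semantics imply for longitude, here is a small Python sketch. The function names are hypothetical and it only models the longitude axis in degrees; it is not taken from the OGC specification or from Elasticsearch.

```python
# Hypothetical helpers: longitude handling when LowerCorner may exceed UpperCorner.

def crosses_discontinuity(lower_lon, upper_lon):
    """True when the box wraps across the +/-180 discontinuity."""
    return lower_lon > upper_lon


def lon_in_bbox(lon, lower_lon, upper_lon):
    """Test whether a longitude falls inside the box, wrapping if needed."""
    if not crosses_discontinuity(lower_lon, upper_lon):
        return lower_lon <= lon <= upper_lon
    # Wrapped box: inside means east of the lower corner or west of the upper one.
    return lon >= lower_lon or lon <= upper_lon


def lon_span(lower_lon, upper_lon):
    """Width of the box in degrees of longitude."""
    span = upper_lon - lower_lon
    return span + 360 if span < 0 else span


if __name__ == "__main__":
    print(lon_in_bbox(180, 179, -179), lon_span(179, -179))
```

With a lower corner of 179 and an upper corner of -179, the box is 2 degrees wide and contains longitude 180 but not longitude 0.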

uschindler commented Sep 17, 2019

Hi @nknize,
I agree. In GML or ISO19115 metadata (which uses GML) the coordinates in the bbox data type are already named westBoundLongitude, southBoundLatitude, northBoundLatitude and eastBoundLongitude. With that definition there is no discussion needed: the west longitude can definitely be numerically larger than the east longitude when the box crosses the dateline.
Problem here is GeoJSON which uses X/Y and uses terms like min/max.
But all tools out there (like Google Maps) where you definitely need to implement cross date line bboxes use the GML definition.

rcoup commented Sep 17, 2019

> Problem here is GeoJSON which uses X/Y and uses terms like min/max.

That's not correct as of the publication of RFC7946:

  • coordinate order is [easting/longitude, northing/latitude, [height]]
  • bounding boxes are [west, south, [min-height,] east, north, [max-height]]
  • it specifically discusses the antimeridian with respect to bounding boxes
  • there's a recommendation (but not a requirement), to cut geometries at the antimeridian into component parts.
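
For what it's worth, mapping an RFC 7946 style bbox array onto the corner order used by the envelope type is mechanical. A minimal Python sketch follows; the function name is hypothetical, and whether the server then treats west > east as crossing the antimeridian is exactly the question in this issue.

```python
# Hypothetical helper: RFC 7946 bbox array -> envelope shape definition.

def bbox_to_envelope(bbox):
    """Convert [west, south, east, north] (optionally with heights) to an envelope."""
    if len(bbox) == 4:
        west, south, east, north = bbox
    elif len(bbox) == 6:
        # 3D form: [west, south, min-height, east, north, max-height]; heights dropped.
        west, south, _min_height, east, north, _max_height = bbox
    else:
        raise ValueError("bbox must have 4 or 6 elements")
    # Envelope corners are upper left (west, north), then lower right (east, south).
    return {"type": "envelope", "coordinates": [[west, north], [east, south]]}


if __name__ == "__main__":
    # Search box from the original report, written west, south, east, north.
    print(bbox_to_envelope([170, -10, -170, 10]))
```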

uschindler commented

> it specifically discusses the antimeridian with respect to bounding boxes

Yes, but the bbox is just metadata for the JSON file. It's not defined as a geometry. But yes, you are right.

In general, in GeoJSON you should split geometries at the date line, but Elasticsearch does not require this for any datatype other than envelope.

@rjernst rjernst added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 4, 2020

iverase commented Sep 23, 2022

I just ran the following commands in Elasticsearch 8.4.2:

DELETE my_index

PUT my_index
{
  "mappings": {
      "properties": {
        "location": {
          "type": "geo_shape"
        }
      }
  }
}

PUT my_index/_doc/1
{
  "text": "Geo-shape as an polygon on dateline and equator",
  "location": { 
    "type":"polygon",
    "coordinates":[[
      [179,1],
      [179,-1],
      [-179,-1],
      [-179,1],
      [179,1]
    ]]
  }
}

PUT my_index/_doc/2
{
  "text": "Geo-shape as an polygon on prime meridian and equator",
  "location": { 
    "type":"polygon",
    "coordinates":[[
      [1,1],
      [1,-1],
      [-1,-1],
      [-1,1],
      [1,1]
    ]]
  }
}

GET my_index/_search
{
    "query":{
        "bool": {
            "must": {
                "match_all": {}
            },
            "filter": {
                "geo_shape": {
                    "location": {
                        "shape": {
                            "type": "polygon",
                            "coordinates" : [[
                              [170, 10],
                              [170, -10],
                              [-170, -10],
                              [-170, 10],
                              [170, 10]
                            ]]
                        },
                        "relation": "within"
                    }
                }
            }
        }
    }
}

And it returns:

{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_id": "1",
        "_score": 1,
        "_source": {
          "text": "Geo-shape as an polygon on dateline and equator",
          "location": {
            "type": "polygon",
            "coordinates": [
              [
                [
                  179,
                  1
                ],
                [
                  179,
                  -1
                ],
                [
                  -179,
                  -1
                ],
                [
                  -179,
                  1
                ],
                [
                  179,
                  1
                ]
              ]
            ]
          }
        }
      }
    ]
  }
}

Therefore I am closing this issue as it is fixed.

@iverase iverase closed this as completed Sep 23, 2022