Add geo_line aggregation #41612

Merged
merged 34 commits into from
Nov 23, 2020

Conversation

talevy
Contributor

@talevy talevy commented Apr 26, 2019

A metric aggregation that aggregates a set of points as
a GeoJSON LineString ordered by some sort parameter.

specifics

A geo_line aggregation request would specify a geo_point field as well
as a sort field. The geo_point field supplies the coordinates used in the
LineString, while the sort values define the total ordering of the points.

The sort field would support any numeric field, including date.
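
The ordering described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the aggregator's actual code: the `(lon, lat, sort_value)` tuple layout, the `build_line` name, and the choice to keep the first `size` points in the requested order are all assumptions made for this sketch.

```python
from typing import Dict, List, Tuple

# Hypothetical input layout for this sketch: (lon, lat, sort_value).
Point = Tuple[float, float, float]


def build_line(points: List[Point], size: int = 10_000,
               sort_order: str = "asc") -> Dict:
    """Order points by their sort value and emit a GeoJSON LineString."""
    ordered = sorted(points, key=lambda p: p[2],
                     reverse=(sort_order == "desc"))
    # Assumption: when there are more points than `size`, keep the first
    # `size` points in the requested order and drop the rest.
    kept = ordered[:size]
    return {
        "type": "LineString",
        # GeoJSON positions are [longitude, latitude].
        "coordinates": [[lon, lat] for lon, lat, _ in kept],
    }
```

Because the sort value only needs to be comparable, an epoch-millisecond timestamp works the same way as any other numeric field.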

sample usage

{
	"query": {
		"bool": {
			"must": [
				{ "term": { "person": "004" } },
				{ "term": { "trajectory": "20090131002206.plt" } }
			]
		}
	},
	"aggs": {
		"make_line": {
			"geo_line": {
				"point": {"field": "location"},
				"sort": { "field": "timestamp" },
				"include_sort": true,
				"sort_order": "desc",
				"size": 15
			}
		}
	}
}

sample response

{
    "took": 21,
    "timed_out": false,
    "_shards": {...},
    "hits": {...},
    "aggregations": {
        "make_line": {
            "type": "LineString",
            "coordinates": [
                [
                    121.52926194481552,
                    38.92878997139633
                ],
                [
                    121.52922699227929,
                    38.92876998055726
                ]
            ]
        }
    }
}

visual response

[Image: Screen Shot 2019-04-26 at 9 40 07 AM]

limitations

Because the number of points can be very large, an initial max of 10k points
will be used. This should support many use cases.

One way to overcome this limitation is to keep a PriorityQueue of
points and simplify the line once it hits this max. If simplifying
makes sense, it may be a nice option in general, with a parameter
that specifies how aggressively to simplify. This parameter could be
the number of points. An example algorithm that works with a PriorityQueue:
https://bost.ocks.org/mike/simplify/. This would still require O(m) space, where m
is the number of points returned. It would also require keeping the triangles in a
heap ordered by area, where each heap operation costs O(log(m)). Since sorting is done
anyway, simplifying would still be an O(n log(m)) operation, where n is the total number
of points to filter. Something to explore.
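
A rough Python sketch of the area-based simplification from the linked article is shown below. It is a simplified illustration under stated assumptions, not the proposed implementation: it holds the whole line in memory (O(n) space) rather than the bounded O(m) streaming variant discussed above, it uses planar triangle areas on raw lon/lat values, and the `simplify` and `triangle_area` names are made up for this example.

```python
import heapq
from typing import List, Tuple

Coord = Tuple[float, float]


def triangle_area(a: Coord, b: Coord, c: Coord) -> float:
    """Planar area of the triangle a-b-c (half the absolute cross product)."""
    return abs((b[0] - a[0]) * (c[1] - a[1])
               - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


def simplify(line: List[Coord], max_points: int) -> List[Coord]:
    """Repeatedly drop the interior point whose triangle has the least area."""
    n = len(line)
    if n <= max_points or n < 3:
        return list(line)
    prev = list(range(-1, n - 1))  # index of the previous surviving point
    nxt = list(range(1, n + 1))    # index of the next surviving point
    alive = [True] * n
    area = [float("inf")] * n      # endpoints are never removed
    heap = []
    for i in range(1, n - 1):
        area[i] = triangle_area(line[i - 1], line[i], line[i + 1])
        heap.append((area[i], i))
    heapq.heapify(heap)
    remaining = n
    while remaining > max(max_points, 2) and heap:
        a, i = heapq.heappop(heap)
        if not alive[i] or a != area[i]:
            continue  # stale heap entry: point removed or area recomputed
        alive[i] = False
        remaining -= 1
        p, q = prev[i], nxt[i]
        nxt[p], prev[q] = q, p
        # The neighbours now form different triangles, so re-queue them.
        for j in (p, q):
            if 0 < j < n - 1:
                area[j] = triangle_area(line[prev[j]], line[j], line[nxt[j]])
                heapq.heappush(heap, (area[j], j))
    return [pt for pt, keep in zip(line, alive) if keep]
```

Stale heap entries are skipped lazily instead of being updated in place, which keeps the code short at the cost of a somewhat larger heap.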

closes #41649

@talevy talevy added the >feature and :Analytics/Geo labels Apr 26, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo

@talevy talevy added the WIP label Apr 26, 2019
Contributor

@polyfractal polyfractal left a comment

Left a few comments about potential optimizations and such :)

Also a thought... I wonder if we could/should implement this as a pipeline agg? It shares many similarities with metrics ordered by time. A user could define a date_histogram on the time field, plus some kind of aggregating "metric" that collapses multiple geo_points in one bucket to a single point (average them?), and then the geo_line pipeline agg strings the buckets together into a single linestring.

It gives you all the sorting stuff for free, and sorta gives you line simplification out of the box, in that a larger date_histo interval automatically gives you a less granular linestring. Perhaps not "smart", in that it's simplifying by time and not line complexity, but it might be an ok start?

Dunno, just a thought I had while looking over how it works.
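
To make the pipeline idea above concrete, here is a small Python sketch of the same flow outside Elasticsearch: bucket points by a fixed time interval, collapse each bucket to its average point, and string the buckets together in time order. The `bucketed_line` name and the `(lon, lat, timestamp)` layout are assumptions for illustration, and the naive averaging ignores edge cases such as lines crossing the antimeridian.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical input layout for this sketch: (lon, lat, timestamp).
TimedPoint = Tuple[float, float, int]


def bucketed_line(points: List[TimedPoint], interval: int) -> Dict:
    """Average the points in each time bucket, then join buckets in order."""
    buckets = defaultdict(list)
    for lon, lat, ts in points:
        # Same role as a date_histogram with a fixed interval.
        buckets[ts // interval].append((lon, lat))
    coords = []
    for key in sorted(buckets):  # buckets come out ordered by time
        members = buckets[key]
        coords.append([
            sum(p[0] for p in members) / len(members),
            sum(p[1] for p in members) / len(members),
        ])
    return {"type": "LineString", "coordinates": coords}
```

A larger `interval` yields fewer buckets and therefore a coarser line, which is the "simplification by time" trade-off mentioned in the comment.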

@thomasneirynck
Contributor

thomasneirynck commented Sep 14, 2020

Thanks, this is super useful!

The return type is now a GeoJSON Feature. Is this an issue from an aesthetic standpoint for the Elasticsearch API? (I don't know of any parallels where ES embeds another data format in an agg response.)

So just for some context: from a user-perspective (and specifically Maps), the idea to return valid GeoJson is mainly useful because it allows Maps to reuse the coordinates-array by-reference, without any post-processing (except for parsing of the JSON-response, which is a native browser function).

If it feels "odd" to wrap the sort_values inside the Feature#properties, I think it would be fine to just return the geometry-portion of the ES-response as either:

  • a valid GeoJson linestring
  • a valid 2D GeoJson coordinate-array (array of array coordinate-pairs).

Basically, as long as the coordinates-array is there in valid GeoJson, it's a win for Maps.

The additional metadata about the individual coordinates (ie. sort_values) can always be in a separate object.

Just a thought, because if most users would exclude the sort_values from their agg, it might feel odd to just have dangling empty properties there.


Would it be useful to make it explicit if the line-string is complete or not in the response? Right now, we can compare the doc count of the bucket with the point count in the line, but this will no longer work once simplification is introduced.

@talevy
Contributor Author

talevy commented Sep 14, 2020

heya @thomasneirynck. thanks for the comments/concerns/suggestions!

> If it feels "odd" to wrap the sort_values inside the Feature#properties, I think it would be fine to just return the geometry-portion of the ES-response as either:

it does not feel odd to me, as this is a property of the geometry; it just happens to be a multi-dimensional property :)

there is an include_sort flag that hides or shows the sort values, since sometimes the geometry itself is all that matters.

> Just a thought, because if most users would exclude the sort_values from their agg, it might feel odd to just have dangling empty properties there.

the properties will not be dangling. There is one other property that exists: the complete property, which is true if the returned line is a complete representation of the data and false if it had to drop some points.

> Would it be useful to make it explicit if the line-string is complete or not in the response?

yes, the complete property exists for this purpose. I intend for there to be a simplify flag on the request that tells the aggregator whether to simplify the returned line. A line can be both simplified and complete, in the sense that all of the data was taken as input to the simplification algorithm.
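
As a rough illustration of the response shape discussed in this thread, the Python sketch below wraps the line in a GeoJSON Feature whose properties carry `complete` and, optionally, `sort_values`. The property names come from the discussion above; the `to_feature` helper, the input tuple layout, and the exact nesting are assumptions for this sketch rather than the merged response format.

```python
from typing import Dict, List, Tuple

# Hypothetical input layout for this sketch: (lon, lat, sort_value).
Point = Tuple[float, float, float]


def to_feature(points: List[Point], size: int,
               include_sort: bool = True) -> Dict:
    """Wrap the sorted, size-limited line in a GeoJSON Feature."""
    ordered = sorted(points, key=lambda p: p[2])
    kept = ordered[:size]
    # `complete` is true only if no points had to be dropped.
    properties = {"complete": len(kept) == len(ordered)}
    if include_sort:
        # One sort value per coordinate, in the same order.
        properties["sort_values"] = [p[2] for p in kept]
    return {
        "type": "Feature",
        "geometry": {
            "type": "LineString",
            "coordinates": [[p[0], p[1]] for p in kept],
        },
        "properties": properties,
    }
```

With this layout a client can pass `feature["geometry"]["coordinates"]` straight to a map renderer and read `complete` separately, without comparing doc counts against point counts.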

@talevy talevy marked this pull request as ready for review October 26, 2020 17:16
@talevy talevy removed the WIP label Oct 26, 2020
@talevy talevy requested a review from iverase October 30, 2020 21:26
@talevy talevy requested a review from iverase November 17, 2020 16:21
@talevy
Contributor Author

talevy commented Nov 18, 2020

run elasticsearch-ci/2

Contributor

@iverase iverase left a comment

LGTM. Just left a comment, but not sure if it is possible.

@talevy talevy merged commit b514d9b into elastic:master Nov 23, 2020
@talevy talevy deleted the geo_line branch November 23, 2020 18:26
talevy added a commit that referenced this pull request Nov 24, 2020
Labels
:Analytics/Geo, >feature, >new-aggregation, Team:Analytics, v7.11.0, v8.0.0-alpha1
Development

Successfully merging this pull request may close these issues.

GeoLine Aggregation
7 participants