Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New @turf/clusters module (getCluster, clusterEach, clusterReduce) #847

Merged
merged 11 commits into from
Jul 20, 2017

Conversation

DenisCarriere
Copy link
Member

@DenisCarriere DenisCarriere commented Jul 17, 2017

New @turf/clusters module

Ref: #845

Methods

  • getCluster
  • clusterEach
  • clusterReduce

To-Do

  • Improve input validations
  • Drop support of Array of Features.
  • Use @turf/clusters-kmeans for @example (improve documentation)
  • Leverage @turf/helpers & @turf/meta for GeoJSON compatibility

JSDocs - getCluster

/**
 * Get Cluster
 *
 * @param {FeatureCollection} geojson GeoJSON Features
 * @param {*} filter Filter used on GeoJSON properties to get Cluster
 * @returns {FeatureCollection} Single Cluster filtered by GeoJSON Properties
 * @example
 * var geojson = turf.featureCollection([
 *     turf.point([0, 0]),
 *     turf.point([2, 4]),
 *     turf.point([3, 6]),
 *     turf.point([5, 1]),
 *     turf.point([4, 2])
 * ]);
 *
 * // Create a cluster using K-Means (adds `cluster` to GeoJSON properties)
 * var clustered = turf.clustersKmeans(geojson);
 *
 * // Retrieve first cluster (0)
 * var cluster = turf.getCluster(clustered, {cluster: 0});
 */
function getCluster(geojson, filter) {

JSDocs - clusterEach

/**
 * Callback for clusterEach
 *
 * @callback clusterEachCallback
 * @param {FeatureCollection} cluster The current cluster being processed.
 * @param {*} clusterValue Value used to create cluster being processed.
 * @param {number} currentIndex The index of the current element being processed in the array.Starts at index 0
 * @returns {void}
 */

/**
 * clusterEach
 *
 * @param {FeatureCollection} geojson GeoJSON Features
 * @param {string|number} property GeoJSON property key/value used to create clusters
 * @param {Function} callback a method that takes (cluster, clusterValue, currentIndex)
 * @returns {void}
 * @example
 * var geojson = turf.featureCollection([
 *     turf.point([0, 0]),
 *     turf.point([2, 4]),
 *     turf.point([3, 6]),
 *     turf.point([5, 1]),
 *     turf.point([4, 2])
 * ]);
 *
 * // Create a cluster using K-Means (adds `cluster` to GeoJSON properties)
 * var clustered = turf.clustersKmeans(geojson);
 *
 * // Iterate over each cluster
 * clusterEach(clustered, 'cluster', function (cluster, clusterValue, currentIndex) {
 *     //= cluster
 *     //= clusterValue
 *     //= currentIndex
 * })
 */
function clusterEach(geojson, property, callback) {

JSDocs - clusterReduce

/**
 * Callback for clusterReduce
 *
 * The first time the callback function is called, the values provided as arguments depend
 * on whether the reduce method has an initialValue argument.
 *
 * If an initialValue is provided to the reduce method:
 *  - The previousValue argument is initialValue.
 *  - The currentValue argument is the value of the first element present in the array.
 *
 * If an initialValue is not provided:
 *  - The previousValue argument is the value of the first element present in the array.
 *  - The currentValue argument is the value of the second element present in the array.
 *
 * @callback clusterReduceCallback
 * @param {*} previousValue The accumulated value previously returned in the last invocation
 * of the callback, or initialValue, if supplied.
 * @param {FeatureCollection} cluster The current cluster being processed.
 * @param {*} clusterValue Value used to create cluster being processed.
 * @param {number} currentIndex The index of the current element being processed in the
 * array. Starts at index 0, if an initialValue is provided, and at index 1 otherwise.
 */

/**
 * Reduce clusters in GeoJSON Features, similar to Array.reduce()
 *
 * @name clusterReduce
 * @param {FeatureCollection} geojson GeoJSON Features
 * @param {string|number} property GeoJSON property key/value used to create clusters
 * @param {Function} callback a method that takes (previousValue, cluster, clusterValue, currentIndex)
 * @param {*} [initialValue] Value to use as the first argument to the first call of the callback.
 * @returns {*} The value that results from the reduction.
 * @example
 * var geojson = turf.featureCollection([
 *     turf.point([0, 0]),
 *     turf.point([2, 4]),
 *     turf.point([3, 6]),
 *     turf.point([5, 1]),
 *     turf.point([4, 2])
 * ]);
 *
 * // Create a cluster using K-Means (adds `cluster` to GeoJSON properties)
 * var clustered = turf.clustersKmeans(geojson);
 *
 * // Iterate over each cluster and perform a calculation
 * var initialValue = 0;
 * var total = turf.clusterReduce(clustered, 'cluster', function (previousValue, cluster, clusterValue, currentIndex) {
 *     //=previousValue
 *     //=cluster
 *     //=clusterValue
 *     //=currentIndex
 *     return previousValue++;
 * }, initialValue);
 */
function clusterReduce(geojson, property, callback, initialValue) {

Benchmark Results

testing -- createBins x 2,954,562 ops/sec ±1.29% (89 runs sampled)
testing -- propertiesContainsFilter x 10,902,619 ops/sec ±2.11% (85 runs sampled)
testing -- filterProperties x 27,860,084 ops/sec ±1.12% (88 runs sampled)
testing -- applyFilter x 18,971,594 ops/sec ±7.14% (77 runs sampled)
getCluster -- string filter x 5,873,843 ops/sec ±1.30% (87 runs sampled)
getCluster -- object filter x 883,469 ops/sec ±1.28% (93 runs sampled)
getCluster -- aray filter x 3,602,147 ops/sec ±1.30% (89 runs sampled)
clusterEach x 1,133,904 ops/sec ±0.90% (89 runs sampled)
clusterReduce x 1,057,572 ops/sec ±1.13% (89 runs sampled)

Copy link
Collaborator

@tmcw tmcw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this module would be mostly used after turf-cluster-kmeans or another geometrical clustering algorithm, so it's kind of confusing on its own, because it essentially is a filter module? Seems like something that's mostly to be solved in the documentation, like ideally this would show how it interacts with the modules that assign clusters, rather than creating geojson with cluster attributes from scratch, since I think using it as a step after kmeans would be more common than what the docs currently show.

if (!geojson) throw new Error('geojson is required');
if (filter === undefined || filter === null) throw new Error('filter is required');
if (geojson.type === 'FeatureCollection') geojson = geojson.features;
if (!Array.isArray(geojson)) throw new Error('invalid geojson');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like an appropriate place to use turf/meta to handle any sort of data. This currently wouldn't support valid GeoJSON constructs like bare feature objects, and would support invalid geojson constructs, like an array of features.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@turf/invariant right? To test for a valid FeatureCollection.

would support invalid geojson constructs, like an array of features.

Agreed.. 👍 That's my "cheeky" style, i'll drop it.

Copy link
Member Author

@DenisCarriere DenisCarriere Jul 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like an appropriate place to use turf/meta to handle any sort of data.

Oh, you mean featureEach in @turf/meta? To add support for Feature?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, noticed that the docs say it's a FeatureCollection specifically - so, yep! turf/invariant would be useful to enforce that requirement. turf/meta would be useful if the docs promised that the input was any geojson object.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally I had it as FeatureCollection|Feature[] (FeatureCollection|Array<Feature>), now that we've dropped the Array of features (since it's not a valid GeoJSON), there isn't much complex validation to do which would require using @turf/invariant. Using geojsonType is already being handled here:

if (geojson.type !== 'FeatureCollection') throw new Error('geojson must be a FeatureCollection');

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did however add featureEach from @turf/meta to handle the iteration of the FeatureCollection.

@DenisCarriere DenisCarriere changed the title Implement getCluster method New @turf/clusters module (getCluster, clusterEach, clusterReduce) Jul 17, 2017
@DenisCarriere
Copy link
Member Author

DenisCarriere commented Jul 17, 2017

It looks like this module would be mostly used after turf-cluster-kmeans or another geometrical clustering algorithm

Agreed, this module will be used after kmeans & dbscan or any other clustering algorithms, which will allow us to focus on only adding a cluster property to each feature.

Note: @turf/clusters-kmeans will be changed to a single GeoJSON output (with extra cluster property), then we will generate centroids using the clusterEach method.

because it essentially is a filter module?

Agreed! 😄 Kinda wanted to call it groupBy or filterBy, the initial topic that started all this was about clustering.

Seems like something that's mostly to be solved in the documentation

Agreed, once we have dbscans & kmeans using this module would be a lot easier to understand. Good idea, we should use kmeans directly in this module's @example.

@tmcw

@DenisCarriere
Copy link
Member Author

@tmcw 👍 Thanks for the review

Copy link
Collaborator

@stebogit stebogit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chapeau @DenisCarriere! Great implementation 👍 😃

**Parameters**

- `geojson` **[FeatureCollection](http://geojson.org/geojson-spec.html#feature-collection-objects)** GeoJSON Features
- `filter` **Any** Filter used on GeoJSON properties to get Cluster
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I think about it, only in k-means the number of clusters is known before applying the module (which often is considered its drawbacks), so you can easily "query" for a specific cluster id for example; on the other side if you apply dbscan or other algorithms you don't know a priori how many clusters will be identified.
It would be useful, I guess, to have a method that returns the total number of clusters in the FeatureCollection (basically how many groups of points have the same property with different values); it could be this one when passed null or 'all' as filter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type of questions would be better answered using clusterEach or clusterReduce.

Calculating how many total clusters you could use clusterReduce and previousValue++ which would give you the total, or if you want to figure all of the values which created a bin you could also use clusterReduce.

I'll add some more examples to clusterReduce & clusterEach

]);

// Create a cluster using K-Means (adds `cluster` to GeoJSON properties)
var clustered = turf.clustersKmeans(geojson);
Copy link
Collaborator

@stebogit stebogit Jul 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although, as @tmcw said, their primary use is definitely on actual clustered points, I mean as output of a clustering module, these functions could be useful to identify any group of points with a common property, even among points where not all have said property (like identify all points with a certain marker-symbol).
I think it could be beneficial to the user to see examples that are not necessarily related to clustered (in the above mentioned meaning) points, in order to suggest other uses and applications for this module's functions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, feel free to change some of these examples or add a 2nd example.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Added

var geojson = turf.featureCollection([
    turf.point([0, 0], {'marker-symbol': 'circle'}),
    turf.point([2, 4], {'marker-symbol': 'star'}),
    turf.point([3, 6], {'marker-symbol': 'star'}),
    turf.point([5, 1], {'marker-symbol': 'square'}),
    turf.point([4, 2], {'marker-symbol': 'circle'})
]);

// Create a cluster using K-Means (adds `cluster` to GeoJSON properties)
var clustered = turf.clustersKmeans(geojson);

// Retrieve first cluster (0)
var cluster = turf.getCluster(clustered, {cluster: 0});
//= cluster

// Retrieve cluster based on custom properties
turf.getCluster(clustered, {'marker-symbol': 'circle'}).length;
//= 2
turf.getCluster(clustered, {'marker-symbol': 'square'}).length;
//= 1

"author": "Turf Authors",
"contributors": [
"Denis Carriere <@DenisCarriere>",
"Stefano Borghi <@stebogit>"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't really do anything in here, it's all your own work 😅

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got preserved from the last module, I can drop you ;) lol

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Dropped you 😄

getCluster: getCluster,
clusterEach: clusterEach,
clusterReduce: clusterReduce,
createBins: createBins, // Only exposed for testing purposes
Copy link
Collaborator

@stebogit stebogit Jul 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DenisCarriere out of curiosity, how do you exclude these functions from the official release (if that's the plan) if they are exported here?

Copy link
Member Author

@DenisCarriere DenisCarriere Jul 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to comment them out, or... we leave them there and no document their usage.

At the moment all of these utils functions have tests cases for them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Come to think about it, these are ONLY going to be exposed in the @turf/clusters module, however they will not be exposed to @turf/turf (which is the main repo that most users will be using).

The only people who will actually know that these methods are exposed will be the ones that dive into the source code or require('@turf/clusters)` and see which methods the module exposes.

Since there's test cases for each of these functions already I'm going to keep them exposed, I don't see any benefit "commented" them out for the sake of it.

- Add more `@example`
- Update Typescript
- Dropped @stebogit from contributors :)
- Add notes about exposed methods
@DenisCarriere DenisCarriere merged commit ef2735d into master Jul 20, 2017
@DenisCarriere DenisCarriere deleted the turf-clusters branch July 20, 2017 14:12
@morganherlocker
Copy link
Member

Incredible work! 🎉

@DenisCarriere
Copy link
Member Author

Thanks @morganherlocker!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants