Added seriesMedian() and seriesPercentile() statistics to the series obj... #66
This is a fairly straightforward addition of seriesMedian() and seriesPercentile() to the list of statistics that can be computed for a series object. These are useful for all sorts of things, but for now I am hoping to use them to stretch the contrast of the data before colormapping it: keep only the data that falls between the 1st and 99th percentiles, and clip the few outliers outside that range.
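To make the contrast-stretching use case concrete, here is a minimal sketch in plain NumPy. It is not code from this PR: the function name `stretch_contrast` and its defaults are hypothetical, and it works on a local array rather than a distributed series object.

```python
import numpy as np


def stretch_contrast(values, low=1.0, high=99.0):
    """Clip values to the [low, high] percentile range, then rescale to [0, 1].

    Hypothetical helper for illustration only; not part of the library.
    """
    values = np.asarray(values, dtype=float)
    # Percentile cut-offs that bound the data we want to keep.
    lo, hi = np.percentile(values, [low, high])
    if hi == lo:
        # Degenerate case: no spread between the cut-offs, nothing to stretch.
        return np.zeros_like(values)
    # Outliers beyond the cut-offs are pinned to the cut-off values.
    clipped = np.clip(values, lo, hi)
    return (clipped - lo) / (hi - lo)
```

The output spans exactly [0, 1], so it can be passed directly to a colormap; the few extreme values no longer compress the rest of the range.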
The median (but not the percentile) is also computed when you call the seriesStats() function, since computing the median is not a terribly expensive operation compared to the communication overhead involved in getting the results back with Spark.
The percentile is only available as a seriesPercentile() method, and not as part of seriesStats(), since it requires an argument (the percentile) to work. If desired, we could add some pre-computed percentiles (1%, 99%) to the seriesStats() method, but I think it's fine to let the user call seriesPercentile() for now.
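For reference, the difference between the two statistics can be sketched on plain Python records. This is an assumption-laden illustration, not the implementation in this PR: the functions `series_median` and `series_percentile` below are hypothetical stand-ins that operate on a local list of (key, values) pairs, whereas the real methods run over the distributed series object.

```python
import numpy as np


def series_median(records):
    """Median of each series; needs no extra argument.

    `records` is assumed to be an iterable of (key, values) pairs.
    """
    return [(key, float(np.median(values))) for key, values in records]


def series_percentile(records, q):
    """q-th percentile (0-100) of each series; the caller must supply q."""
    return [(key, float(np.percentile(values, q))) for key, values in records]


# Two toy series keyed by (x, y) coordinates.
records = [((0, 0), [1, 2, 3, 4, 5]), ((0, 1), [10, 20, 30])]
medians = series_median(records)
p50 = series_percentile(records, 50)
```

The median is just the 50th percentile, so `medians` and `p50` agree here; the extra `q` argument is what keeps the general percentile out of a fixed summary like seriesStats().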