Added seriesMedian() and seriesPercentile() statistics to the series obj... #66

broxtronix · 2014-12-04T17:43:15Z

This is a fairly straightforward addition of seriesMedian() and seriesPercentile() to the list of statistics that can be computed for a series object. These are useful for all sorts of things, but at present I am hoping to use them to stretch the contrast of the data before colormapping it, retaining only the data that falls between the 1st and 99th percentiles and clipping off the few outliers outside that range!
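As a rough illustration of the contrast-stretching use case described above, here is a minimal, dependency-free sketch. It does not use the Thunder API. The helper names `percentile` and `stretch_contrast` are hypothetical, and the linear-interpolation percentile is an assumption about how the cut-offs would be computed.

```python
def percentile(sorted_values, q):
    """Percentile (q in 0..100) of an already sorted list, using linear interpolation.

    Hypothetical helper for illustration; not part of this pull request.
    """
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * frac


def stretch_contrast(values, low=1.0, high=99.0):
    """Clip values to the [low, high] percentile range, then rescale to [0, 1]."""
    ordered = sorted(values)
    lo = percentile(ordered, low)
    hi = percentile(ordered, high)
    if hi == lo:
        # Degenerate case: no spread between the cut-offs, so map everything to 0.
        return [0.0 for _ in values]
    return [(min(max(v, lo), hi) - lo) / (hi - lo) for v in values]


# Example data: a ramp from 0 to 100 with one extreme outlier at each end.
values = [float(i) for i in range(101)]
values[0] = -1000.0
values[100] = 1000.0
stretched = stretch_contrast(values)
```

With this input the two outliers are clipped to the ends of the range, so they no longer dominate the colormap.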

The median (but not the percentile) is also computed when you call the seriesStats() function, since computing the median is not a terribly expensive operation compared to the communication overhead involved in getting the results back with Spark.

The percentile is only available as a seriesPercentile() method, and not as part of seriesStats(), since it requires an argument (the percentile) to work. If desired, we could add some pre-computed percentiles (1%, 99%) to the seriesStats() method, but I think it's fine to let the user call seriesPercentile() for now.

industrial-sloth · 2014-12-04T18:05:12Z

This looks pretty solid to me. I think I agree about leaving seriesPercentile() out of seriesStats() - it's not obvious what the default arguments there should be (though 1% and 99% does seem as good a choice as anything).

@broxtronix, any chance you could add calls to seriesMedian() and seriesPercentile() alongside the existing assertions in test_series.py TestSeriesMethods.test_series_stats()? And for seriesPercentile, could you also add a call that runs the multiple-argument version (which I think would be something like data.seriesPercentile((20,80)), if I'm interpreting the docs correctly). Thanks!
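One way to get reference values for such assertions is to compute the median and the percentiles of a small record locally and compare them with what the series methods return. The sketch below uses only the standard library. The helper `reference_percentiles` is hypothetical, and linear interpolation is an assumption, so the interpolation rule used by seriesPercentile() should be checked before relying on these numbers.

```python
import statistics


def reference_percentiles(series, qs):
    """Percentiles (each q in 0..100) of one record, using linear interpolation.

    Hypothetical helper for building expected test values; not Thunder code.
    """
    ordered = sorted(series)
    results = []
    for q in qs:
        pos = (len(ordered) - 1) * q / 100.0
        lower = int(pos)
        upper = min(lower + 1, len(ordered) - 1)
        frac = pos - lower
        results.append(ordered[lower] + (ordered[upper] - ordered[lower]) * frac)
    return results


# A single illustrative record, deliberately unsorted.
record = [4.0, 2.0, 6.0, 1.0, 5.0, 3.0]

expected_median = statistics.median(record)
expected_p20, expected_p80 = reference_percentiles(record, (20, 80))
```

These expected values could then be compared against the per-record output of seriesMedian() and seriesPercentile((20,80)), using a floating-point tolerance.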

industrial-sloth · 2014-12-10T18:56:49Z

Never mind about the unit tests @broxtronix - I'm in the process of adding / updating a bunch of unit tests anyway, so I'll get that put in later on today. cheers...

broxtronix · 2014-12-10T19:24:06Z

Hey @industrial-sloth! Very sorry for the delay in checking back in on this pull request. During the past couple of days I was pulled away from coding to prepare some other work. Huge thanks for adding these unit tests, and for pulling in this pull request. Much appreciated!!

industrial-sloth · 2014-12-10T21:03:43Z

Not a problem at all - am going through unit tests this afternoon anyway, so it's very easy for me to get this knocked out as well. Thanks for the PR!

Added seriesMedian() and seriesPercentile() statistics to the series …

adcfbab

…object.

industrial-sloth added a commit that referenced this pull request Dec 10, 2014

Merge pull request #66 from broxtronix/additional_stats

69cb043

Added seriesMedian() and seriesPercentile() statistics to the series obj...

industrial-sloth merged commit 69cb043 into thunder-project:master Dec 10, 2014

broxtronix deleted the additional_stats branch December 10, 2014 19:24
