Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added seriesMedian() and seriesPercentile() statistics to the series obj... #66

Merged

Conversation

broxtronix
Copy link
Contributor

This is a pretty straight-forward addition of seriesMedian() and seriesPercentile() to the list of statistics that can be computed for a series object. These are useful for all sorts of things, but presently I am hoping to use them to stretch the contrast of the data before colormapping it, retaining only the data that falls between the 1% and 99% percentile, and clipping off the few outliers outside of that!

The median (but not the percentile) is also computed when you call the seriesStats() function, since computing the median is not a terribly expensive operation compared the communication overhead involved in getting the results back with Spark.

The percentile is only available as a seriesPercentile() method, and not as part of the seriesStats(), since it requires an argument (the percentile) to work. If desired, we could add some pre-computed percentiles (1%, 99%) to the seriesStats() method, but I think its fine to let the user call seriesPercentile() for now.

@industrial-sloth
Copy link
Contributor

This looks pretty solid to me. I think I agree about leaving seriesPercentile() out of seriesStats() - it's not obvious what the default arguments there should be (though 1% and 99% does seem as good a choice as anything).

@broxtronix, any chance you could add calls to seriesMedian() and seriesPercentile() alongside the existing assertions in test_series.py TestSeriesMethods.test_series_stats()? And for seriesPercentile, could you also add a call that runs the multiple-argument version (which I think would be something like data.seriesPercentile((20,80)), if I'm interpreting the docs correctly). Thanks!

@industrial-sloth
Copy link
Contributor

Never mind about the unit tests @broxtronix - I'm in the process of adding / updating a bunch of unit tests anyway, so I'll get that put in later on today. cheers...

industrial-sloth added a commit that referenced this pull request Dec 10, 2014
Added seriesMedian() and seriesPercentile() statistics to the series obj...
@industrial-sloth industrial-sloth merged commit 69cb043 into thunder-project:master Dec 10, 2014
@broxtronix
Copy link
Contributor Author

Hey @industrial-sloth! Very sorry for the delay on checking back in on this pull request. During the past couple of days I was pulled away from coding to prepares some other work. Huge thanks for adding these unit tests, and for pulling in this pull request. Much appreciated!!

@broxtronix broxtronix deleted the additional_stats branch December 10, 2014 19:24
@industrial-sloth
Copy link
Contributor

Not a problem at all - am going through unit tests this afternoon anyway, so it's very easy for me to get this knocked out as well. Thanks for the PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants