Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving boxplot performance #543

Closed
mrmin123 opened this issue Mar 17, 2014 · 8 comments
Closed

Improving boxplot performance #543

mrmin123 opened this issue Mar 17, 2014 · 8 comments
Milestone

Comments

@mrmin123
Copy link

Hi,

I'm trying to use dc.js to generate dynamic boxplots, but was having some performance issues with larger datasets (~50k float entries). I narrowed the issue down to this line under the reduce().reduceRemove function (taken from example):

p.splice(p.indexOf(v.Speed), 1);

I was thinking that if the p returned was pre-sorted (I'd imagine that the boxplot function already sorts the array to calculate quartiles) so that a more efficient binary search tree can be used to find the index for the splice. Is this something that can be implemented?

Thanks

@gordonwoodhull
Copy link
Contributor

Hi, indeed that looks like it could be improved. Have you profiled this?

At first glance it looks like d3.bisectLeft might be appropriate here.

@mrmin123
Copy link
Author

Hi. Not sure what you mean by profiling this. If you mean benchmarking, no real benchmarks, but there were noticeable differences when I used d3.bisectLeft as you suggested. Huge improvements when combined with underscore.js's indexOf function flagged to use binary search:

speedArrayGroup = experimentDimension.group().reduce(
        function(p,v) {
          p.splice(d3.bisectLeft(p, v.Speed), 0, v.Speed);
          return p;
        },
        function(p,v) {
          p.splice(_.indexOf(p, v.Speed, true), 1);
          return p;
        },
        function() {
          return [];
        }
      );

If you feel that this can be further improved, please let me know.

Thanks!

@gordonwoodhull
Copy link
Contributor

Cool, yes _.indexOf and d3.bisectLeft should be equivalent here, I was only suggesting the d3 function because dc is already dependent on d3.

I don't know which is more efficient

@mrmin123
Copy link
Author

Ah, I didn't realize that d3.bisectLeft was a binary search.

I would imagine that minor (or maybe not so minor?) improvements can be made by leaving the reduceAdd function to p.push(v.Speed), and somehow have the p called in reduceRemove to be the sorted array of values generated when calculating the quartiles, but I don't know if that modification would have to occur in dc.js, crossfilter, or d3... or how trivial it is would be to implement, and if it would mess up functionality elsewhere.

Either way, thanks for the quick fix.

@gordonwoodhull
Copy link
Contributor

The action item here is to fix the example.

@gordonwoodhull gordonwoodhull added this to the v2.0 milestone Jun 5, 2014
@mr23
Copy link
Contributor

mr23 commented Oct 26, 2016

Changing from p.indexOf to d3.bisectLeft saved 25% (no pre-sorting). _.indexOf was only about 12%.

@gordonwoodhull
Copy link
Contributor

A second action item here is to add an option saying that the data is pre-sorted. The d3 boxplot plugin is always sorting the data, which does not make sense in case of incremental changes.

@gordonwoodhull
Copy link
Contributor

gordonwoodhull commented Apr 9, 2019

Moving the pre-sorting option to #1527. Fixing the examples now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants