[Medium] Fast solvers for situations with many columns. #133

Open

tbenthompson opened this issue May 20, 2020 · 7 comments
Labels
on hold (not now, maybe never) · performance (speed, memory, or accuracy)

Comments

@tbenthompson (Collaborator)

Currently, the solvers all build a Fisher matrix/Hessian. If the data matrix is n x m, this Fisher matrix will have shape m x m. If m is less than a few thousand, this is fine. However, if m gets very large, possibly due to lots of categorical variables, the current solvers will be inadequate because they are constructing a very large dense matrix.
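For concreteness, here is a minimal sketch (illustrative only, not this package's actual code) of the object in question: for a GLM fit by IRLS, the Fisher matrix is X^T diag(w) X, which is dense and m x m regardless of how sparse X is.

```python
import numpy as np

def fisher_matrix(X, w):
    """Dense Fisher matrix X^T diag(w) X; shape (m, m)."""
    return X.T @ (w[:, None] * X)

n, m = 1_000, 200               # small enough to actually run
X = np.random.rand(n, m)        # data matrix (stand-in for real features)
w = np.random.rand(n)           # per-row IRLS working weights

F = fisher_matrix(X, w)         # (200, 200) here, but at m = 50,000 this
                                # dense matrix alone would need
                                # 50,000^2 * 8 bytes = 20 GB
```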

We can currently avoid that matrix construction with the diag_fisher=True option. However, it is not particularly well supported and its performance may be poor. We should consider adding a benchmark with more than 10,000 columns to better track performance in these cases.

As part of this effort, it would be nice to separate out the diag_fisher behavior into its own solver function for high-dimensional problems.

@ElizabethSantorellaQC (Contributor) commented May 20, 2020

Out of the solvers we currently have available, I think the only ones that don't construct the full Hessian would be L-BFGS and CD with diag_fisher (but let me know if I'm wrong). So I think diag_fisher is valuable as it's the only way we can reliably use both an L1 penalty and wide data.

I don't fully understand the value of the off-diagonal elements of the Hessian for CD when it's only updating one variable at a time. Why does performance degrade if we only use the diagonal?

@tbenthompson (Collaborator, Author)

> I don't fully understand the value of the off-diagonal elements of the Hessian for CD when it's only updating one variable at a time. Why does performance degrade if we only use the diagonal?

The running gradient is updated at each step of CD using the Hessian column for the coordinate that just changed. So, if you only have the diagonal, the gradient update is incorrect for every variable besides the current one.
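A schematic sketch of that mechanism (illustrative, not the actual CD implementation here): after each single-coordinate step, the running gradient is corrected with the j-th Hessian column, which is exactly the piece diag_fisher throws away.

```python
import numpy as np

def cd_sweep(H, grad, beta):
    """One sweep of coordinate descent on a quadratic model
    0.5 * beta^T H beta + g^T beta, keeping the running gradient
    current after every single-coordinate update."""
    for j in range(len(beta)):
        delta = -grad[j] / H[j, j]   # exact minimizer in coordinate j
        beta[j] += delta
        grad += H[:, j] * delta      # needs the FULL j-th Hessian column;
                                     # with only diag(H), grad[k] for k != j
                                     # goes stale and later steps are wrong
    return beta, grad
```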

@ElizabethSantorellaQC (Contributor)

Notes from meeting:

  • ADMM currently does not seem to help here since it requires solving a linear system at every step, which makes each step slower. Perhaps we could speed it up with an iterative sparse solver (see the sketch after this list).
  • The diag_fisher strategy might be helpful -- perhaps the implementation just had a bug.
  • h2o has at least 3 solvers available -- we should benchmark each of these.
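As a sketch of the iterative-solver idea in the first bullet (the splitting and names here are assumptions, not this package's API): the ADMM x-update for an L1-penalized quadratic solves (X^T X + rho I) x = rhs, which scipy's conjugate gradient can do matrix-free, touching only X @ v and X.T @ u products rather than ever forming the m x m matrix.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def admm_x_update(X, rhs, rho, x0=None):
    """Solve (X^T X + rho * I) x = rhs iteratively. Each CG matvec
    costs two passes over X (sparse-friendly) instead of O(m^2)."""
    n, m = X.shape
    matvec = lambda v: X.T @ (X @ v) + rho * v
    A = LinearOperator((m, m), matvec=matvec)
    x, info = cg(A, rhs, x0=x0)   # info == 0 means CG converged
    return x
```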

@ElizabethSantorellaQC added the performance (speed, memory, or accuracy) label on May 26, 2020
@tbenthompson (Collaborator, Author)

Some potential steps:

  • Add a benchmark problem with 5,000 columns -- solvable but slow with current methods.
  • Reimplement the diag_fisher stuff.
  • Add a benchmark problem with 50,000 columns -- not solvable with current methods.

@tbenthompson (Collaborator, Author)

An interesting idea here would be to implement an L-BFGS + CD solver. At each iteration, the new gradient information would be used to build a low-rank approximation to the Hessian. That low-rank approximation could then be used inside the CD solver in place of either the full data matrix or the true Hessian.
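One possible reading of this, as a sketch: store the L-BFGS curvature pairs in the compact (Byrd-Nocedal-Schnabel) form B = sigma * I + U @ M @ U.T, so that CD can pull out single Hessian columns in O(m * k) time instead of O(m^2). The names U and M below are generic placeholders for the usual compact-representation factors, not anything in this codebase.

```python
import numpy as np

def lbfgs_hessian_column(j, sigma, U, M):
    """Column j of the compact L-BFGS Hessian approximation
    B = sigma * I + U @ M @ U.T, where U is (m, 2k) and M is (2k, 2k)
    for k stored curvature pairs. Cost is O(m * k), so the CD gradient
    update stays cheap even when m is huge."""
    col = U @ (M @ U[j])   # low-rank part: row j of U through the core
    col[j] += sigma        # diagonal (scaled-identity) part
    return col
```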

@tbenthompson (Collaborator, Author)

Some good sources on "Hessian-free" second-order optimization methods:
https://andrew.gibiansky.com/blog/machine-learning/gauss-newton-matrix/
http://www.cs.toronto.edu/~jmartens/docs/Deep_HessianFree.pdf

@benjaminfspector

@tbenthompson and I were just talking about another alternative which could work. Essentially, the idea would be to add a third, outermost loop that samples a tractable column minibatch (say, 500 columns) and then runs the solver for several iterations on just those columns. This method might pair well with @tbenthompson's new approximate Hessian code, since one could be quite aggressive about enforcing sparsity in the delta-D vector, making subsequent iterations of the middle loop very inexpensive thanks to the now-cheap Hessian update. It also has the advantage of being a relatively easy modification to the existing solver, so it would be simple to try and to benchmark future methods against.
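A minimal sketch of the proposed three-loop structure (block_cd and inner_solver are hypothetical names; it assumes the existing solver can take the frozen columns' contribution as an offset):

```python
import numpy as np

rng = np.random.default_rng(0)

def block_cd(X, y, beta, inner_solver, block_size=500, n_outer=20):
    """Hypothetical outermost loop: repeatedly sample a tractable block
    of columns and run the existing solver on that sub-problem, holding
    all other coefficients fixed."""
    n, m = X.shape
    for _ in range(n_outer):
        cols = rng.choice(m, size=block_size, replace=False)
        frozen = np.setdiff1d(np.arange(m), cols)
        # the frozen columns' contribution enters as a fixed offset
        offset = X[:, frozen] @ beta[frozen]
        beta[cols] = inner_solver(X[:, cols], y, offset, beta[cols])
    return beta
```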
