
Investigate possibility of borrowing some features/ideas from Explainable Boosted Machines #3905

Closed
JoshuaC3 opened this issue Feb 3, 2021 · 7 comments

Comments

@JoshuaC3

JoshuaC3 commented Feb 3, 2021

Summary

Borrow ideas from InterpretML's Explainable Boosting Machine (EBM) to make LightGBM more interpretable, as well as more directly comparable to the EBM.

Description

I am sure you are somewhat aware of InterpretML's Explainable Boosting Machine - also a Microsoft innovation!

I have been a long-time user of LightGBM and am now a recent fan of the EBM. Clearly, they are similar in some ways and each has its strengths and weaknesses; there are trade-offs in choosing either. That said, I have been playing around with some settings in LightGBM to make the final models behave more like EBMs. The reasons for this are twofold:

  1. Increase LGBM interpretability
  2. Allow more direct comparisons of results/functionality.

Some tips, tricks and findings are as follows:

Setting n_estimators high, learning_rate low, and num_leaves low begins to mimic some of the behaviours of the EBM: iteratively building LOTS of VERY shallow trees, VERY slowly. I add interaction constraints so that each tree is univariate. Pairwise interactions could also be added where needed. This allows the model to learn incrementally small amounts from each feature, rather like the EBM. Essentially, it replicates a model of the form:

lgb = F0(X0) + F1(X1) + ... + Fn(Xn)

which is effectively a GLM/GAM.

lgb = LGBMRegressor(
    n_estimators=5000,   # large: many boosting rounds
    learning_rate=0.01,  # small: learn slowly
    num_leaves=4,        # shallow trees
    # one constraint group per feature => every tree is univariate;
    # for pairwise terms use e.g.
    # [[i, j] for i in range(X.shape[1]) for j in range(X.shape[1]) if i != j]
    interaction_constraints=[[i] for i in range(X.shape[1])],
    # feature_contri=[10 for _ in range(X.shape[1])],  # use to reduce single-feature dependence?
    # monotone_constraints=[1] * (X.shape[1] - 1) + [-1],  # use for expert knowledge;
    #     e.g. in my temperature vs gas use case, gas use decreases monotonically with temperature
)

To me, however, there seem to be two relatively simple features that could be added to facilitate this further.

  1. The addition of a Constant Term.
  2. Schedulable Feature Cycling.

The Constant Term

This would allow the model to remove the extra/redundant initial "value"* from the first set of splits on the first feature. This is the same as shifting your y-variable (e.g. lgb.fit(X, y - C)), but until the initial run, you do not know what C is.

Schedulable Feature Cycling

This could be done in many ways, but essentially, all that is needed is the ability to cycle through each feature in turn. With the current model params this is done at random. Combined with the above issue, this means that any features disproportionately picked at random in the first few trees would have an artificially inflated "value"*.
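To make the two requests concrete, here is a toy NumPy sketch (not LightGBM internals; `fit_stump` and `cyclic_boost` are names invented for this illustration) of boosting that starts from the mean of y as the constant term and then visits features round-robin instead of at random:

```python
import numpy as np

def fit_stump(x, residual):
    """Best single-feature split (a depth-1 'tree') minimising squared error."""
    order = np.argsort(x)
    xs, rs = x[order], residual[order]
    best_sse, best = np.inf, (None, rs.mean(), rs.mean())
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # cannot split between identical values
        left, right = rs[:i].mean(), rs[i:].mean()
        sse = ((rs[:i] - left) ** 2).sum() + ((rs[i:] - right) ** 2).sum()
        if sse < best_sse:
            best_sse, best = sse, ((xs[i - 1] + xs[i]) / 2, left, right)
    return best  # (threshold, left value, right value)

def cyclic_boost(X, y, n_rounds=60, lr=0.1):
    """Toy GAM-style boosting: start from mean(y) (the 'constant term'),
    then fit one univariate stump per round, cycling through features
    deterministically instead of picking them at random."""
    const = y.mean()
    pred = np.full(len(y), const)
    stumps = []
    for r in range(n_rounds):
        j = r % X.shape[1]  # round-robin feature schedule
        thr, lv, rv = fit_stump(X[:, j], y - pred)
        if thr is None:     # constant feature: fall back to a constant update
            update = np.full(len(y), lv)
        else:
            update = np.where(X[:, j] <= thr, lv, rv)
        pred += lr * update
        stumps.append((j, thr, lr * lv, lr * rv))
    return const, stumps, pred
```

Summing each feature's stump contributions recovers per-feature shape functions Fj(Xj), which is what makes the EBM-style additive model easy to plot and inspect.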

Hopefully it is clear how this would help improve the understanding of how an LGBM model is behaving.

This is aimed at being an ongoing discussion, so please chime in. Any questions, please ask!!

Motivation

  1. Increase LGBM interpretability.
  2. Allow more direct comparisons of results/functionality.
  3. Borrow other ideas from EBMs.
  4. Open discussion on how to do this.

References

*"value" as defined in the table generated by tree = lgb.booster_.trees_to_dataframe().
InterpretML: A Unified Framework for Machine Learning Interpretability
InterpretML: A toolkit for understanding machine learning models
InterpretMLs Explainable Boosting Machine

@StrikerRUS StrikerRUS changed the title Bringing LightGBM inline with Explainable Boosted Machines Investigate possibility of borrowing some features from Explainable Boosted Machines Mar 7, 2021
@StrikerRUS
Collaborator

Closed in favor of #2302. We decided to keep all feature requests in one place.

Contributions welcome! Please re-open this issue (or post a comment if you are not the topic starter) if you are actively working on implementing this feature.

@StrikerRUS StrikerRUS changed the title Investigate possibility of borrowing some features from Explainable Boosted Machines Investigate possibility of borrowing some features/ideas from Explainable Boosted Machines Mar 7, 2021
@JoshuaC3
Author

JoshuaC3 commented Mar 8, 2021

The Constant Term
This would allow the model to remove the extra/redundant initial "value"* from the first set of splits on the first feature. This is the same as shifting your y-variable (e.g. lgb.fit(X, y - C)), but until the initial run, you do not know what C is.

This constant term is simply the mean of y in EBMs. Is this just setting init_score to be the mean of y for all observations?

@JoshuaC3
Author

JoshuaC3 commented Mar 8, 2021

Schedulable Feature Cycling
This could be done in many ways, but essentially, all that is needed is the ability to cycle through each feature in turn. With the current model params this is done at random. Combined with the above issue, this means that any features disproportionately picked at random in the first few trees would have an artificially inflated "value"*.

@StrikerRUS this is conceptually a very easy thing to implement, though I imagine it would sit at the internal (C++ code) level. It would simply require a parameter, let's say feature_sampling, with the options random or cyclic.
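For illustration, the proposed option might be spelled like this (hypothetical only — feature_sampling is NOT an existing LightGBM parameter):

```python
# Hypothetical sketch: "feature_sampling" does not exist in LightGBM today;
# it shows how the proposed parameter could look alongside real params.
params = {
    "objective": "regression",
    "num_leaves": 4,
    "learning_rate": 0.01,
    "feature_sampling": "cyclic",  # proposed values: "random" (current behaviour) or "cyclic"
}
```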

Can I add this as a separate feature request and then add it under the New features section? I feel it would be incredibly easy to implement, so filing it under New Algorithm makes it seem too big a task.

@StrikerRUS
Collaborator

@JoshuaC3

This constant term is simply the mean of y in EBMs.

For the mean value, I think the boost_from_average param should help. For any other custom values, init_score can be used, as you correctly mentioned.

Can I add this as a separate feature request and then add it under the New features section? I feel it would be incredibly easy to implement so being under New Algorithm makes it seem a bit too big a task.

Sure! As you are quite familiar with EBM, feel free to split this big issue into multiple smaller, self-contained feature requests that will not require a lot of effort in terms of writing new code. I believe it will help involve more people in the process of improving LightGBM.

@JoshuaC3
Author

@StrikerRUS Thanks! I will do this.

In Python, how do you return the init_score after the model is trained? I cannot find any functions or attributes for it. Thanks!!

@StrikerRUS
Collaborator

Do you mean something like this?

import numpy as np
import lightgbm as lgb

from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)
lgb_data = lgb.Dataset(X, y, init_score=np.ones_like(y))
lgb_data.get_init_score()

In LightGBM, init_score is tied to the Dataset object.

@JoshuaC3
Author

Precisely that. Thanks. I was looking on the Booster class and via the sklearn API.

I will add "Expose get_init_score on the Booster class" as a feature request, and propose exposing it in the sklearn API too. I think I might be able to do this, as it looks like Python all the way up.
