[python-package] LGBMRegressor.predict(...,pred_contrib=True) does not average contribution from individual trees when boosting_type='rf' #6217

Open
trendelkampschroer opened this issue Nov 28, 2023 · 1 comment
trendelkampschroer commented Nov 28, 2023

Description

For a random forest model (boosting_type='rf'), the contributions returned by predict(..., pred_contrib=True) are not averaged across the individual trees.

The example below shows that the contributions (plus the expected value) sum to the raw prediction, i.e. the sum of the predictions of all trees in the random forest, but not to the average of the tree predictions, which is what predict() returns.

Reproducible example

import lightgbm
import sklearn.datasets

n_samples = 1000
X, y = sklearn.datasets.make_regression(n_samples=n_samples, n_features=3, random_state=42)
model = lightgbm.LGBMRegressor(boosting_type="rf", n_estimators=10, colsample_bytree=0.5)
model.fit(X, y)
>>> [LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000308 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 765
[LightGBM] [Info] Number of data points in the train set: 1000, number of used features: 3
[LightGBM] [Info] Start training from score 5.136342

X = X[0:3, :]
y_hat = model.predict(X)
z_hat = model.predict(X, raw_score=True)
phi = model.predict(X, pred_contrib=True)
print(f"Prediction {y_hat=}")
>>> Prediction y_hat=array([-113.44588556,  -86.95479007,  124.66706467])
print(f"Raw prediction {z_hat=}")
>>> Raw prediction z_hat=array([-1134.45885557,  -869.54790071,  1246.67064666])
print(f"Sum of SHAP values and expectation {phi.sum(axis=1)}")
>>> Sum of SHAP values and expectation [-1134.45885557  -869.54790071  1246.67064666]
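
The output above suggests a possible workaround until the behaviour is changed: divide every contribution (including the expected-value column) by the number of trees. The sketch below is only an illustration under that assumption, not a confirmed fix. `average_contributions` is a hypothetical helper, the tree count of 10 is taken from `n_estimators=10` in the example, and the numbers are the row sums printed above. With a fitted model the tree count could presumably be read from `model.booster_.num_trees()` instead of being hard-coded.

```python
def average_contributions(phi, n_trees):
    """Divide each contribution (and the expected-value column) by the tree count.

    phi is a sequence of rows as returned by predict(..., pred_contrib=True);
    the last entry of each row is the expected value.
    """
    if n_trees <= 0:
        raise ValueError("n_trees must be positive")
    return [[value / n_trees for value in row] for row in phi]


# Values copied from the output in this issue.
raw_sums = [-1134.45885557, -869.54790071, 1246.67064666]  # phi.sum(axis=1)
y_hat = [-113.44588556, -86.95479007, 124.66706467]        # model.predict(X)
n_trees = 10  # assumption: all n_estimators=10 trees were built

# Division is linear, so rescaling every entry rescales the row sum the same way.
averaged_sums = [s / n_trees for s in raw_sums]

for averaged, predicted in zip(averaged_sums, y_hat):
    print(f"{averaged:.6f} vs {predicted:.6f}")
```

With NumPy arrays the same rescaling is simply `phi / n_trees`.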

Environment info

LightGBM version or commit hash:

Command(s) you used to install LightGBM

conda install lightgbm~=4.0
@jameslamb changed the title from "LGBMRegressor.predict(...,pred_contrib=True) does not average contribution from individual trees when boosting_type='rf'" to "[python-package] LGBMRegressor.predict(...,pred_contrib=True) does not average contribution from individual trees when boosting_type='rf'" on Nov 28, 2023
@trendelkampschroer
Author

@jameslamb thanks a lot for updating the issue title and triaging the issue. I don't think this is merely a usage question; I think it is a bug. Compare, e.g., https://github.com/shap/shap/blob/4fa04f89e00b54ac649a86b755873c953c208e3f/shap/explainers/_tree.py#L405
in the SHAP package, where pred_contrib=True is used to compute SHAP values. For a random forest model the computed values will be wrong, in the sense that the sum of the expected value and the SHAP values will not equal the prediction.
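
The property at stake here is additivity: for each row, the contributions plus the expected value should add up to the model's prediction. A minimal sketch of such a check follows, using only the numbers reported in this issue. `additivity_holds` is a hypothetical helper and the tolerance is an arbitrary choice for illustration.

```python
import math


def additivity_holds(contrib_sums, predictions, rel_tol=1e-6):
    """Return True if every row's contribution sum matches its prediction."""
    if len(contrib_sums) != len(predictions):
        raise ValueError("inputs must have the same length")
    return all(
        math.isclose(c, p, rel_tol=rel_tol)
        for c, p in zip(contrib_sums, predictions)
    )


# Values copied from the output in this issue.
contrib_sums = [-1134.45885557, -869.54790071, 1246.67064666]  # phi.sum(axis=1)
z_hat = [-1134.45885557, -869.54790071, 1246.67064666]         # raw_score=True
y_hat = [-113.44588556, -86.95479007, 124.66706467]            # default predict

print(additivity_holds(contrib_sums, z_hat))  # True: matches the raw score
print(additivity_holds(contrib_sums, y_hat))  # False: does not match the prediction
```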

The documentation at https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.Booster.html#lightgbm.Booster.predict
also suggests that I can get the actual SHAP values for a random forest model using pred_contrib=True.

A possibly related issue is also documented here, shap/shap#669.
