Can I limit magnitude of feature curves #122
For additional context, I just tried a binary, one-vs-all EBM classifier for the affected class, and the computed tree values are all reasonable (absolute value less than 3), so this is probably a result of the "experimental" multiclass classification.
Hi @jamie-murdoch, thanks for raising this issue. It's a reasonable feature suggestion, and one that we've been thinking about on our side as well. It's good to know it would be useful for you, and that the scores are particularly extreme when you frame your problem as a multiclass classification problem.

For now, here's some light code to post-process ("clip") the magnitude of the graphs for each feature. All the graphs are stored as simple numpy arrays inside the `attribute_set_models_` attribute of the model. Here's how you would "edit the model" by clipping the scores:

```python
import numpy as np

min_score = -5
max_score = 5

for index in range(len(ebm.attribute_set_models_)):
    ebm.attribute_set_models_[index] = np.clip(ebm.attribute_set_models_[index], min_score, max_score)
```

In this example, all model graphs would be clipped to the range [-5, 5]. The only caveat is that the overall feature importance graph (shown as the "Summary" view) is pre-calculated when the model is trained, so it wouldn't be updated by this post-processing. Let us know if you'd like some code for re-calculating the overall feature importances, or need any other help with the code!

-InterpretML Team
Thanks for the follow-up! So, I had a 10-class problem and ended up training 10 one-vs-all classifiers and aggregating them. A bit gnarly, but the 0-1 accuracy was nearly as good (54-56%), and the curve magnitudes were reasonable. For the benefit of future readers: I suspect using the code above may require some shifting of the intercepts (the intercept in question was ~10^8), and/or refitting, since the model had learned that 4% of feature values were -10^10 important and the others 10^8, so squishing to +/- 5 would lose that signal.
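The one-vs-all aggregation described above can be done by taking, for each sample, the class whose binary model produces the highest score. A numpy-only sketch with a hypothetical score matrix (in practice each column would come from one of the 10 binary EBMs):

```python
import numpy as np

# Hypothetical decision scores from 10 one-vs-all binary models on 3 samples:
# rows = samples, columns = classes.
scores = np.array([
    [0.2, 1.5, -0.3, 0.0, -1.0, 0.4, 0.1, -0.2, 0.9, -0.5],
    [-0.1, 0.0, 2.1, 0.3, 0.2, -0.4, 0.0, 0.5, -0.6, 0.1],
    [0.0, -0.2, 0.1, 0.0, 0.3, 0.2, 3.0, 0.1, 0.0, -0.1],
])

# Aggregate by picking the class whose binary model is most confident.
predictions = scores.argmax(axis=1)  # -> array([1, 2, 6])
```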
Hi @andro536,

The 0.2.0 release of interpret had a few breaking changes, which included some attribute renames. The new name of the property is `additive_terms_`:

```python
import numpy as np

min_score = -5
max_score = 5

for index in range(len(ebm.additive_terms_)):
    ebm.additive_terms_[index] = np.clip(ebm.additive_terms_[index], min_score, max_score)
```

Note that this clips all values on all graphs to a min/max range of 5. In your case, you may want to inspect and edit the graph for just this single feature.

We're looking into introducing a better API in the future for model/graph editing, but manipulating the `additive_terms_` arrays directly is the way to go for now.

-InterpretML Team
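To clip just the one problematic feature rather than every graph, you can index into the list of per-term arrays directly. A numpy-only sketch (the `feature_index` and the stand-in `additive_terms` list are hypothetical; on a real model you would index `ebm.additive_terms_`):

```python
import numpy as np

# Stand-ins for the per-feature score arrays (ebm.additive_terms_).
additive_terms = [
    np.array([0.1, -0.4]),
    np.array([2.0, -1.5, 0.3]),
    np.array([0.0, 0.7]),
    np.array([-1e10, 1e8, 1e8]),  # the extreme graph
]

# Hypothetical index of the problematic feature; clip only that graph
# and leave the rest of the model untouched.
feature_index = 3
additive_terms[feature_index] = np.clip(additive_terms[feature_index], -5, 5)
```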
How do we re-calculate the overall feature importances? Thanks!
Hi @candalfigomoro,

The overall feature importances are simply calculated as the mean absolute contribution per feature on the training dataset. Our code for calculating them is just a few lines of Python here:

interpret/python/interpret-core/interpret/glassbox/ebm/ebm.py, lines 1190 to 1202 at commit 2676a0f

You can lift those lines to re-calculate the importances yourself after editing the model.

-InterpretML Team
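The "mean absolute contribution per feature" calculation described above can be sketched in a few lines of numpy. This is not interpret's actual implementation, just an illustration with a hypothetical contribution matrix (the per-sample score each feature contributes on the training set):

```python
import numpy as np

# Hypothetical per-sample contributions: rows = features, columns = samples.
contributions = np.array([
    [0.5, -1.2, 0.0],   # feature A
    [2.0, 2.0, -2.0],   # feature B
    [0.1, 0.1, 0.1],    # feature C
])

# Overall importance per feature: mean absolute contribution across samples.
importances = np.abs(contributions).mean(axis=1)  # -> approx [0.567, 2.0, 0.1]
```

If you clip a feature's graph as in the earlier comments, re-running this calculation on the edited model would bring the "Summary" importances back in line with the edited scores.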
In the latest 0.3.0 version, feature/term importances are calculated when explanations are generated. This means that any changes made to the model after it has been trained will be reflected in the importances, eliminating the need to recalculate them. A post-processing clipping utility function remains on our backlog, since it would be nice to get both clipping and re-centering operations in a single function.
I've changed my perspective somewhat on this issue. Today the best solution is to do the post-processing clipping as described above; however, there is a better long-term solution that should be implemented instead of a clipping utility.

The fundamental issue we have today is that our boosting is of the MART variety instead of LogitBoost. What this means is that we calculate updates using hessians, but we calculate the gain from the gradient and the count of samples within each potential leaf. If all the samples within a potential leaf are positive or negative examples, then we get into trouble, since boosting can keep pushing the scores towards either +infinity or -infinity, because we only store the total number of samples and not the per-class counts.

If we used LogitBoost instead of MART, we could implement a min_child_weight parameter like XGBoost and LightGBM have. Setting min_child_weight to something non-zero would disallow leaf nodes that are pure, which should eliminate this issue and also improve the models in these scenarios.

For multiclass, there's an additional requirement that the trees are built per-class instead of jointly. If there's a minority class, then we don't want to disallow growth at the tail ends of each feature due to the minority class potentially not having any samples in those regions. Both XGBoost and LightGBM build their trees per-class after calculating the gradients and hessians, and I suspect this is the reason they do it this way.
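The pure-leaf divergence described above can be demonstrated with a small stand-alone sketch of Newton-style boosting under logistic loss. This is illustrative, not interpret's actual boosting code: with only positive samples in a leaf, the gradient never changes sign, so the score grows without bound, while the summed hessian shrinks toward zero, which is exactly what a min_child_weight-style guard (threshold value here is arbitrary) would detect:

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def leaf_update(score, labels):
    # Newton step for a single leaf under logistic loss:
    # sum of gradients (y - p) divided by sum of hessians p * (1 - p).
    p = sigmoid(score)
    grad = sum(y - p for y in labels)
    hess = len(labels) * p * (1.0 - p)
    return grad / hess

# A "pure" leaf: every sample is positive, so each boosting round pushes
# the score further toward +infinity.
score = 0.0
labels = [1, 1, 1]
for _ in range(20):
    score += leaf_update(score, labels)

# A min_child_weight-style guard would reject further growth once the
# summed hessian falls below a threshold (illustrative value only).
min_child_weight = 1e-3
p = sigmoid(score)
hessian_sum = len(labels) * p * (1.0 - p)
split_allowed = hessian_sum >= min_child_weight
```

After 20 rounds the score has already run past 20 and the hessian sum is vanishingly small, so the guard would have stopped boosting on this leaf long before the scores reached the 10^10 magnitudes reported in this issue.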
I'm fitting an EBM multiclass classifier, and am getting feature curves with values in excess of 10^10. Is there a way to fix this somehow? Ideally I could specify a maximum value, and clip the feature curves beyond that.
For more context, in my particular dataset, there is an interval of ~2% of the data where if, say, X_1 is less than -10 the output is guaranteed to be class 0. The curve is taking values -10^10 in this region, and 10^8 everywhere else, with an intercept around 10^8.
This certainly makes sense from a prediction perspective, but for interpretation purposes, having 98% of the feature importances for that variable be 10^8 isn't ideal.