Can I limit magnitude of feature curves #122
For additional context, I just tried a binary, one-vs-all EBM classifier for the affected class, and the computed tree values are all reasonable (absolute value less than 3), so this is probably a result of the "experimental" multiclass classification.
Hi @jamie-murdoch, thanks for raising this issue. It's a reasonable feature suggestion, and one that we've been thinking about on our side as well. It's good to know it would be useful for you, and that the scores are particularly extreme when you frame your problem as a multiclass classification problem.

For now, here's some light code to post-process ("clip") the magnitude of the graphs for each feature. All the graphs are stored as simple numpy arrays inside the `attribute_set_models_` attribute of the model. Here's how you would "edit the model" by clipping the scores:

```python
import numpy as np

min_score = -5
max_score = 5

for index in range(len(ebm.attribute_set_models_)):
    ebm.attribute_set_models_[index] = np.clip(ebm.attribute_set_models_[index], min_score, max_score)
```

In this example, all model graphs would be clipped to the range [-5, 5]. The only caveat is that the overall feature importance graph (shown as the "Summary" view) is pre-calculated when the model is trained, so it wouldn't be updated by this post-processing. Let us know if you'd like some code for re-calculating the overall feature importances, or need any other help with the code!

-InterpretML Team
Thanks for the follow-up! So, I had a 10-class problem and ended up training 10 one-vs-all classifiers and aggregating them. A bit gnarly, but the 0-1 accuracy was nearly as good (54-56%), and the curve magnitudes were reasonable. For the benefit of future readers: I suspect using the code above may require some shifting of the intercepts (the intercept in question was ~10^8), and/or refitting, since the model had learned that 4% of feature values were -10^10 important and the others 10^8, so squishing to +/- 5 would lose that signal.
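The one-vs-all aggregation described above can be done by taking, for each sample, the class whose binary model produces the highest score. A numpy-only sketch with a hypothetical score matrix (in practice each column would come from one of the 10 binary EBMs):

```python
import numpy as np

# Hypothetical decision scores from 10 one-vs-all binary models on 3 samples:
# rows = samples, columns = classes.
scores = np.array([
    [0.2, 1.5, -0.3, 0.0, -1.0, 0.4, 0.1, -0.2, 0.9, -0.5],
    [-0.1, 0.0, 2.1, 0.3, 0.2, -0.4, 0.0, 0.5, -0.6, 0.1],
    [0.0, -0.2, 0.1, 0.0, 0.3, 0.2, 3.0, 0.1, 0.0, -0.1],
])

# Aggregate by picking the class whose binary model is most confident.
predictions = scores.argmax(axis=1)  # -> array([1, 2, 6])
```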
Hi @andro536,

The 0.2.0 release of interpret had a few breaking changes, which included some attribute renames. The new name of the property is `additive_terms_`:

```python
import numpy as np

min_score = -5
max_score = 5

for index in range(len(ebm.additive_terms_)):
    ebm.additive_terms_[index] = np.clip(ebm.additive_terms_[index], min_score, max_score)
```

Note that this clips all values on all graphs to a min/max range of 5. In your case, you may want to inspect and edit the graph for just this single feature.

We're looking into introducing a better API in the future for model/graph editing, but manipulating the `additive_terms_` arrays directly is the way to go for now.

-InterpretML Team
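To clip just the one problematic feature rather than every graph, you can index into the list of per-term arrays directly. A numpy-only sketch (the `feature_index` and the stand-in `additive_terms` list are hypothetical; on a real model you would index `ebm.additive_terms_`):

```python
import numpy as np

# Stand-ins for the per-feature score arrays (ebm.additive_terms_).
additive_terms = [
    np.array([0.1, -0.4]),
    np.array([2.0, -1.5, 0.3]),
    np.array([0.0, 0.7]),
    np.array([-1e10, 1e8, 1e8]),  # the extreme graph
]

# Hypothetical index of the problematic feature; clip only that graph
# and leave the rest of the model untouched.
feature_index = 3
additive_terms[feature_index] = np.clip(additive_terms[feature_index], -5, 5)
```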
How do we re-calculate the overall feature importances? Thanks!
Hi @candalfigomoro,

The overall feature importances are simply calculated as the mean absolute contribution per feature on the training dataset. Our code for calculating them is just a few lines of Python here:

interpret/python/interpret-core/interpret/glassbox/ebm/ebm.py, lines 1190 to 1202 at commit 2676a0f

You can lift those lines to re-calculate the importances yourself after editing the model.

-InterpretML Team
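The "mean absolute contribution per feature" calculation described above can be sketched in a few lines of numpy. This is not interpret's actual implementation, just an illustration with a hypothetical contribution matrix (the per-sample score each feature contributes on the training set):

```python
import numpy as np

# Hypothetical per-sample contributions: rows = features, columns = samples.
contributions = np.array([
    [0.5, -1.2, 0.0],   # feature A
    [2.0, 2.0, -2.0],   # feature B
    [0.1, 0.1, 0.1],    # feature C
])

# Overall importance per feature: mean absolute contribution across samples.
importances = np.abs(contributions).mean(axis=1)  # -> approx [0.567, 2.0, 0.1]
```

If you clip a feature's graph as in the earlier comments, re-running this calculation on the edited model would bring the "Summary" importances back in line with the edited scores.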
In the latest 0.3.0 version, feature/term importances are calculated when explanations are generated. This means that any changes made to the model after it has been trained will be reflected in the importances, eliminating the need to recalculate them. A post-processing clipping utility function remains on our backlog, since it would be nice to get both clipping and re-centering operations in a single function.
I've changed my perspective somewhat on this issue. Today the best solution is to do the post-processing clipping as described above; however, there is a better long-term solution that should be implemented instead of a clipping utility.

The fundamental issue we have today is that our boosting is of the MART variety instead of LogitBoost. What this means is that we calculate updates using hessians, but we calculate the gain from the gradient and the count of samples within each potential leaf. If all the samples within a potential leaf are positive or negative examples, then we get into trouble, since boosting can keep pushing the scores towards either +infinity or -infinity, because we only store the total number of samples and not the per-class counts.

If we used LogitBoost instead of MART, we could implement a min_child_weight parameter like XGBoost and LightGBM have. Setting min_child_weight to something non-zero would disallow leaf nodes that are pure, which should eliminate this issue and also improve the models in these scenarios.

For multiclass, there's an additional requirement that the trees are built per-class instead of jointly. If there's a minority class, then we don't want to disallow growth at the tail ends of each feature due to the minority class potentially not having any samples in those regions. Both XGBoost and LightGBM build their trees per-class after calculating the gradients and hessians, and I suspect this is the reason they do it this way.
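The pure-leaf divergence described above can be demonstrated with a small stand-alone sketch of Newton-style boosting under logistic loss. This is illustrative, not interpret's actual boosting code: with only positive samples in a leaf, the gradient never changes sign, so the score grows without bound, while the summed hessian shrinks toward zero, which is exactly what a min_child_weight-style guard (threshold value here is arbitrary) would detect:

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def leaf_update(score, labels):
    # Newton step for a single leaf under logistic loss:
    # sum of gradients (y - p) divided by sum of hessians p * (1 - p).
    p = sigmoid(score)
    grad = sum(y - p for y in labels)
    hess = len(labels) * p * (1.0 - p)
    return grad / hess

# A "pure" leaf: every sample is positive, so each boosting round pushes
# the score further toward +infinity.
score = 0.0
labels = [1, 1, 1]
for _ in range(20):
    score += leaf_update(score, labels)

# A min_child_weight-style guard would reject further growth once the
# summed hessian falls below a threshold (illustrative value only).
min_child_weight = 1e-3
p = sigmoid(score)
hessian_sum = len(labels) * p * (1.0 - p)
split_allowed = hessian_sum >= min_child_weight
```

After 20 rounds the score has already run past 20 and the hessian sum is vanishingly small, so the guard would have stopped boosting on this leaf long before the scores reached the 10^10 magnitudes reported in this issue.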
I'm fitting an EBM multiclass classifier, and am getting feature curves with values in excess of 10^10. Is there a way to fix this somehow? Ideally I could specify a maximum value, and clip the feature curves beyond that.
For more context, in my particular dataset, there is an interval of ~2% of the data where if, say, X_1 is less than -10 the output is guaranteed to be class 0. The curve is taking values -10^10 in this region, and 10^8 everywhere else, with an intercept around 10^8.
This certainly makes sense from a prediction perspective, but for interpretation purposes, having 98% of the feature importances for that variable be 10^8 isn't ideal.