
Can ExplainableBoostingRegressor be used as a feature engineering tool? #474

Open
jckkvs opened this issue Aug 29, 2023 · 4 comments

jckkvs commented Aug 29, 2023

Can ExplainableBoostingRegressor or ExplainableBoostingClassifier be used as a transformer to extract interactions between explanatory variables, for example the interaction between X1 and X3?

paulbkoch (Collaborator) commented:

Hi @jckkvs -- The ExplainableBoostingClassifier and ExplainableBoostingRegressor are not transformers, so you cannot do this directly, but you can achieve it through other means.

The hard part about interactions is narrowing the possible interactions down to just the most important ones. We have exposed the "measure_interactions" function which returns an interaction strength that we use internally to choose pairs. Here's a link to the docs on measure_interactions: https://interpret.ml/docs/measure_interactions.html.
Generally, you would want to first train a model on the individual features to extract as much information from them as possible before moving to the pairs. measure_interactions accepts an init_score parameter for this.
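
A minimal sketch of that workflow (untested; it assumes, per the docs above, that measure_interactions is importable from interpret.utils, accepts a fitted EBM as init_score, and returns candidate pairs ordered by decreasing strength):

from interpret.glassbox import ExplainableBoostingRegressor
from interpret.utils import measure_interactions
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=1000, n_features=5, random_state=0)

# Train a mains-only EBM first (interactions=0) so that the pair scores
# measure what is left over after the individual features are accounted for.
ebm = ExplainableBoostingRegressor(interactions=0)
ebm.fit(X, y)

# Rank candidate pairs by interaction strength, strongest first.
ranked = measure_interactions(X, y, init_score=ebm)
print(ranked[:3])  # e.g. [((1, 3), 0.42), ...] -- values illustrative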

Once the interactions are chosen, we then bin them using quantiles. You can use the same binning algorithm that EBMs use via the EBMPreprocessor. It isn't public, but you can find it here:

class EBMPreprocessor(BaseEstimator, TransformerMixin):
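
Because the class isn't public, its import path may change between releases. A rough usage sketch (untested), assuming the path interpret.utils._preprocessor used in current releases:

from interpret.utils._preprocessor import EBMPreprocessor

pre = EBMPreprocessor(binning="quantile")
pre.fit(X, y)
X_binned = pre.transform(X)  # integer bin indices, one column per feature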

jckkvs (Author) commented Sep 26, 2023

@paulbkoch
Thank you. I apologize for the vagueness of my question due to my lack of understanding.

In my previous question I asked about interactions between variables, but for the simpler case where interactions are not considered, I've come to understand the following:

The EBMPreprocessor(binning="quantile") that you mentioned transforms X of various distribution shapes into an approximately uniform distribution.

I will deepen my understanding of interactions by reviewing the documentation and code you provided. Thank you.

jckkvs (Author) commented Sep 27, 2023

@paulbkoch

I understand that EBMPreprocessor(binning="quantile") essentially performs the same function as sklearn.preprocessing.QuantileTransformer(output_distribution='uniform'). (Of course, I anticipate minor differences depending on the n_quantiles setting and the dataset in use.)

Is my understanding correct?

Below is a simple script to verify this understanding.

from sklearn.preprocessing import QuantileTransformer
from interpret.utils._preprocessor import EBMPreprocessor
from sklearn.datasets import make_regression
import matplotlib.pyplot as plt

X, y = make_regression(random_state=0)

transformers = [QuantileTransformer(), EBMPreprocessor(binning="quantile")]
X_transformed_ = []
for transformer in transformers:
    transformer.fit(X, y)
    X_transformed_.append(transformer.transform(X))

# If the two transforms agree up to a monotone rescaling, this scatter plot
# (quantile-transformed values vs. bin indices) should be monotonically
# increasing within each feature.
plt.scatter(X_transformed_[0], X_transformed_[1])
plt.xlabel("QuantileTransformer output")
plt.ylabel("EBMPreprocessor bin index")
plt.show()

paulbkoch (Collaborator) commented:

Same idea in terms of quantiles, although QuantileTransformer returns floats while EBMPreprocessor returns binned integer values; EBMPreprocessor also handles missing values.
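
A quick way to see that difference (a minimal sketch, untested; it assumes the private import path above and that both transformers tolerate NaN as the missing-value marker):

import numpy as np
from sklearn.preprocessing import QuantileTransformer
from interpret.utils._preprocessor import EBMPreprocessor

X = np.array([[0.1], [0.5], [2.0], [np.nan]])
y = np.array([0.0, 1.0, 2.0, 3.0])

# Floats in [0, 1]; NaN passes through unchanged.
qt = QuantileTransformer(n_quantiles=4).fit(X, y)
print(qt.transform(X))

# Integer bin indices; missing values are assigned a bin of their own.
pre = EBMPreprocessor(binning="quantile").fit(X, y)
print(pre.transform(X))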
