
Can ExplainableBoostingRegressor be used as a feature engineering tool? #474

Open
jckkvs opened this issue Aug 29, 2023 · 4 comments

jckkvs commented Aug 29, 2023

Can ExplainableBoostingRegressor or ExplainableBoostingClassifier be used as a transformer to extract interactions between explanatory variables, for example the interaction between X1 and X3?

paulbkoch (Collaborator) commented:

Hi @jckkvs -- The ExplainableBoostingClassifier and ExplainableBoostingRegressor are not transformers, so you cannot do this directly, but you can achieve it through other means.

The hard part about interactions is narrowing the possible interactions down to just the most important ones. We have exposed the "measure_interactions" function which returns an interaction strength that we use internally to choose pairs. Here's a link to the docs on measure_interactions: https://interpret.ml/docs/measure_interactions.html.
Generally, you would want to first train a model on the individual features to extract as much information from them as possible before moving to the pairs. measure_interactions accepts an init_score parameter for this.
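
A minimal sketch of that workflow (untested; it assumes, per the docs above, that measure_interactions is importable from interpret.utils, accepts a fitted EBM as init_score, and returns candidate pairs ordered by decreasing strength):

from interpret.glassbox import ExplainableBoostingRegressor
from interpret.utils import measure_interactions
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=1000, n_features=5, random_state=0)

# Train a mains-only EBM first (interactions=0) so that the pair scores
# measure what is left over after the individual features are accounted for.
ebm = ExplainableBoostingRegressor(interactions=0)
ebm.fit(X, y)

# Rank candidate pairs by interaction strength, strongest first.
ranked = measure_interactions(X, y, init_score=ebm)
print(ranked[:3])  # e.g. [((1, 3), 0.42), ...] -- values illustrative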

Once the interactions are chosen, we then bin them using quantiles. You can use the same binning algorithm that EBMs use via the EBMPreprocessor. It isn't public, but you can find it here:

class EBMPreprocessor(BaseEstimator, TransformerMixin):
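
Because the class isn't public, its import path may change between releases. A rough usage sketch (untested), assuming the path interpret.utils._preprocessor used in current releases:

from interpret.utils._preprocessor import EBMPreprocessor

pre = EBMPreprocessor(binning="quantile")
pre.fit(X, y)
X_binned = pre.transform(X)  # integer bin indices, one column per feature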

jckkvs (Author) commented Sep 26, 2023

@paulbkoch
Thank you. I apologize for the vagueness of my question due to my lack of understanding.

In my previous question I asked about interactions between variables, but for the simpler case where interactions are not considered, I've come to understand the following:

The EBMPreprocessor(binning="quantile") that you mentioned transforms X of various distribution shapes into an approximately uniform distribution.

I will deepen my understanding of interactions by reviewing the documentation and code you provided. Thank you.

jckkvs (Author) commented Sep 27, 2023

@paulbkoch

I understand that EBMPreprocessor(binning="quantile") essentially performs the same function as sklearn.preprocessing.QuantileTransformer(output_distribution='uniform'). (Of course, I anticipate minor differences depending on the n_quantiles setting and the dataset in use.)

Is my understanding correct?

Below is a simple script to verify this understanding.

from sklearn.preprocessing import QuantileTransformer
from interpret.utils._preprocessor import EBMPreprocessor
from sklearn.datasets import make_regression
import matplotlib.pyplot as plt

X, y = make_regression(random_state=0)

transformers = [QuantileTransformer(), EBMPreprocessor(binning="quantile")]
X_transformed_ = []
for transformer in transformers:
    transformer.fit(X, y)
    X_transformed_.append(transformer.transform(X))

# If the two transforms agree up to a monotone rescaling, this scatter plot
# (quantile-transformed values vs. bin indices) should be monotonically
# increasing within each feature.
plt.scatter(X_transformed_[0], X_transformed_[1])
plt.xlabel("QuantileTransformer output")
plt.ylabel("EBMPreprocessor bin index")
plt.show()

paulbkoch (Collaborator) commented:

Same idea in terms of quantiles, although QuantileTransformer returns floats while EBMPreprocessor returns binned integer values; EBMPreprocessor also handles missing values.
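
A quick way to see that difference (a minimal sketch, untested; it assumes the private import path above and that both transformers tolerate NaN as the missing-value marker):

import numpy as np
from sklearn.preprocessing import QuantileTransformer
from interpret.utils._preprocessor import EBMPreprocessor

X = np.array([[0.1], [0.5], [2.0], [np.nan]])
y = np.array([0.0, 1.0, 2.0, 3.0])

# Floats in [0, 1]; NaN passes through unchanged.
qt = QuantileTransformer(n_quantiles=4).fit(X, y)
print(qt.transform(X))

# Integer bin indices; missing values are assigned a bin of their own.
pre = EBMPreprocessor(binning="quantile").fit(X, y)
print(pre.transform(X))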
