This is not an issue per se, but rather a suggestion for an addition to the FAQ or the example usage in the docs. In short, we recently submitted an issue to Microsoft's SynapseML library asking for Spark support for distributed scoring with EBM models. The fix came quickly, and you can now deploy EBM models via Spark. The basic idea is to convert the model to ONNX (via ebm2onnx) and then load it into Spark via SynapseML. I think this would be useful to call out somewhere in the docs. The minimal working example from the issue linked above could be used; it was built on top of the Adult Census example in the EBM docs.
Happy to open a PR if you feel this is useful; just let me know where in the docs it would best fit.
Minimal example (but requires SynapseML to be installed):
import numpy as np
import pandas as pd
import onnx
import ebm2onnx
from sklearn.model_selection import train_test_split
from interpret.glassbox import ExplainableBoostingClassifier
from synapse.ml.onnx import ONNXModel

# Read in adult data from UCI ML repo
df = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
    header=None)
df.columns = [
    "Age", "WorkClass", "fnlwgt", "Education", "EducationNum",
    "MaritalStatus", "Occupation", "Relationship", "Race", "Gender",
    "CapitalGain", "CapitalLoss", "HoursPerWeek", "NativeCountry", "Income"
]

# Sample data and split into train/test sets
seed = 42
np.random.seed(seed)
df = df.sample(frac=0.05, random_state=seed)
train_cols = df.columns[0:-1]
label = df.columns[-1]
X = df[train_cols]
y = df[label]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=seed)

# Fit a (default) EBM model
ebm = ExplainableBoostingClassifier()
ebm.fit(X_train, y_train)

def convert_model(model, input_df):
    onnx_model = ebm2onnx.to_onnx(
        model,
        ebm2onnx.get_dtype_from_pandas(input_df),
        predict_proba=True
    )
    onnx_model.ir_version = 4
    return onnx_model.SerializeToString()

# Load ONNX payload into an ONNXModel and inspect inputs/outputs.
payload = convert_model(ebm, input_df=X_train)
onnx_ml = ONNXModel().setModelPayload(payload)
print("Model inputs:" + str(onnx_ml.getModelInputs()))
print("Model outputs:" + str(onnx_ml.getModelOutputs()))

# Map the model input to the input dataframe's column name (FeedDict), and
# map the output dataframe's column names to the model outputs (FetchDict)
onnx_ml = (
    onnx_ml.setDeviceType("CPU")
    .setFeedDict({"input": "features"})
    .setFetchDict({"probability": "probabilities", "prediction": "label"})
    .setMiniBatchSize(5000)
)

# Coerce test data features to Spark DataFrame and transform (i.e., compute and add scores)
X_test_sdf = spark.createDataFrame(X_test)
display(onnx_ml.transform(X_test_sdf))
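
The FeedDict in the example maps model input names to DataFrame column names, so the keys need to match whatever the "Model inputs" printout reports. As a rough sketch only: if the exported model turns out to expose one input per feature column (an assumption, not something confirmed here), the mapping could be built and sanity-checked with a small helper. `build_feed_dict` is a hypothetical name, not part of SynapseML or ebm2onnx.

```python
def build_feed_dict(model_input_names, df_columns):
    """Map each model input name to the DataFrame column of the same name.

    Hypothetical helper: assumes model inputs and DataFrame columns share names.
    Raises ValueError if a model input has no matching column.
    """
    columns = set(df_columns)
    missing = [name for name in model_input_names if name not in columns]
    if missing:
        raise ValueError(f"No DataFrame column for model inputs: {missing}")
    return {name: name for name in model_input_names}


# Illustrative names taken from the Adult Census columns in the example.
feed_dict = build_feed_dict(["Age", "WorkClass"], ["Age", "WorkClass", "fnlwgt"])
print(feed_dict)
```

The resulting dictionary could then be passed to `setFeedDict` in place of the hard-coded mapping, once the input names have been checked against the printed model inputs.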