Commit c59807c: Update README.md

jphall663 committed Aug 11, 2020 (1 parent: 4151cad)
Showing 1 changed file with 2 additions and 2 deletions.

README.md: 2 additions and 2 deletions
@@ -92,11 +92,11 @@

In general, residual analysis can be characterized as a careful study of when and how models make mistakes. A better understanding of mistakes will hopefully lead to fewer of them. This notebook uses variants of residual analysis to find error mechanisms and security vulnerabilities and to assess stability and fairness in a trained XGBoost model. It begins by loading the UCI credit card default data and then training an interpretable, monotonically constrained XGBoost gradient boosting machine (GBM) model. (Pearson correlation with the prediction target is used to determine the direction of the monotonicity constraints for each input variable.) After the model is trained, its logloss residuals are analyzed and explained thoroughly and the constrained GBM is compared to a benchmark linear model. These model debugging exercises uncover accuracy, drift, and security problems such as over-emphasis of important variables and strong signal in model residuals. Several remediation mechanisms are proposed including missing value injection during training, additional data collection, and use of assertions to correct known problems during scoring.

-### From GLM to GBM: The Business Value of a Better Model - [Notebook](https://nbviewer.jupyter.org/github/jphall663/interpretable_machine_learning_with_python/blob/master/glm_mgbm_gbm.ipynb)
+### From GLM to GBM: Building the Case For Complexity - [Notebook](https://nbviewer.jupyter.org/github/jphall663/interpretable_machine_learning_with_python/blob/master/glm_mgbm_gbm.ipynb)

![](readme_pics/hist_pd_ice_lo.png)

-This notebook uses the same credit card default scenario to show how monotonicity constraints, Shapley values and other post-hoc explanations, and discrimination testing can enable practitioners to create direct comparisons between GLM and GBM models. Several candidate probability of default models are selected for comparison using feature selection methods, like LASSO, and by cross-validated ranking. Comparisons then enable building from GLM to more complex GBM models in a step-by-step manner, while retaining model transparency and the ability to test for discrimination. This notebook shows that GBMs can yield better accuracy, more revenue, and that GBM is also likely to fulfill many model documentation, adverse action notice, and discrimination testing requirements.
+This notebook uses the same credit card default scenario to show how monotonicity constraints, Shapley values and other post-hoc explanations, and discrimination testing can enable practitioners to create direct comparisons between GLM and GBM models. Several candidate probability of default models are selected for comparison using feature selection methods, like LASSO, and by cross-validated ranking. Comparisons then enable building from GLM to more complex GBM models in a step-by-step manner, while retaining model transparency and the ability to test for discrimination. This notebook shows that GBMs can yield better accuracy, more revenue, and that GBMs are also likely to fulfill many model documentation, adverse action notice, and discrimination testing requirements.

## Using the Examples

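The first context paragraph in the diff above describes the notebook's setup: train a monotonically constrained XGBoost GBM, with the direction of each constraint taken from the sign of that input's Pearson correlation with the prediction target, then study the model's logloss residuals. Here is a minimal sketch of that workflow, not the notebook's actual code; the file path, column names, and hyperparameters are illustrative assumptions.

```python
# Sketch: monotonically constrained XGBoost GBM with constraint
# directions set by Pearson correlation, plus logloss residuals.
import numpy as np
import pandas as pd
import xgboost as xgb

data = pd.read_csv("uci_credit_card.csv")           # hypothetical path
y_name = "DEFAULT_NEXT_MONTH"                       # hypothetical target column
x_names = [c for c in data.columns if c != y_name]
y = data[y_name].values

# The sign of each input's Pearson correlation with the target sets
# the direction of its monotonicity constraint: +1, -1, or 0
signs = np.sign(data[x_names].corrwith(data[y_name])).astype(int)
constraints = "(" + ",".join(str(s) for s in signs) + ")"

model = xgb.XGBClassifier(
    n_estimators=100,
    max_depth=3,
    learning_rate=0.1,
    monotone_constraints=constraints,
)
model.fit(data[x_names], y)

# Per-row logloss residuals: large values flag rows the model gets
# confidently wrong, the starting point for residual analysis
p = np.clip(model.predict_proba(data[x_names])[:, 1], 1e-15, 1 - 1e-15)
logloss_resid = -(y * np.log(p) + (1 - y) * np.log(1 - p))
```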
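The changed paragraph outlines the GLM-to-GBM comparison itself: LASSO-style feature selection, cross-validated ranking of candidate models, and Shapley values for transparency. A hedged sketch of one way to run that comparison, reusing the hypothetical `data`, `x_names`, `y`, and `model` names from the sketch above:

```python
# Sketch: compare a LASSO-penalized GLM benchmark against the
# constrained GBM, then extract per-row Shapley contributions.
import xgboost as xgb
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# GLM candidate: an L1 (LASSO) penalty performs feature selection by
# shrinking weak coefficients to exactly zero
glm = LogisticRegression(penalty="l1", C=0.1, solver="liblinear", max_iter=1000)

# Cross-validated ranking: score both candidates on held-out AUC
glm_auc = cross_val_score(glm, data[x_names], y, scoring="roc_auc", cv=5).mean()
gbm_auc = cross_val_score(model, data[x_names], y, scoring="roc_auc", cv=5).mean()
print(f"GLM AUC: {glm_auc:.3f}, GBM AUC: {gbm_auc:.3f}")

# Per-row Shapley contributions from the fitted GBM: one column per
# input plus a bias column, usable like GLM reason codes for
# adverse action notices
booster = model.fit(data[x_names], y).get_booster()
contribs = booster.predict(xgb.DMatrix(data[x_names]), pred_contribs=True)
```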