Data analysis and modeling for churning with a telecom dataset

A full presentation is available, covering pre-processing, modeling details and results, and a discussion of how to reduce the churn rate.

Features relation maps after pre-processing

After the pre-processing step, the dataset contained 7032 clients, each with 32 features (up from 20 originally). The relationships among the original features are illustrated here.

Modeling

Models were built to predict the churn label. To find the best model, logistic regression, SVM, GBDT, and a neural network were compared. The data was split with a training-to-testing ratio of 4:1, and features were normalized with respect to the training set. Hyperparameters for each model were tuned with a stratified 5-fold cross-validated grid search. GBDT using LGBoost performed best, with an accuracy of 80.24%.
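The workflow described above (4:1 split, normalization fitted on the training data, stratified 5-fold grid search) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the notebooks: the data is synthetic, scikit-learn's `GradientBoostingClassifier` stands in for the GBDT library used in the project, and the parameter grid is arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the telecom data (assumption: NOT the real dataset).
# The class imbalance chosen here is arbitrary.
X, y = make_classification(
    n_samples=1000, n_features=32, n_informative=8,
    weights=[0.7, 0.3], random_state=0,
)

# 4:1 train/test split, stratified so both parts keep the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Keeping the scaler inside the pipeline means it is only ever fitted on
# training data (and refitted on the training part of each CV fold).
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", GradientBoostingClassifier(random_state=0)),
])

# Hypothetical grid; the grids used in the notebooks are not given here.
param_grid = {
    "model__n_estimators": [50, 100],
    "model__max_depth": [2, 3],
}

search = GridSearchCV(
    pipeline,
    param_grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Accuracy of the refitted best pipeline on the held-out test set.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 4))
```

Placing the scaler in the pipeline is one way to avoid leaking test-set or validation-fold statistics into the normalization. Fitting the scaler once on the whole training set before the grid search is a simpler alternative and may be closer to what the notebooks do.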

Guide to the notebooks

A telecom client dataset was explored and features were engineered in notebook 01 (01.EDA_and_FeatureEngineering.ipynb). The feature set was then used to fit 7 models, including 4 variations of decision trees, in notebook 02 (02.Modeling_and_FeatureImportance.ipynb). The best model was selected by comparing several evaluation scores, and its dependence on the features was plotted in a feature importance chart. Finally, in notebook 03 (03.Action_For_Churn_Reduction.ipynb), the resulting model was studied to plan actions for lowering the churn rate.
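As a rough illustration of the model comparison and feature importance steps, the sketch below scores a few classifiers on a held-out set and ranks the features of a tree-based model. Everything in it is an assumption made for illustration: the data is synthetic, the three models and the three metrics (accuracy, F1, ROC AUC) are not necessarily those used in notebook 02, and the feature names are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with placeholder feature names (assumption).
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=4, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical candidate models; the notebooks compare 7 models.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "gbdt": GradientBoostingClassifier(random_state=0),
}

# Fit each model and collect several scores on the test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predicted = model.predict(X_test)
    positive_probability = model.predict_proba(X_test)[:, 1]
    scores[name] = {
        "accuracy": accuracy_score(y_test, predicted),
        "f1": f1_score(y_test, predicted),
        "roc_auc": roc_auc_score(y_test, positive_probability),
    }

for name, row in scores.items():
    print(name, {metric: round(value, 3) for metric, value in row.items()})

# Impurity-based importances of the fitted GBDT, largest first.
importances = sorted(
    zip(feature_names, models["gbdt"].feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for feature, importance in importances[:5]:
    print(f"{feature}: {importance:.3f}")
```

Looking at several scores side by side, rather than accuracy alone, tends to matter for churn data because the classes are usually imbalanced.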

