Narius2030/Diabetes-Analyzing-with-R: Solving three main problems on a diabetes dataset: prediction, classification and hypothesis validation

General Information

This project is published on RPubs: diabetes-analyzing-ml

Dataset overview: women's medical and demographic data for predicting diabetes

This dataset contains information on 769 women and includes many health-related attributes. Here is a brief overview of the columns:

Pregnancy: The number of times a woman has been pregnant.
Glucose: The concentration of glucose in a woman's plasma.
Blood pressure: The woman's measured blood pressure.
Skin thickness: The thickness of the skin folds in the triceps.
Insulin: Insulin concentration in the blood.
BMI (Body Mass Index): A measure of body fat based on height and weight.
Diabetes pedigree function: A function that shows the likelihood of developing diabetes based on family history.
Age: Age of the woman.
Outcome: The target variable indicates whether the woman has diabetes (1 for diabetics, 0 for non-diabetics).
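BMI is conventionally computed as weight in kilograms divided by the square of height in metres. The sketch below assumes the dataset's BMI column follows this standard definition; `bmi()` is a hypothetical helper for illustration only, since the file already stores BMI directly.

```r
# Hypothetical helper: standard BMI definition, weight (kg) / height (m)^2.
# The dataset already provides a BMI column; this only illustrates the formula.
bmi <- function(weight_kg, height_m) {
  stopifnot(all(height_m > 0))  # height must be positive
  weight_kg / height_m^2
}

bmi(70, 1.75)                  # about 22.86
bmi(c(50, 90), c(1.60, 1.80))  # vectorised: about 19.53 and 27.78
```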

Problem Solving

👨‍🏫 Exploring and pre-processing the dataset

Describing the overall structure of the dataset so that readers understand exactly what it contains
Using clear visualizations to plot the most significant features of the dataset
Identifying anomalies in the dataset, such as null/NaN values or outliers, that could distort the analysis
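One possible way to run the anomaly check is sketched below. `anomaly_report()` is a hypothetical helper, not code from the project: it counts missing values per numeric column and flags outliers with the common 1.5 * IQR rule, which is an assumed convention; the project's notebooks may use a different criterion. The toy data frame is made up for illustration.

```r
# Hypothetical helper: per numeric column, count missing values and
# outliers outside the 1.5 * IQR fences (an assumed, common convention).
anomaly_report <- function(df) {
  num_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  rows <- lapply(num_cols, function(col) {
    x <- df[[col]]
    q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
    fence <- 1.5 * (q[2] - q[1])
    data.frame(
      column = col,
      n_missing = sum(is.na(x)),
      n_outliers = sum(x < q[1] - fence | x > q[2] + fence, na.rm = TRUE),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Made-up toy data: one missing glucose value and one extreme glucose value.
toy <- data.frame(
  Glucose = c(85, 90, 95, 100, 300, NA),
  BMI = c(22, 25, 28, 30, 31, 33)
)
anomaly_report(toy)
# Glucose: 1 missing, 1 outlier; BMI: 0 missing, 0 outliers
```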

📊 Building prediction models with Logistic Regression and Decision Tree

This problem aims to forecast whether a patient has diabetes, using the feature attributes that are strongly correlated with the Outcome variable
Examining the dataset to determine which attributes are unnecessary for this problem, then removing them before constructing the machine learning models
Comparing the performance and accuracy of the two models and concluding which one is better
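The modelling and comparison steps can be sketched with `glm()` (family `binomial`) and `rpart()` (`method = "class"`). Because `diabetes.csv` is not reproduced here, the sketch runs on simulated data that reuses a few of the dataset's column names; the simulated coefficients, the 70/30 train/test split and the 0.5 probability threshold are all assumptions, and the printed accuracies say nothing about the real dataset.

```r
library(rpart)  # classification trees; glm() is in base R's stats package

# Simulated stand-in for diabetes.csv (hypothetical values, illustration only).
set.seed(42)
n <- 400
sim <- data.frame(
  Glucose = rnorm(n, mean = 120, sd = 30),
  BMI = rnorm(n, mean = 32, sd = 7),
  Age = sample(21:80, n, replace = TRUE)
)
linear <- -12 + 0.06 * sim$Glucose + 0.08 * sim$BMI + 0.02 * sim$Age
sim$Outcome <- factor(rbinom(n, size = 1, prob = plogis(linear)), levels = c(0, 1))

# Hold out 30% of the rows for evaluation.
test_idx <- sample(n, size = round(0.3 * n))
train <- sim[-test_idx, ]
test  <- sim[test_idx, ]

# Logistic regression: predicted probability above 0.5 => class "1".
lr <- glm(Outcome ~ ., data = train, family = binomial)
lr_prob <- predict(lr, newdata = test, type = "response")
lr_pred <- ifelse(lr_prob > 0.5, "1", "0")

# Decision tree (classification).
dt <- rpart(Outcome ~ ., data = train, method = "class")
dt_pred <- predict(dt, newdata = test, type = "class")

# Share of held-out rows whose predicted class matches the true class.
accuracy <- function(pred, truth) mean(as.character(pred) == as.character(truth))
acc <- c(logistic = accuracy(lr_pred, test$Outcome),
         tree     = accuracy(dt_pred, test$Outcome))
print(round(acc, 3))
```

Accuracy alone can be misleading when one class dominates, so a confusion matrix (for example `table(dt_pred, test$Outcome)`) is a useful companion when comparing the two models.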

🗂 Classifying body-mass categories using a Random Forest model

This problem identifies each patient's body-mass category (underweight, normal, overweight or obese), which can help doctors keep track of the health of patients who are likely to develop diabetes
Examining the dataset to determine which attributes are unnecessary for this problem, then removing them before constructing the models
Fine-tuning the model's parameters to find their best values, then building the best possible model from these tuned parameters
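A sketch of this pipeline, under several assumptions: the four categories are derived from BMI with the standard WHO adult cut-offs (18.5, 25 and 30), `mtry` is taken as the tuned parameter and chosen by out-of-bag (OOB) error, and the data are simulated rather than read from `diabetes.csv`. BMI itself is left out of the predictors because the label is computed from it; keeping it would make the task trivial. `bmi_category()` is a hypothetical helper. Requires the randomForest package.

```r
library(randomForest)

# Hypothetical helper using assumed WHO adult cut-offs.
# right = FALSE gives [18.5, 25) = Normal, [25, 30) = Overweight, and so on.
bmi_category <- function(bmi) {
  cut(bmi,
      breaks = c(-Inf, 18.5, 25, 30, Inf),
      labels = c("Underweight", "Normal", "Overweight", "Obese"),
      right = FALSE)
}
bmi_category(c(17.0, 22.4, 27.5, 33.6))  # Underweight Normal Overweight Obese

# Simulated stand-in data (hypothetical): skin-fold thickness tracks BMI.
set.seed(1)
n <- 300
bmi_values <- rnorm(n, mean = 31, sd = 7)
sim <- data.frame(
  SkinThickness = 0.8 * bmi_values + rnorm(n, sd = 4),
  Glucose = rnorm(n, mean = 120, sd = 30),
  Age = sample(21:80, n, replace = TRUE)
)
# The label is derived from BMI, so BMI itself is not a predictor.
# droplevels() removes any category that happens to be empty.
sim$Category <- droplevels(bmi_category(bmi_values))

# Tune mtry (number of predictors tried at each split) by OOB error.
mtry_grid <- 1:3
oob_error <- sapply(mtry_grid, function(m) {
  rf <- randomForest(Category ~ ., data = sim, mtry = m, ntree = 300)
  rf$err.rate[rf$ntree, "OOB"]  # OOB error after the last tree
})
best_mtry <- mtry_grid[which.min(oob_error)]

# Refit with the selected value.
final_rf <- randomForest(Category ~ ., data = sim, mtry = best_mtry, ntree = 300)
print(final_rf)
```

OOB error is used here because each tree is evaluated on the rows it did not see during bootstrapping, so no separate validation split is needed for the tuning loop.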

🕵️‍♀️ Hypothesis validation using t-tests

One-sample t-test: testing the hypothesis that women with an average BMI (Body Mass Index) of 34 are susceptible to diabetes
Independent-samples t-test: testing the hypothesis that body fat, as measured by BMI, does not affect whether or not a woman has the disease
One-sample t-test: testing the hypothesis that age also affects whether a person has diabetes
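The three tests map onto base R's `t.test()`. The sketch below uses simulated data: the group sizes and means are invented, and the reference age of 30 in the third test is a hypothetical value because the project's reference value is not stated here. Note that the formula form of `t.test()` runs Welch's test by default; pass `var.equal = TRUE` for the classical Student version.

```r
# Simulated stand-in data (hypothetical); Outcome: 1 = diabetic, 0 = not.
set.seed(7)
sim <- data.frame(Outcome = rep(c(0, 1), times = c(150, 80)))
sim$BMI <- ifelse(sim$Outcome == 1, rnorm(230, 35, 6), rnorm(230, 30, 6))
sim$Age <- ifelse(sim$Outcome == 1, rnorm(230, 37, 10), rnorm(230, 31, 10))
diabetic <- subset(sim, Outcome == 1)

# 1) One-sample t-test: H0 says the mean BMI of diabetic women equals 34.
t_bmi_34 <- t.test(diabetic$BMI, mu = 34)

# 2) Independent-samples (Welch) t-test: H0 says mean BMI is equal in both groups.
t_bmi_group <- t.test(BMI ~ Outcome, data = sim)

# 3) One-sample t-test on age against a hypothetical reference mean of 30 years.
t_age <- t.test(diabetic$Age, mu = 30)

# p-values; a small value is evidence against the corresponding H0.
sapply(list(bmi_vs_34 = t_bmi_34, bmi_by_outcome = t_bmi_group, age_vs_30 = t_age),
       function(res) res$p.value)
```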

Technology

Environment: RStudio, R interpreter
Display mode: R Markdown or R Notebook
Packages:
- stats (glm()) for logistic regression
- rpart for the decision tree model
- randomForest for the random forest model

Folders and files

renv
rsconnect/documents
.RData
.Rhistory
.Rprofile
.gitignore
EDA.Rmd
EDA.html
Nhom7.Rmd
Nhom7.html
PJCuoiKi.Rproj
README.md
diabetes.csv
renv.lock
styles.css
