Problem Statement

This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective is to predict whether or not a patient has diabetes, based on the diagnostic measurements included in the dataset. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.

Dataset

The dataset used is diabetes.csv, from the Pima Indians Diabetes Database on Kaggle (https://www.kaggle.com/uciml/pima-indians-diabetes-database).

The two class labels are:

1. Diabetic person: Person having diabetes.
2. Non-Diabetic person: Person not having diabetes.
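
As a quick check of how the two labels are distributed, the following is a minimal, dependency-free sketch. It assumes the label column is named `Outcome`, with 1 meaning diabetic and 0 meaning non-diabetic; verify this against your copy of diabetes.csv. The helper `count_labels` and the sample rows are hypothetical and for illustration only.

```python
import csv
import io
from collections import Counter

# Assumption: the label column is called "Outcome" (1 = diabetic, 0 = non-diabetic).
LABEL_COLUMN = "Outcome"
LABEL_NAMES = {"1": "Diabetic", "0": "Non-Diabetic"}


def count_labels(csv_file):
    """Count how many rows of an open CSV file fall into each class label."""
    reader = csv.DictReader(csv_file)
    return Counter(LABEL_NAMES[row[LABEL_COLUMN]] for row in reader)


# Made-up rows for illustration only; they are not taken from the real dataset.
sample = io.StringIO(
    "Glucose,BMI,Age,Outcome\n"
    "150,33.5,50,1\n"
    "90,26.0,31,0\n"
    "110,28.2,22,0\n"
)
counts = count_labels(sample)
print(counts)
```

With the real file, the same function could be called as `count_labels(open("diabetes.csv", newline=""))`, where the path is an assumption about where the file is stored.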

Model(s) Used

1. Support Vector Machine (SVM): SVM is a supervised machine learning algorithm used here for classification. It looks for a hyperplane in the N-dimensional feature space (N being the number of features) that separates the classes, choosing the one with the largest margin to the nearest data points. The hyperplane itself has N - 1 dimensions: a line for two features, a plane for three.

2. Naive Bayes: Naive Bayes classifiers are a family of classification algorithms based on Bayes’ Theorem. They share a common assumption: every pair of features is conditionally independent given the class label.

3. K-Nearest Neighbours: K-Nearest Neighbours is one of the simplest classification algorithms in machine learning. It is a supervised method that assigns a new sample the majority class among its k closest training samples, and it is widely used in pattern recognition, data mining and intrusion detection.

4. Logistic Regression: Logistic regression predicts a categorical dependent variable, so the outcome must be a discrete value such as Yes or No, 0 or 1, True or False. Rather than outputting 0 or 1 directly, the model outputs a probability between 0 and 1, which is then thresholded to obtain the class.

5. Random Forest: A random forest consists of a large number of individual decision trees that operate as an ensemble. Each tree in the forest outputs a class prediction, and the class with the most votes becomes the model’s prediction.

6. Decision Tree: A decision tree is a flowchart-like tree structure in which each internal node denotes a test on an attribute, each branch represents an outcome of that test, and each leaf node holds a class label.
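
The six models above can be tried side by side with scikit-learn. The following is a minimal sketch, not the repository's actual training code. It assumes scikit-learn is installed and uses a synthetic two-class dataset with 8 features as a stand-in for diabetes.csv. All hyperparameters (`n_neighbors=5`, `n_estimators=100`, and so on) and the use of Gaussian Naive Bayes are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 8 numeric features, 2 classes.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Distance- and margin-based models are wrapped with feature scaling.
models = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Naive Bayes": GaussianNB(),
    "K-Nearest Neighbours": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

# Fit each model and record its accuracy on the held-out split.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = model.score(X_test, y_test)
    print(f"{name}: {accuracies[name]:.3f}")
```

The accuracies printed here describe the synthetic data only and say nothing about performance on the diabetes dataset. To run the comparison on the real data, `X` and `y` would be replaced by the feature columns and the label column of diabetes.csv.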

VictorChazhoor/Diabetes-prediction-ML-classification
