Welcome to the Breast Cancer Classification project! This project is a part of the GirlScript Summer of Code program. The objective of this project is to build a machine learning model using Logistic Regression to classify whether a given tumor is malignant or benign based on various features.
Breast cancer is one of the most common cancers among women worldwide. Early detection and treatment are crucial for improving survival rates. Machine learning can aid in the early detection of breast cancer by classifying tumors as malignant or benign based on various features.
In this project, we will:
- Load and preprocess the breast cancer dataset.
- Train a Logistic Regression model on the dataset.
- Evaluate the performance of the model.
- Provide usage instructions for making predictions on new data.
To get started with this project, you'll need to have Python installed on your machine. You can install the necessary dependencies using pip:
pip install -r requirements.txt
The dataset used in this project is the Breast Cancer Wisconsin (Diagnostic) Data Set. It contains features computed from a digitized image of a fine needle aspirate (FNA) of a breast mass.
- Features: 30 numeric features representing characteristics of the cell nuclei.
- Target: Binary classification (malignant or benign).
Data preprocessing steps include:
- Loading the dataset.
- Handling missing values (if any).
- Encoding categorical variables (if any).
- Scaling the features.
We will use Logistic Regression for the classification task. Logistic Regression is a simple yet powerful algorithm for binary classification problems. The steps for training the model are as follows:
- Split the dataset into training and testing sets.
- Initialize the Logistic Regression model.
- Train the model on the training data.
- Tune hyperparameters if necessary.
To evaluate the performance of the model, we will use metrics such as:
- Accuracy
- Precision
- Recall
- F1-Score
Confusion matrix and ROC-AUC curve will also be plotted for a comprehensive evaluation.
To use the trained model for making predictions on new data, follow these steps:
- Load the pre-trained model from the saved file.
- Prepare the input data in the same format as the training data.
- Use the model to make predictions.
Example code:
import pickle
import numpy as np
# Load the pre-trained model
with open('breast_cancer_model.pkl', 'rb') as file:
model = pickle.load(file)
# Sample input data (replace with actual data)
input_data = np.array([[...]])
# Make predictions
predictions = model.predict(input_data)
print(predictions)
This project is licensed under the MIT License. See the LICENSE file for details.