Exploratory Analysis, Data Cleaning and Machine Learning modelling using Pandas, Matplotlib and Sklearn.

manningb/covid-19-cdc-machine-learning-data-analysis

Covid-19 CDC Exploratory Data Analysis

Table of Contents

  1. About The Project
  2. Getting Started
  3. Usage
  4. Contributing
  5. License
  6. Contact

About The Project

In this main directory you will find the following:

  • 1_Homework1: This contains my corrections from Homework 1. I have added a number of new features here, which are then used throughout Homework 2. Most of this file is the same, apart from the very last section, where the new features are added and evaluated using correlation coefficients.
  • 2_Homework2: This contains my work for Homework 2.
      - data:
          - covid19-cdc-17324576-clean-new-features.csv: Used from the start of Homework 2 to test Linear Regression, Logistic Regression and Random Forests.
          - covid19-cdc-17324576-for-part-5.csv: Used for Part 5 to evaluate if more features would be better. It does not contain the feature scaling for time since start used in the above dataset.
          - 24032021-covid19-cdc-deathyn-recent-10k.csv: New dataset for testing Part 5.
      - trees: Contains various tree files for presentation in the notebook.
      - confusion_matrix.py: This was adapted from code found online to quickly generate a confusion matrix and metrics based on predicted and actual data.
      - 2_2020-Homework2-Notebook.ipynb: Homework 2 notebook.
      - pickle: Pickle Randomised Search results.
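
For readers unfamiliar with what a confusion-matrix helper computes from predicted and actual data, here is a minimal sketch in plain Python. It is not the code in confusion_matrix.py. The function names, the example labels, and the assumption of a binary target (1 for a death, 0 otherwise, as the "deathyn" dataset filename suggests) are all illustrative.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary target."""
    pairs = Counter(zip(actual, predicted))
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for (a, p), n in pairs.items():
        if p == positive:
            counts["tp" if a == positive else "fp"] += n
        else:
            counts["fn" if a == positive else "tn"] += n
    return counts


def summary_metrics(counts):
    """Derive accuracy, precision, recall and F1 from the four counts."""
    tp, fp, fn, tn = counts["tp"], counts["fp"], counts["fn"], counts["tn"]

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical labels for illustration only (1 = positive class, 0 = negative).
actual = [1, 0, 0, 1, 0, 1, 0, 0]
predicted = [1, 0, 1, 1, 0, 0, 0, 0]

counts = confusion_counts(actual, predicted)
metrics = summary_metrics(counts)
print(counts)
print(metrics)
```

In practice the same numbers can be obtained from `sklearn.metrics.confusion_matrix` and `sklearn.metrics.classification_report`. The hand-rolled version above only makes the arithmetic explicit.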

Getting Started

To get a local copy up and running, follow these steps.

Installation

  1. Clone the repo
    git clone https://github.com/manningb/covid-19-cdc-machine-learning-data-analysis.git
  2. Install Python packages
    pip install -r requirements.txt

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Brian Manning - manningbrian98@gmail.com

Project Link: https://github.com/manningb/covid-19-cdc-machine-learning-data-analysis
