melvinmatanos2008/Udacity-Data-Scientist-Nanodegree

Udacity-Data-Scientist-Nanodegree

This repository contains my projects for Udacity's Data Scientist Nanodegree.

Project 1: Write a Data Science Blog Post

For this project I conducted exploratory data analysis on a Wine Reviews dataset from Kaggle, which contains approximately 130k reviews from Wine Enthusiast. I explored the data and communicated my findings in a blog post on Medium that walks the reader through the questions posed and what the data says about them.
Link to notebook
Link to Medium blog post

Project 2: Disaster Response Pipeline

I applied my data engineering skills to analyze disaster data from Figure Eight to build a model for an API that classifies disaster messages. I created a machine learning pipeline to categorize real messages that were sent during disaster events so that the messages could be sent to an appropriate disaster relief agency. The project includes a web app where an emergency worker can input a new message and get classification results in several categories. The web app also displays visualizations of the data.
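A minimal sketch of what a multi-label message classifier of this kind can look like, assuming scikit-learn. The category names, the toy messages, and the choice of TF-IDF features with logistic regression are all assumptions made for illustration; they are not taken from the project's code or data.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Hypothetical category names and toy messages, invented for this sketch.
CATEGORIES = ["water", "medical", "shelter"]
messages = [
    "we need clean drinking water",
    "people are injured and need a doctor",
    "our house collapsed, we need shelter",
    "no water and many injured people here",
    "families need tents and drinking water",
    "send a doctor and medicine",
]
# One row per message, one 0/1 column per category.
labels = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
])


def build_pipeline():
    """Text -> TF-IDF features -> one binary classifier per category."""
    return Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("clf", MultiOutputClassifier(LogisticRegression(max_iter=1000))),
    ])


def classify(model, message):
    """Return a {category: 0 or 1} mapping for a single new message."""
    row = model.predict([message])[0]
    return {category: int(flag) for category, flag in zip(CATEGORIES, row)}


model = build_pipeline().fit(messages, labels)
result = classify(model, "we need water")
```

`MultiOutputClassifier` fits one independent classifier per category column, which is what lets a single message be flagged in several categories at once. A web app like the one described would call something like `classify` on each message the user enters.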

Project 3: Recommendations with IBM

I analysed the interactions that users have with articles on the IBM Watson Studio platform and recommended new articles I thought they'd like. I performed EDA and built rank-based recommendations, user-user collaborative filtering, and matrix factorisation.
Link to notebook
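As a rough illustration of two of the techniques named above, here is a standard-library sketch of rank-based recommendations and user-user collaborative filtering. The interaction log, the IDs, and the function names are hypothetical, and counting shared articles is just one simple similarity measure; the notebook may do this differently. Matrix factorisation is not covered here.

```python
from collections import Counter

# Hypothetical (user_id, article_id) interaction log, invented for this sketch.
interactions = [
    ("u1", "a1"), ("u1", "a2"),
    ("u2", "a1"), ("u2", "a2"), ("u2", "a3"),
    ("u3", "a1"), ("u3", "a4"),
    ("u4", "a5"),
]


def articles_by_user(log):
    """Map each user to the set of articles they interacted with."""
    seen = {}
    for user, article in log:
        seen.setdefault(user, set()).add(article)
    return seen


def rank_based(log, n):
    """Top-n articles by interaction count (ties broken by article id)."""
    counts = Counter(article for _, article in log)
    ranked = sorted(counts, key=lambda a: (-counts[a], a))
    return ranked[:n]


def user_user(log, user, n):
    """Recommend unseen articles from the users most similar to `user`.

    Similarity here is the number of articles two users share.
    """
    seen = articles_by_user(log)
    mine = seen[user]
    neighbours = sorted(
        (other for other in seen if other != user),
        key=lambda other: (-len(seen[other] & mine), other),
    )
    recs = []
    for other in neighbours:
        if not seen[other] & mine:
            continue  # no overlap, so this user tells us nothing
        for article in sorted(seen[other] - mine):
            if article not in recs:
                recs.append(article)
    return recs[:n]


top_articles = rank_based(interactions, 2)
for_u1 = user_user(interactions, "u1", 2)
```

Rank-based recommendations need no history for the target user, so they suit new users; the user-user approach only works once a user has interactions to compare against.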

Project 4: Predicting Customer Churn for a Music Streaming Service

I used PySpark to predict customer churn for a music streaming service. The project involved:

  • Loading and cleaning a small subset (128MB) of the full dataset (12GB)
  • Conducting Exploratory Data Analysis to understand the data and what features are useful for predicting churn
  • Feature Engineering to create features that will be used in the modelling process
  • Modelling using machine learning algorithms such as Logistic Regression, Random Forest, Gradient Boosted Trees, Linear SVM, and Naive Bayes

Link to notebook
Link to blog post
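The cleaning and feature engineering steps above can be sketched roughly as follows. This is plain Python rather than PySpark so that it runs without a Spark installation; in PySpark the same idea would typically be a group-by aggregation per user. The field names (`user_id`, `page`), the page values, and the use of a "Cancellation Confirmation" event as the churn marker are assumptions for illustration, not details taken from the project.

```python
from collections import defaultdict

# Hypothetical event log, invented for this sketch.
events = [
    {"user_id": "u1", "page": "NextSong"},
    {"user_id": "u1", "page": "Thumbs Up"},
    {"user_id": "u1", "page": "NextSong"},
    {"user_id": "u2", "page": "NextSong"},
    {"user_id": "u2", "page": "Cancellation Confirmation"},
    {"user_id": "", "page": "Home"},  # no user id: dropped during cleaning
]

CHURN_PAGE = "Cancellation Confirmation"  # assumed churn marker


def clean(rows):
    """Drop rows that cannot be attributed to a user."""
    return [row for row in rows if row.get("user_id")]


def user_features(rows):
    """Aggregate the event log into one feature record per user."""
    feats = defaultdict(lambda: {"events": 0, "songs": 0, "churned": 0})
    for row in rows:
        record = feats[row["user_id"]]
        record["events"] += 1
        if row["page"] == "NextSong":
            record["songs"] += 1
        if row["page"] == CHURN_PAGE:
            record["churned"] = 1  # label used as the prediction target
    return dict(feats)


features = user_features(clean(events))
```

The resulting per-user records (activity counts plus a 0/1 `churned` label) are the kind of table a classifier such as logistic regression or a random forest would then be trained on.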

Data Scientist Nanodegree Certificate

[Image: Data Scientist Nanodegree certificate]

