Skip to content

Binary classification of products passage or failure of quality control

Notifications You must be signed in to change notification settings

RyanBusby/productionFailures

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

44 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Binary Classification with Apache Spark / HDFS

↖data source

The goal of the competition is to predict which parts will fail quality control

My goal is to utilize the hadoop ecosystem to handle a large dataset and establish a pipeline for machine learning

  • Aggregate columns using RDD transformations
  • Create a column that indicates which of those column aggregations are outliers.
  • Model data with Spark Machine Learning package
  • Predict on test data
  • Run this as is to use the toy data set example

About

Binary classification of products passage or failure of quality control

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published