In this project, we apply unsupervised learning techniques to product spending data collected from customers of a wholesale distributor in Lisbon, Portugal, to identify customer segments hidden in the data.
Things to learn by completing this project:
- How to apply preprocessing techniques such as feature scaling and outlier detection.
- How to interpret data points that have been scaled, transformed, or reduced by PCA.
- How to analyze PCA dimensions and construct a new feature space.
- How to optimally cluster a set of data to find hidden patterns in a dataset.
- How to assess information given by cluster data and use it in a meaningful way.
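The steps listed above can be sketched end to end as follows. This is only a minimal illustration, not the notebook's actual code: it assumes NumPy and scikit-learn are available, uses synthetic log-normal data in place of the real spending data, and picks Tukey's 1.5 x IQR rule, K-Means, and the silhouette score as one plausible set of techniques. The names `spending`, `good`, `reduced`, and `best_k` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the spending data (NOT the real dataset):
# two groups of customers with different profiles over six categories.
rng = np.random.default_rng(0)
group_a = rng.lognormal(mean=[9, 7, 7, 8, 5, 6], sigma=0.3, size=(100, 6))
group_b = rng.lognormal(mean=[7, 9, 9, 6, 8, 7], sigma=0.3, size=(100, 6))
spending = np.vstack([group_a, group_b])

# Feature scaling: a natural-log transform tames the heavy right skew.
log_spending = np.log(spending)

# Outlier detection with Tukey's rule, applied per feature.
q1, q3 = np.percentile(log_spending, [25, 75], axis=0)
step = 1.5 * (q3 - q1)
is_outlier = (log_spending < q1 - step) | (log_spending > q3 + step)

# Assumed policy: drop rows flagged as outliers in more than one feature.
keep = is_outlier.sum(axis=1) <= 1
good = log_spending[keep]

# PCA: project the six log-scaled features onto two dimensions.
pca = PCA(n_components=2, random_state=0).fit(good)
reduced = pca.transform(good)
print("explained variance ratio:", pca.explained_variance_ratio_)

# Clustering: try several cluster counts, keep the best silhouette score.
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(reduced)
    scores[k] = silhouette_score(reduced, labels)

best_k = max(scores, key=scores.get)
print("best number of clusters:", best_k)
```

The outlier policy and the range of cluster counts are judgment calls; on real data they are worth revisiting after inspecting the flagged rows and the silhouette scores.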
Part of Udacity's Machine Learning Nanodegree
Project resides in customer_segments.ipynb
This project runs on Python 3.5 and uses the following libraries:
The customer segments data consists of 440 data points collected from clients of a wholesale distributor in Lisbon, Portugal. More information can be found on the UCI Machine Learning Repository.
Note: (m.u.) is shorthand for monetary units.
Features
- Fresh: annual spending (m.u.) on fresh products (Continuous)
- Milk: annual spending (m.u.) on milk products (Continuous)
- Grocery: annual spending (m.u.) on grocery products (Continuous)
- Frozen: annual spending (m.u.) on frozen products (Continuous)
- Detergents_Paper: annual spending (m.u.) on detergents and paper products (Continuous)
- Delicatessen: annual spending (m.u.) on delicatessen products (Continuous)
- Channel: {Hotel/Restaurant/Cafe - 1, Retail - 2} (Nominal)
- Region: {Lisbon - 1, Oporto - 2, or Other - 3} (Nominal)
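As a rough illustration of how the feature list might be handled in code, the sketch below separates the six continuous spending features from the two nominal ones and decodes the nominal codes into labels. The CSV header order and the sample rows are made up for illustration and are not taken from the real dataset; `parse_row` and the constant names are hypothetical.

```python
import csv
import io

# The six continuous spending features, all in monetary units (m.u.).
SPENDING_FEATURES = [
    "Fresh", "Milk", "Grocery", "Frozen", "Detergents_Paper", "Delicatessen",
]

# Code-to-label mappings for the two nominal features.
CHANNEL_LABELS = {1: "Hotel/Restaurant/Cafe", 2: "Retail"}
REGION_LABELS = {1: "Lisbon", 2: "Oporto", 3: "Other"}

# Made-up rows for illustration only; these are not real customers.
SAMPLE_CSV = """Channel,Region,Fresh,Milk,Grocery,Frozen,Detergents_Paper,Delicatessen
2,3,1000,2000,3000,400,500,600
1,1,7000,800,900,1000,110,120
"""


def parse_row(row):
    """Split one CSV row into numeric spending and decoded nominal labels."""
    spending = {name: int(row[name]) for name in SPENDING_FEATURES}
    labels = {
        "Channel": CHANNEL_LABELS[int(row["Channel"])],
        "Region": REGION_LABELS[int(row["Region"])],
    }
    return spending, labels


rows = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
for spending, labels in rows:
    print(labels, spending)
```

Keeping the nominal features apart makes it easy to run numeric steps on the spending columns alone and to bring `Channel` or `Region` back later when interpreting the results.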