House_prices-prediction

Help Alzar, the record keeper, recover the lost details of 3.5k houses with the help of machine learning.

Clone the repository, then run dataset_creation.py followed by train_test_files_creation.py to create the dataset.
- The dataset is provided as text files, so preprocessing is required to convert them into a CSV file.
- First, dataset_creation.py extracts the data from the text files and writes final_dataset.csv.
- Then train_test_files_creation.py uses the pandas library to split final_dataset.csv into training and testing datasets on the basis of house prices.
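
The split step can be sketched roughly as follows. This is only an illustration, not the code in train_test_files_creation.py: it assumes that splitting "on the basis of house prices" means rows with a known price form the training set and rows with a missing price form the test set. The column name `price` and the helper `split_by_price` are hypothetical.

```python
import pandas as pd


def split_by_price(df, target_col="price"):
    """Split rows by whether the target price is known.

    Assumption: rows with a price go to training, rows without one go to
    testing. `target_col` is a hypothetical column name.
    """
    has_price = df[target_col].notna()
    train = df[has_price].reset_index(drop=True)
    # The test set has no usable target, so the empty column is dropped.
    test = df[~has_price].drop(columns=[target_col]).reset_index(drop=True)
    return train, test


# Tiny in-memory stand-in for final_dataset.csv (values are made up).
houses = pd.DataFrame({
    "area": [1200, 950, 1800, 1430],
    "rooms": [3, 2, 4, 3],
    "price": [250000.0, None, 410000.0, None],
})
train, test = split_by_price(houses)
```

With the real data, the frame would come from `pd.read_csv("final_dataset.csv")` and the two parts could be written out with `to_csv(..., index=False)`.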

This project uses the xgboost regressor, so xgboost must be installed in one of these ways:
- Using pip: pip install xgboost
- Using conda: conda install -c conda-forge py-xgboost

Python 2.7 is preferred for this project.

Run dataset_creation.py followed by train_test_files_creation.py to build the dataset from the raw text files and split it into processed CSV files.
Run features_analysis.py to explore the train and test datasets using pandas DataFrame functions. It visualizes the relationships between the features and the target value with histograms, scatter plots and a heat map.
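
A heat map of this kind is usually a rendering of the feature correlation matrix. The sketch below shows the underlying calculation with pandas only. The column names and the helper `correlation_with_target` are hypothetical and the sample values are made up, so this is not necessarily what features_analysis.py does.

```python
import pandas as pd


def correlation_with_target(df, target_col):
    """Return each numeric feature's Pearson correlation with the target,
    ordered from strongest to weakest by absolute value."""
    corr = df.select_dtypes(include="number").corr()[target_col].drop(target_col)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Made-up sample: price rises with area, falls with age, ignores "noise".
sample = pd.DataFrame({
    "area": [1000, 1500, 2000, 2500],
    "age": [40, 30, 20, 10],
    "noise": [5, 1, 4, 2],
    "price": [100.0, 150.0, 200.0, 250.0],
})
ranking = correlation_with_target(sample, "price")
```

The full matrix from `DataFrame.corr()` is what a plotting library would draw as the heat map, while `DataFrame.hist()` and `DataFrame.plot.scatter()` cover the histograms and scatter plots.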

Run algo.py to try out new features, perform feature selection and fill NaN values through interpolation.
- After this step the data is ready to be fitted with different models.
- It reports a detailed timing and r2_score analysis after tuning the hyperparameters of several types of regression.
- It runs cross-validation on the training set with LinearRegression, Lasso regression and Ridge regression, and prints the r2_score for each.
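
The steps above can be sketched as follows, assuming scikit-learn is available. The data is synthetic, the column names are hypothetical, and the `alpha` values and the 5-fold setting are placeholders rather than the tuned settings used in algo.py.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for train.csv: price is roughly linear in area and rooms.
# Rows are sorted by area so that interpolating between neighbours is sensible.
rng = np.random.RandomState(0)
area = np.sort(rng.uniform(500, 3000, size=60))
rooms = rng.randint(1, 6, size=60).astype(float)
price = 100.0 * area + 5000.0 * rooms + rng.normal(0, 1000, size=60)
data = pd.DataFrame({"area": area, "rooms": rooms, "price": price})

# Knock out a few feature values, then fill them by linear interpolation.
data.loc[[5, 20, 41], "area"] = np.nan
features = data[["area", "rooms"]].interpolate(method="linear", limit_direction="both")

models = {
    "LinearRegression": LinearRegression(),
    "Lasso": Lasso(alpha=1.0),
    "Ridge": Ridge(alpha=1.0),
}

# Mean r2 over 5 cross-validation folds for each model.
scores = {}
for name, model in models.items():
    fold_scores = cross_val_score(model, features, data["price"], cv=5, scoring="r2")
    scores[name] = fold_scores.mean()
```

Linear interpolation fills a gap from the neighbouring rows, so it is only meaningful when the row order carries information. For unordered data, a column mean or median is a common alternative.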

Finally, run final_solution_xgboost.py to get the final results.
With the xgboost regressor we achieve an r2_score of 0.99512.
solution.csv is also included in the repository so the results on the test dataset can be checked.
xgboost with tuned parameters gives a final r2_score of 0.99553 on the test dataset.
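
For reference, the r2_score values quoted above are assumed to be the standard coefficient of determination, R^2 = 1 - SS_res / SS_tot. The sketch below computes it with numpy on made-up numbers. The helper `r2` is hypothetical, and the sample values are not taken from solution.csv.

```python
import numpy as np


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # squared prediction errors
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # spread around the mean
    return 1.0 - ss_res / ss_tot


# Made-up prices for illustration only.
actual = [200.0, 300.0, 400.0, 500.0]
predicted = [210.0, 290.0, 410.0, 490.0]
score = r2(actual, predicted)
```

A score of 1.0 means the predictions match exactly, and 0.0 means they are no better than always predicting the mean price. To check a run against the repository, the prices in solution.csv would take the place of `actual`.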
