Real Estate Price Prediction Kaggle Competition
-
The Real Estate price prediction competition involves predicting house prices from a combination of variables using regression techniques.
-
The dataset consists of training and testing data. The testing data does not include the sale price of the houses, but it includes all other variables in the training set.
-
The training dataset contains numerical, categorical and ordinal variables.
-
Numerical variables include (only key variables are listed here): a. Overall plot area, b. Living area, c. Basement area, d. Garage cars, etc.
-
Categorical variables include: a. Location, b. Pool QC, c. House style, d. Sale type, etc.
-
Ordinal variables include: a. Overall house quality, b. House external condition, etc.
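Ordinal text ratings such as the external condition can be mapped to integers so that their order is preserved. The following is a minimal sketch, assuming the Kaggle House Prices column name `ExterCond` and its `Po`/`Fa`/`TA`/`Gd`/`Ex` rating scale; the mapping and the helper `encode_ordinal` are hypothetical, not the model's actual code:

```python
import pandas as pd

# Assumed rating scale (poor -> excellent), as used for ExterCond in the
# Kaggle House Prices data description.
QUALITY_SCALE = {"Po": 1, "Fa": 2, "TA": 3, "Gd": 4, "Ex": 5}


def encode_ordinal(df: pd.DataFrame, column: str, scale: dict) -> pd.DataFrame:
    """Return a copy of df with `column` replaced by its integer rank."""
    out = df.copy()
    # Unknown or missing ratings become 0 so they rank below "Po"
    out[column] = out[column].map(scale).fillna(0).astype(int)
    return out


train = pd.DataFrame({"ExterCond": ["TA", "Gd", "Ex", None, "Po"]})
print(encode_ordinal(train, "ExterCond", QUALITY_SCALE)["ExterCond"].tolist())
# [3, 4, 5, 0, 1]
```

Overall quality is typically already stored as an integer score, so only text-coded ordinal columns need a mapping like this.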
-
The model uploaded here is a work-in-progress (WIP) version organized into 14 sections. Its key features are descriptive and visual analysis, feature engineering and an XGB Regressor. In each section, changes are first made to the training data and then replicated on the testing data wherever applicable.
-
Section 1: Imports the packages required by the model.
-
Section 2: Imports the data and runs basic checks on it.
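A basic check usually looks at each column's type and how much data is missing. This is a minimal sketch with a tiny in-memory stand-in for the training file; the column names follow the Kaggle House Prices dataset (an assumption) and `basic_checks` is a hypothetical helper:

```python
import pandas as pd


def basic_checks(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize dtype, non-null count and missing share for every column."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "non_null": df.notna().sum(),
        "missing_share": df.isna().mean(),
    })
    # Columns with the most missing data first
    return summary.sort_values("missing_share", ascending=False)


# Stand-in for the training data; in practice something like
# pd.read_csv("train.csv") (file name assumed).
train = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550],
    "PoolQC": [None, None, None, "Ex"],
    "SalePrice": [208500, 181500, 223500, 140000],
})
print(train.shape)
print(basic_checks(train))
```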
-
Section 3: Deletes columns with a low count of available (non-missing) data, as identified in Section 2.
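The columns to delete can be decided on the training data and the same columns then dropped from the testing data, so both keep an identical layout. A minimal sketch, assuming a hypothetical 50% availability threshold and helper name `sparse_columns`:

```python
import pandas as pd


def sparse_columns(train: pd.DataFrame, min_available: float = 0.5) -> list:
    """Columns whose share of non-missing values in train is below min_available."""
    available = train.notna().mean()
    return available[available < min_available].index.tolist()


train = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550],
    "PoolQC": [None, None, None, "Ex"],
    "SalePrice": [208500, 181500, 223500, 140000],
})
test = pd.DataFrame({"LotArea": [11622, 14267], "PoolQC": [None, None]})

# Decide on the training data only, then drop the same columns from test
to_drop = sparse_columns(train, min_available=0.5)
train = train.drop(columns=to_drop)
test = test.drop(columns=to_drop, errors="ignore")
print(to_drop)  # ['PoolQC']
```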
-
Section 4: Plots the sale price trend with respect to categorical variables. These plots are useful for feature engineering.
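Alongside the plots, a per-category summary of the sale price gives the same information in numeric form: levels whose median price differs strongly from the others may be worth keeping or grouping. A minimal sketch, assuming the Kaggle column names `HouseStyle` and `SalePrice`; `price_by_category` is a hypothetical helper:

```python
import pandas as pd


def price_by_category(df: pd.DataFrame, column: str, target: str = "SalePrice") -> pd.DataFrame:
    """Count, median and mean of the target for each level of a categorical column."""
    return (
        df.groupby(column)[target]
        .agg(["count", "median", "mean"])
        .sort_values("median", ascending=False)
    )


train = pd.DataFrame({
    "HouseStyle": ["2Story", "1Story", "2Story", "1.5Fin", "1Story"],
    "SalePrice": [208500, 181500, 223500, 140000, 250000],
})
print(price_by_category(train, "HouseStyle"))
```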
-
Section 5: Plots the sale price trend with respect to numerical variables.
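For numerical variables, ranking columns by their correlation with the sale price is a quick complement to scatter plots. A minimal sketch using Pearson correlation; the column names are assumed from the Kaggle dataset and `correlation_with_target` is a hypothetical helper:

```python
import pandas as pd


def correlation_with_target(df: pd.DataFrame, target: str = "SalePrice") -> pd.Series:
    """Pearson correlation of every numeric column with the target, strongest first."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    # Sort by absolute strength but keep the sign
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


train = pd.DataFrame({
    "GrLivArea": [1710, 1262, 1786, 1717, 2198],
    "LotArea": [8450, 9600, 11250, 9550, 14260],
    "SalePrice": [208500, 181500, 223500, 140000, 250000],
})
print(correlation_with_target(train))
```

Correlation only captures linear trends, so the plots remain useful for spotting curved relationships and outliers.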
-
Section 6: Outlier rows are deleted based on the visualization and analysis in Section 5.
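Outlier removal applies to the training data only, because every row of the testing data still needs a prediction. A minimal sketch that drops very large but unusually cheap houses; the column names and both thresholds are hypothetical choices for illustration, not the model's actual criteria:

```python
import pandas as pd


def drop_outliers(train: pd.DataFrame, area_limit: float, price_limit: float) -> pd.DataFrame:
    """Remove houses that are very large yet unusually cheap (training data only)."""
    mask = (train["GrLivArea"] > area_limit) & (train["SalePrice"] < price_limit)
    return train.loc[~mask].reset_index(drop=True)


train = pd.DataFrame({
    "GrLivArea": [1710, 1262, 5642, 1786],
    "SalePrice": [208500, 181500, 160000, 223500],
})
# Hypothetical thresholds; in practice they would be read off the Section 5 plots
cleaned = drop_outliers(train, area_limit=4000, price_limit=300000)
print(len(train), "->", len(cleaned))  # 4 -> 3
```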
-
Section 7: Deletes columns based on the number of NA cells and replaces the remaining NA values with an appropriate value for each variable type.
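One common way to fill NA by variable type is the median for numerical columns and the most frequent value for categorical ones, computed on the training data and reused for the testing data. This is a sketch of that choice, not necessarily the model's rule; `fill_missing` and the column names are assumptions, and columns that are entirely NA are assumed to have been deleted already:

```python
import pandas as pd


def fill_missing(train: pd.DataFrame, test: pd.DataFrame):
    """Fill NA with the training median (numeric) or training mode (categorical)."""
    train, test = train.copy(), test.copy()
    for col in train.columns:
        if pd.api.types.is_numeric_dtype(train[col]):
            value = train[col].median()
        else:
            # Assumes the column has at least one non-missing value
            value = train[col].mode().iloc[0]
        train[col] = train[col].fillna(value)
        if col in test.columns:  # the test data has no SalePrice column
            test[col] = test[col].fillna(value)
    return train, test


train = pd.DataFrame({
    "LotFrontage": [65.0, None, 68.0, 60.0],
    "GarageType": ["Attchd", "Detchd", None, "Attchd"],
    "SalePrice": [208500, 181500, 223500, 140000],
})
test = pd.DataFrame({"LotFrontage": [None, 80.0], "GarageType": [None, "Detchd"]})
train, test = fill_missing(train, test)
print(train)
print(test)
```

For some Kaggle columns, NA means "feature not present" (for example, no garage), so filling those with a dedicated label such as "None" can be a better fit than the mode.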
-
Section 8: Year columns, which are treated as categorical, are converted into a numerical age feature. The original year columns are then deleted to save memory.
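An age feature can be computed as the year of sale minus the year of each event. A minimal sketch, assuming the Kaggle column names `YearBuilt`, `YearRemodAdd` and `YrSold`; the helper `years_to_age` and the new column names are hypothetical:

```python
import pandas as pd

YEAR_COLUMNS = ["YearBuilt", "YearRemodAdd"]  # assumed Kaggle column names


def years_to_age(df: pd.DataFrame, reference: str = "YrSold") -> pd.DataFrame:
    """Replace each year column with an age: reference year minus that year."""
    out = df.copy()
    for col in YEAR_COLUMNS:
        # e.g. YearBuilt -> AgeBuilt
        out[col.replace("Year", "Age")] = out[reference] - out[col]
    # Drop the converted year columns to save memory
    return out.drop(columns=YEAR_COLUMNS)


train = pd.DataFrame({
    "YearBuilt": [2003, 1976],
    "YearRemodAdd": [2003, 1990],
    "YrSold": [2008, 2007],
})
print(years_to_age(train))
```

The same function would be applied to the testing data so that both sets gain identical age columns.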
-
Section 9: Generates dummy variables and scales the data.
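Dummy columns and scaling statistics should come from the training data, with the testing data aligned to the same columns; otherwise a category that appears in only one set produces mismatched feature layouts. A minimal pandas sketch; `encode_and_scale` and the column names are assumptions:

```python
import pandas as pd


def encode_and_scale(train: pd.DataFrame, test: pd.DataFrame, numeric_cols: list):
    """One-hot encode categoricals and standardize numeric columns using train statistics."""
    train_enc = pd.get_dummies(train, dtype=float)
    # Levels unseen in training are dropped; levels missing from test become 0
    test_enc = pd.get_dummies(test, dtype=float).reindex(
        columns=train_enc.columns, fill_value=0.0
    )

    mean = train_enc[numeric_cols].mean()
    std = train_enc[numeric_cols].std().replace(0, 1.0)  # guard constant columns
    train_enc[numeric_cols] = (train_enc[numeric_cols] - mean) / std
    test_enc[numeric_cols] = (test_enc[numeric_cols] - mean) / std
    return train_enc, test_enc


train = pd.DataFrame({"GrLivArea": [1710.0, 1262.0, 1786.0], "SaleType": ["WD", "New", "WD"]})
test = pd.DataFrame({"GrLivArea": [1500.0], "SaleType": ["COD"]})
X_train, X_test = encode_and_scale(train, test, numeric_cols=["GrLivArea"])
print(X_train)
print(X_test)
```

Tree-based models such as XGBoost split on thresholds and are not affected by monotonic rescaling, so scaling is harmless here but not required.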
-
Section 10: Splits the training dataset into train and test subsets for modeling, shuffling the data in the process.
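A shuffled hold-out split can be done with scikit-learn's `train_test_split`. The held-out part here is a validation set taken from the training data, not the competition's testing file. A minimal sketch with synthetic stand-in data; the 20% split and seed are assumptions:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # stand-in for the encoded feature matrix
y = rng.normal(loc=180000, scale=40000, size=100)  # stand-in for SalePrice

# Hold out 20% for validation; shuffle=True randomizes row order before
# splitting, and random_state makes the split repeatable.
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=42
)
print(X_tr.shape, X_val.shape)  # (80, 3) (20, 3)
```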
-
Section 11: Prepares the data for the XGB Regressor model.
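Preparing data for the regressor typically means separating the target, checking that every feature is numeric and converting to arrays. Keeping the column names next to the array also makes it possible to map XGBoost feature importances back to variable names later. A minimal sketch; `to_model_inputs` and the column names are hypothetical:

```python
import numpy as np
import pandas as pd


def to_model_inputs(df: pd.DataFrame, target: str = "SalePrice"):
    """Split an encoded frame into a float32 feature matrix, target vector and feature names."""
    features = df.drop(columns=[target])
    non_numeric = features.select_dtypes(exclude=["number", "bool"]).columns
    if len(non_numeric) > 0:
        raise ValueError(f"Encode these columns first: {list(non_numeric)}")
    X = features.to_numpy(dtype=np.float32)
    y = df[target].to_numpy(dtype=np.float64)
    return X, y, list(features.columns)


df = pd.DataFrame({
    "GrLivArea": [1710, 1262],
    "SaleType_WD": [True, False],
    "SalePrice": [208500, 181500],
})
X, y, feature_names = to_model_inputs(df)
print(X, y, feature_names)
```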
-
Section 12: Implements the XGB Regressor model.
-
Section 13: Analyzes model accuracy and generates predictions on the testing data for submission.
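Assuming this is Kaggle's House Prices competition, which scores submissions by RMSE between the logarithms of predicted and observed prices, it helps to track that metric on the validation split as well as plain RMSE. On the log scale a 10% error costs the same for a cheap house as for an expensive one. A minimal NumPy sketch; the function names are hypothetical:

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root-mean-squared error in sale-price units."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def rmse_log(y_true, y_pred) -> float:
    """RMSE between log prices, so errors count relative to each house's price."""
    return rmse(np.log(y_true), np.log(y_pred))


print(rmse([100000, 200000], [110000, 180000]))
print(rmse_log([100000, 200000], [110000, 180000]))
```

In the pipeline this would be called as, for example, `rmse_log(y_val, model.predict(X_val))`.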
-
Section 14: Converts the predictions to a CSV file for submission.
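A submission file pairs each testing-data house ID with its predicted price. A minimal sketch, assuming the columns `Id` and `SalePrice` used by the Kaggle House Prices sample submission; `make_submission` is a hypothetical helper and the IDs and prices are made-up values:

```python
import pandas as pd


def make_submission(ids, predictions) -> pd.DataFrame:
    """Pair each testing-data Id with its predicted SalePrice (column names assumed)."""
    return pd.DataFrame({"Id": ids, "SalePrice": predictions})


# Made-up IDs and predictions for illustration
submission = make_submission([1461, 1462], [169000.5, 187724.1])
# index=False keeps pandas from writing its row index as an extra column
print(submission.to_csv(index=False))
```

Writing the file is then `submission.to_csv("submission.csv", index=False)`.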
-
Model Remarks: a. The model has been built to provide an overall framework for XGB Regressor based prediction. b. The model has further scope for improvement in feature analysis and in parameter and hyper-parameter tuning. c. Moreover, the current version does not support extracting feature names for further analysis and refinement. d. Also, using K-fold cross-validation would help improve the training of the model.