Data Analysis and Regression Predictions with Python
Code
Project
The goal of this project is to predict the sale price of residential homes in Ames, Iowa, USA, from their characteristics. It is a supervised regression problem.
The project covers exploratory data analysis, data cleaning (handling outliers and missing values), feature engineering, target encoding, feature selection, model validation, hyperparameter tuning, and regularization. The best-performing regression model is Gradient Boosting, with a root mean square error (RMSE) of 0.12 on the training set.
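To make two of the steps above concrete, here is a minimal sketch of target encoding and of the RMSE metric, using only the Python standard library. It is an illustration under stated assumptions, not the project's actual code: the function names `target_encode` and `rmse`, the `smoothing` parameter, and the toy data are all hypothetical, and the project itself may rely on a library implementation instead.

```python
from collections import defaultdict
from math import sqrt


def target_encode(categories, targets, smoothing=0.0):
    """Map each category to the (optionally smoothed) mean target for that category.

    Hypothetical helper for illustration; not taken from the project.
    """
    if len(categories) != len(targets):
        raise ValueError("categories and targets must have the same length")
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    # Smoothing pulls categories with few rows toward the global mean.
    return {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in sums
    }


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Toy, made-up data (NOT from the Ames dataset).
neighborhoods = ["A", "A", "B", "B", "B", "C"]
prices = [100.0, 200.0, 300.0, 300.0, 300.0, 600.0]

# Replace each category with its mean price, then score that as a naive prediction.
encoding = target_encode(neighborhoods, prices)
encoded = [encoding[n] for n in neighborhoods]
print(encoding)
print(rmse(prices, encoded))
```

In a real pipeline the encoding would normally be fitted on the training folds only and then applied to held-out data, since computing it on the rows being evaluated leaks the target into the features.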
A Streamlit app is provided for performing exploratory data analysis.