Data Analysis and Regression Predictions with Python

Code

Project

The goal of this project is to predict the sale price of residential homes in Ames, Iowa, USA, based on various house characteristics. This is a supervised regression problem.

The project covers exploratory data analysis, data cleaning (handling outliers and missing values), feature engineering, target encoding, feature selection, model validation, hyperparameter tuning, and regularization. The best-performing regression model was Gradient Boosting, with a Root Mean Square Error (RMSE) of 0.12 on the train set.
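A minimal sketch of two of the steps named above follows. It assumes a smoothed mean target encoding, where each category is replaced by its mean target shrunk toward the global mean. The README does not say which target-encoding variant the project used. The RMSE shown is the standard definition. If the target was log-transformed, which is common for house-price data, a value like 0.12 would be on the log scale; the README does not confirm this. The names `target_encode`, `rmse`, and the `smoothing` parameter are hypothetical and are not taken from the project's code.

```python
import math
from collections import defaultdict


# Hypothetical helpers for illustration; not taken from the project's code.
def target_encode(categories, targets, smoothing=10.0):
    """Map each category to a smoothed mean of the target.

    Categories with few observations are pulled toward the global mean,
    which limits overfitting on rare levels (assumed variant).
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    # Weighted blend: observed sum plus `smoothing` pseudo-observations at the global mean.
    return {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }


def rmse(y_true, y_pred):
    """Root Mean Square Error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


if __name__ == "__main__":
    encoding = target_encode(["a", "a", "b"], [1.0, 3.0, 5.0], smoothing=1.0)
    print(encoding)
    print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

To avoid target leakage, an encoding like this is normally fitted on the training folds only and then applied to the validation fold, rather than computed on the full dataset.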

A Streamlit app is provided for performing exploratory data analysis.