Exploratory Data Analysis and Classification Predictions with Python
Project
This project predicts passenger survival in the Titanic disaster from socio-economic passenger data. It is a supervised binary classification problem.
The project features data cleaning, feature engineering, one-hot encoding, feature selection, and classifier fitting. The best classifier is a Random Forest, with a training accuracy of 0.98 and an F1-score of 0.98. The Kaggle submission scores 0.77 on the test set.
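To make two of the steps named above concrete, here is a minimal sketch in plain Python of one-hot encoding and of how accuracy and F1-score are computed. It is not the project's actual code: the helper names (one_hot, accuracy, f1_score), the "embarked" column, and all values are made up for illustration.

```python
def one_hot(values):
    """Encode a list of category labels as 0/1 indicator rows."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical categorical feature (assumed column, toy values).
embarked = ["S", "C", "Q", "S"]
categories, encoded = one_hot(embarked)

# Hypothetical labels: 1 = survived, 0 = did not survive.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
print(categories, encoded)
print(acc, f1)
```

In practice a library such as scikit-learn would typically handle both the encoding and the metrics; the hand-written versions are only meant to show what the reported numbers measure.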