Exploratory Data Analysis and Classification Predictions with Python

Code

Project

This project predicts passenger survival in the Titanic disaster from socio-economic passenger data. It is a supervised binary classification problem.

The project features data cleaning, feature engineering, one-hot encoding, feature selection, and classifier fitting. The best classifier is a Random Forest, with a training accuracy of 0.98 and an F1-score of 0.98. The Kaggle submission scores 0.77 on the test set.
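As a rough illustration of the pipeline described above, the following is a minimal sketch, not the project's actual code. It assumes pandas and scikit-learn, and it uses a small synthetic table with Titanic-like column names (`Pclass`, `Sex`, `Age`, `Fare`, `Embarked`, `Survived`) in place of the Kaggle data. The label rule, the hyperparameters, and the median imputation are assumptions made for the example, and the feature engineering and feature selection steps are left out.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Titanic data (assumption: the real project
# would load the Kaggle CSV files instead).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "Pclass": rng.integers(1, 4, size=n),
    "Sex": rng.choice(["male", "female"], size=n),
    "Age": rng.uniform(1, 80, size=n).round(),
    "Fare": rng.uniform(5, 250, size=n).round(2),
    "Embarked": rng.choice(["S", "C", "Q"], size=n),
})

# Toy label rule with noise, made up for this example only.
signal = (df["Sex"] == "female").astype(int) + (df["Pclass"] == 1).astype(int)
df["Survived"] = ((signal + rng.normal(0, 0.5, size=n)) > 0.8).astype(int)

# Data cleaning: simulate missing ages, then impute with the median.
missing_idx = df.sample(frac=0.1, random_state=0).index
df.loc[missing_idx, "Age"] = np.nan
df["Age"] = df["Age"].fillna(df["Age"].median())

# One-hot encode the categorical columns.
X = pd.get_dummies(df.drop(columns="Survived"), columns=["Sex", "Embarked"])
y = df["Survived"]

# Hold out a validation split so training and held-out scores can be compared.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a Random Forest classifier (hyperparameters are illustrative).
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

train_pred = clf.predict(X_train)
val_pred = clf.predict(X_val)
train_acc = accuracy_score(y_train, train_pred)
train_f1 = f1_score(y_train, train_pred)
val_acc = accuracy_score(y_val, val_pred)

print(f"train accuracy: {train_acc:.2f}")
print(f"train F1:       {train_f1:.2f}")
print(f"val accuracy:   {val_acc:.2f}")
```

Reporting a held-out score next to the training score is a deliberate choice here: a fully grown Random Forest tends to fit its training data almost perfectly, so the training accuracy alone says little about how the model will score on unseen passengers.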