Cross-Validation and ROC Curves on the Titanic Dataset

Introduction

Day 3 was about going beyond a single train/test split.

I added cross-validation and looked at ROC curves to better evaluate my model.

Why It Matters

One train/test split can give you a lucky (or unlucky) result.

Cross-validation averages performance over several folds, which makes the estimate more robust. ROC curves show how the model performs across all classification thresholds, not just the default 0.5.
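The "lucky split" point is easy to demonstrate. A rough sketch (on synthetic data, since loading Titanic is a separate step): accuracy from several different single splits can swing noticeably, while the cross-validated mean is one steadier number.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the Titanic features -- keeps the sketch self-contained.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)

# Accuracy from ten different single splits -- note the spread.
single = []
for seed in range(10):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    single.append(model.score(X_te, y_te))
print(f"single splits: min={min(single):.3f}, max={max(single):.3f}")

# Cross-validation averages over five folds instead of trusting one split.
cv_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(f"5-fold CV mean: {cv_scores.mean():.3f}")
```

Depending on the seed, the single-split numbers can differ by several points of accuracy, which is exactly the noise cross-validation smooths out.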

Approach

  • Dataset: Titanic (expanded features)
  • Features: sex, age, fare, class, embarked, sibsp, parch, alone
  • Model: Logistic Regression
  • Validation: Stratified 5-fold cross-validation
  • Evaluation: Accuracy, F1, ROC-AUC
  • Visualization: ROC curve
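The whole setup above fits in a few lines of scikit-learn. This is a sketch, not the exact notebook: it uses a synthetic stand-in for the preprocessed Titanic features (in practice you would first encode sex/class/embarked and impute age), but the model, the stratified 5-fold split, and the three metrics match the list above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for the 8 Titanic features (sex, age, fare, class, embarked,
# sibsp, parch, alone) after encoding/imputation.
X, y = make_classification(n_samples=891, n_features=8, n_informative=5,
                           random_state=42)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# One call scores all five folds on accuracy, F1, and ROC-AUC.
scores = cross_validate(model, X, y, cv=cv,
                        scoring=["accuracy", "f1", "roc_auc"])

for metric in ("accuracy", "f1", "roc_auc"):
    vals = scores[f"test_{metric}"]
    print(f"{metric}: {vals.mean():.3f} +/- {vals.std():.3f}")
```

Stratified folds keep the survived/died ratio roughly constant in every fold, which matters because the Titanic classes are imbalanced.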

Results

Cross-validation gave a more stable estimate of performance than any single split. The ROC curve showed the model does a decent job separating survivors from non-survivors across thresholds, even if it's far from perfect.
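For the curve itself, the key step is scoring with predicted probabilities rather than hard labels. A minimal sketch (again on a synthetic stand-in for the Titanic features):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed Titanic features.
X, y = make_classification(n_samples=891, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]   # P(survived) for each passenger

# One (false-positive rate, true-positive rate) point per threshold.
fpr, tpr, thresholds = roc_curve(y_te, proba)
auc = roc_auc_score(y_te, proba)
print(f"ROC-AUC: {auc:.3f}")
```

Plotting is then just `plt.plot(fpr, tpr)` with a diagonal reference line for the random-guess baseline; the closer the curve hugs the top-left corner, the better the separation.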

Takeaways

  • Always validate with multiple folds; a single split is less reliable.
  • ROC-AUC tells you more than accuracy alone, especially when classes are imbalanced.
  • Adding more features can improve a model, but only if they carry real signal.

Artifacts

Video walkthrough
