Introduction
Day 3 was about going beyond a single train/test split.
I added cross-validation and looked at ROC curves to better evaluate my model.
Why It Matters
One train/test split can give you a lucky (or unlucky) result.
Cross-validation averages performance over several different splits, which makes the estimate much more robust. ROC curves show the trade-off between true positive rate and false positive rate at every classification threshold, not just the default 0.5.
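To make the lucky-split point concrete, here's a minimal sketch (assuming scikit-learn, the seaborn copy of the Titanic dataset, and a pared-down numeric feature set of my own choosing): the same model's accuracy shifts noticeably just by changing the split's `random_state`.

```python
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load Titanic and keep a few simple numeric features for illustration
df = sns.load_dataset("titanic")[["survived", "age", "fare", "pclass"]].dropna()
X, y = df[["age", "fare", "pclass"]], df["survived"]

# Same model, different random splits -> noticeably different scores
for seed in [0, 1, 2, 3, 4]:
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(f"seed={seed}: accuracy={accuracy_score(y_te, model.predict(X_te)):.3f}")
```

Five seeds, five different accuracies. Which one is "the" performance of the model? Averaging over folds answers that more honestly.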
Approach
- Dataset: Titanic (expanded features)
- Features: sex, age, fare, class, embarked, sibsp, parch, alone
- Model: Logistic Regression
- Validation: Stratified 5-fold cross-validation (sketched in code after this list)
- Evaluation: Accuracy, F1, ROC-AUC
- Visualization: ROC curve
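Here's roughly how that setup looks in scikit-learn. This is a hedged sketch rather than my exact notebook: the imputation strategies, the `random_state`, and the pipeline structure are assumptions made for illustration, again using seaborn's bundled Titanic data.

```python
import seaborn as sns
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = sns.load_dataset("titanic")
features = ["sex", "age", "fare", "class", "embarked", "sibsp", "parch", "alone"]
X, y = df[features], df["survived"]

numeric = ["age", "fare", "sibsp", "parch"]
categorical = ["sex", "class", "embarked", "alone"]

preprocess = ColumnTransformer([
    # Fill missing ages/fares with the median, then standardize
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    # Fill the few missing 'embarked' values, then one-hot encode
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]),
     categorical),
])

model = Pipeline([("prep", preprocess),
                  ("clf", LogisticRegression(max_iter=1000))])

# Stratified folds keep the survivor/non-survivor ratio stable in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_validate(model, X, y, cv=cv,
                        scoring=["accuracy", "f1", "roc_auc"])

for metric in ["accuracy", "f1", "roc_auc"]:
    vals = scores[f"test_{metric}"]
    print(f"{metric}: {vals.mean():.3f} +/- {vals.std():.3f}")
```

Wrapping preprocessing and the classifier in one pipeline matters: each fold fits its imputers and scaler only on that fold's training portion, so nothing leaks from the validation fold into the fit.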
Results
Cross-validation gave a more stable estimate of performance than any single split did. The ROC curve showed the model does a reasonable job of separating survivors from non-survivors across thresholds, even if it's far from perfect.
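For the curve itself, something like the following works. It's a self-contained, assumption-laden illustration: a simplified numeric feature subset, with out-of-fold probabilities from `cross_val_predict` so that every point on the curve comes from data the model never trained on.

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import RocCurveDisplay, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Simple numeric subset so this snippet stands alone
df = sns.load_dataset("titanic")[["survived", "age", "fare", "pclass"]].dropna()
X, y = df[["age", "fare", "pclass"]], df["survived"]

# Out-of-fold probabilities: each row is scored by a model that never saw it
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
probs = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                          cv=cv, method="predict_proba")[:, 1]

print(f"Out-of-fold ROC-AUC: {roc_auc_score(y, probs):.3f}")
RocCurveDisplay.from_predictions(y, probs)
plt.title("ROC curve (out-of-fold predictions)")
plt.show()
```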
Takeaways
- Always validate with multiple folds; it's more reliable than trusting one split.
- ROC-AUC is a more informative measure than accuracy alone for classification (see the sketch after this list).
- Adding more features can improve a model, but only if they add real signal.
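The accuracy-vs-ROC-AUC point is easiest to see on imbalanced data. Here's a toy example with synthetic labels (not Titanic): a "model" that scores every row identically hits 95% accuracy on a 95/5 class split, while ROC-AUC correctly reports that it has no discriminative power at all.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Imbalanced labels: 95% negatives, 5% positives
y = np.array([0] * 95 + [1] * 5)

# A "model" that ignores its input and gives every row the same score
scores = np.full(100, 0.1)
preds = (scores >= 0.5).astype(int)  # always predicts the majority class

print(f"accuracy: {accuracy_score(y, preds):.2f}")  # 0.95, looks impressive
print(f"roc_auc:  {roc_auc_score(y, scores):.2f}")  # 0.50, reveals zero skill
```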
Artifacts
Video walkthrough
