Predicting Housing Prices with Linear Regression in Python

Introduction

This was the very first step in my ML journey.

I started simple: predicting California housing prices with Linear Regression.

The goal wasn’t to get state-of-the-art results, but to get comfortable with the workflow: loading data, cleaning it, training a model, and evaluating it properly.

Why It Matters

Regression is one of the building blocks of machine learning.

Almost everything, from sales forecasting to energy-usage prediction, starts with this foundation.

Approach

  • Dataset: California housing prices
  • Features: median income, house age, rooms, population, etc.
  • Model: Linear Regression (baseline) and Ridge Regression (regularized version)
  • Evaluation: Mean Squared Error (MSE) and R²
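The workflow above can be sketched roughly like this. The actual post uses `sklearn.datasets.fetch_california_housing` (which downloads the data on first use); a synthetic dataset stands in here so the sketch runs offline. The specific split ratio and random seeds are illustrative assumptions, not the post's exact settings.

```python
# Sketch of the end-to-end workflow: load data, split, scale, fit, evaluate.
# Synthetic data stands in for fetch_california_housing so this runs offline.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for the 8-feature California housing data
X, y = make_regression(n_samples=2000, n_features=8, noise=10.0, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Scaling + linear regression as one pipeline, so the scaler is fit
# only on training data (no leakage into the test set)
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)
pred = model.predict(X_test)

print(f"MSE: {mean_squared_error(y_test, pred):.3f}")
print(f"R^2: {r2_score(y_test, pred):.3f}")
```

Swapping `LinearRegression()` for `Ridge(alpha=1.0)` in the pipeline gives the regularized baseline with no other changes.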

Results

Both models gave decent predictions, but Ridge handled multicollinearity a bit better. The main win here was learning the full pipeline end-to-end.
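The multicollinearity point can be made concrete with a small, self-contained experiment (this is an illustration I've constructed, not the post's actual feature data): when two features are nearly identical, ordinary least squares can split the true weight between them arbitrarily, while Ridge's L2 penalty keeps the coefficients small and stable.

```python
# Two nearly collinear features: OLS coefficients become unstable,
# Ridge shrinks them toward a stable, shared solution.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=1e-3, size=n)  # almost a copy of x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.1, size=n)  # true weight 3 on the shared signal

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# OLS may assign large opposite-signed weights; only their sum (~3) is pinned down.
print("OLS coefficients:  ", ols.coef_)
# Ridge splits the weight roughly evenly and keeps both coefficients bounded.
print("Ridge coefficients:", ridge.coef_)
```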

Takeaways

  • Always start with a baseline; even a simple model can give insights.
  • Regularization (like Ridge) helps stabilize models when features overlap.
  • Visualization of residuals is just as important as raw metrics.
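On the residuals point: a useful first check, before any plotting, is that the residuals are centered on zero and roughly match the noise level. A minimal sketch (again on synthetic stand-in data, not the post's actual dataset):

```python
# Residual sanity check: for OLS with an intercept, residuals have
# mean ~0 by construction; their spread should track the noise level.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=1000, n_features=8, noise=10.0, random_state=1)
model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)

print(f"mean residual: {residuals.mean():.3f}")  # ~0 for OLS with intercept
print(f"std residual:  {residuals.std():.3f}")   # ~ the noise level (10 here)
# For the visual version: plt.scatter(model.predict(X), residuals)
# and look for curvature or a funnel shape, which raw metrics won't show.
```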

Artifacts

Video walkthrough
