Predicting House Prices Using Linear Regression – Step-by-Step ML Project

This project shows how to use linear regression to predict house prices with the California Housing dataset, which scikit-learn can download for you. It is a good first ML project for learning regression.

Tools Required: Python, NumPy, pandas, scikit-learn, matplotlib, seaborn

Step 1: Import Required Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

Step 2: Load the Dataset

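# Downloads the data on first use, then reads it from the local cache
data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)
# Target: median house value per district, in units of $100,000
df['Price'] = data.target
print(df.head())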

Step 3: Explore the Dataset

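df.info()                  # column types and non-null counts (prints directly)
print(df.describe())       # summary statistics for each column
print(df.isnull().sum())   # number of missing values per column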

Visualize correlation:

plt.figure(figsize=(10,6))
sns.heatmap(df.corr(), annot=True, cmap='coolwarm')
plt.title("Feature Correlation Heatmap")
plt.show()
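
The heatmap shows every pairwise correlation, but for prediction the column for the target is usually the most useful part. Here is a minimal sketch of pulling out just that column and sorting it. It uses a small made-up DataFrame (`demo_df`, with hypothetical `income` and `age` columns) so it runs without downloading anything:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: "income" drives the price, "age" is unrelated
rng = np.random.default_rng(0)
n = 500
income = rng.normal(size=n)
age = rng.normal(size=n)
demo_df = pd.DataFrame({
    "income": income,
    "age": age,
    "Price": 2.0 * income + rng.normal(scale=0.5, size=n),
})

# Correlation of every feature with the target, strongest first
target_corr = demo_df.corr()["Price"].drop("Price").sort_values(ascending=False)
print(target_corr)
```

On the tutorial's DataFrame the same idea would be df.corr()['Price'].drop('Price').sort_values(ascending=False).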

Step 4: Split the Data

Hold out 20% of the rows for testing. Fixing random_state makes the split reproducible:

X = df.drop('Price', axis=1)
y = df['Price']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Step 5: Train the Linear Regression Model

model = LinearRegression()
model.fit(X_train, y_train)
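
After fitting, it can help to look at what the model learned. The sketch below pairs each coefficient in `coef_` with its column name. It assumes a small synthetic dataset (`X_demo`, `y_demo`, with hypothetical feature names and known true weights) rather than the housing data, so the result is easy to check by eye:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in data: y = 3.0*feat_a - 2.0*feat_b + 1.0 + small noise
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "feat_a": rng.normal(size=200),
    "feat_b": rng.normal(size=200),
})
y_demo = (3.0 * X_demo["feat_a"] - 2.0 * X_demo["feat_b"] + 1.0
          + rng.normal(scale=0.1, size=200))

demo_model = LinearRegression().fit(X_demo, y_demo)

# Label each learned weight with the feature it belongs to
coef_table = pd.Series(demo_model.coef_, index=X_demo.columns)
print(coef_table)
print("Intercept:", demo_model.intercept_)
```

With the tutorial's model, the equivalent would be pd.Series(model.coef_, index=X_train.columns). Keep in mind that coefficient sizes depend on each feature's units, so they are not directly comparable across features unless the features are on a similar scale.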

Step 6: Make Predictions

y_pred = model.predict(X_test)

Step 7: Evaluate the Model

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)
print("R2 Score:", r2)
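
Because the target in this dataset is measured in units of $100,000, the MSE is in squared units and is hard to read directly. Taking the square root (RMSE) or using the mean absolute error (MAE) gives a number in the same units as the price. Below is a minimal sketch on four made-up values (`y_true_demo` and `y_pred_demo` are illustrative, not results from the model above):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical actual and predicted values, in units of $100,000
y_true_demo = np.array([2.0, 3.5, 1.2, 4.0])
y_pred_demo = np.array([2.5, 3.0, 1.0, 4.4])

mse_demo = mean_squared_error(y_true_demo, y_pred_demo)
rmse_demo = np.sqrt(mse_demo)  # back in the target's own units
mae_demo = mean_absolute_error(y_true_demo, y_pred_demo)

print("MSE:", mse_demo)
print("RMSE:", rmse_demo)
print("MAE:", mae_demo)
```

In the tutorial's code, np.sqrt(mse) gives the RMSE. As a reading aid, an RMSE of 0.7 would correspond to a typical error of about $70,000.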

Plot predictions vs actual values:

plt.scatter(y_test, y_pred)
plt.xlabel("Actual Prices")
plt.ylabel("Predicted Prices")
plt.title("Actual vs Predicted House Prices")
plt.show()

Step 8: Conclusion

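You have built a baseline linear regression model for predicting house prices. Note that scaling the features does not change the predictions of a plain LinearRegression; it matters for regularized models such as Ridge and Lasso. To capture nonlinear patterns, try a more flexible regressor such as RandomForestRegressor.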
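
To illustrate why a more flexible regressor can help, here is a rough sketch comparing LinearRegression with RandomForestRegressor. It assumes synthetic data with a deliberately curved relationship (`X_demo`, `y_demo` are made up), so the gap it shows says nothing about how large the gain would be on the housing data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: y = x0^2 + 2*x1 + noise (x0 enters nonlinearly)
rng = np.random.default_rng(42)
X_demo = rng.uniform(-3, 3, size=(1000, 2))
y_demo = X_demo[:, 0] ** 2 + 2 * X_demo[:, 1] + rng.normal(scale=0.3, size=1000)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

linear = LinearRegression().fit(X_tr, y_tr)
forest = RandomForestRegressor(n_estimators=100, random_state=42).fit(X_tr, y_tr)

# Score both models on the same held-out rows
linear_r2 = r2_score(y_te, linear.predict(X_te))
forest_r2 = r2_score(y_te, forest.predict(X_te))
print("Linear R2:", linear_r2)
print("Random forest R2:", forest_r2)
```

The same pattern applies to the tutorial's X_train and y_train: fit both models, then compare their scores on X_test.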

What's Next?

  • Try using Ridge or Lasso regression.
  • Experiment with feature selection and polynomial regression.
  • Visualize residuals to better understand the error.
  • Deploy your model using Streamlit!
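
The first and third suggestions above can be sketched together. The example below fits Ridge and Lasso inside a pipeline with StandardScaler (regularized models are sensitive to feature scale), then computes residuals on the test set. The data is synthetic and the alpha values are arbitrary starting points, not tuned choices:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: only the first 3 of 8 features affect y
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 8))
true_weights = np.array([5.0, -3.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y_demo = X_demo @ true_weights + rng.normal(scale=0.5, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Scale inside the pipeline so the scaler is fitted on training rows only
ridge = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X_tr, y_tr)
lasso = make_pipeline(StandardScaler(), Lasso(alpha=0.1)).fit(X_tr, y_tr)

ridge_r2 = r2_score(y_te, ridge.predict(X_te))
lasso_r2 = r2_score(y_te, lasso.predict(X_te))
print("Ridge R2:", ridge_r2)
print("Lasso R2:", lasso_r2)

# Lasso can shrink coefficients exactly to zero, which acts as feature selection
lasso_coefs = lasso[-1].coef_
n_zeroed = int(np.sum(lasso_coefs == 0))
print("Coefficients set to zero by Lasso:", n_zeroed)

# Residuals = actual minus predicted; ideally centered on zero with no pattern
residuals = y_te - ridge.predict(X_te)
print("Mean residual:", residuals.mean())
```

To visualize the residuals, one option is plt.scatter(ridge.predict(X_te), residuals) with a horizontal line at zero; a funnel or curve in that plot suggests the model is missing some structure.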
