Predicting House Prices Using Linear Regression – Step-by-Step ML Project
This project demonstrates how to use Linear Regression to predict house prices using the California Housing dataset that ships with scikit-learn. It is a good first ML project for learning regression.
Tools Required: Python, pandas, NumPy, scikit-learn, matplotlib, seaborn
Step 1: Import Required Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
Step 2: Load the Dataset
data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)
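df['Price'] = data.target  # median house value per district, in units of $100,000
print(df.head())           # print() so the preview also shows when run as a script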
Step 3: Explore the Dataset
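df.info()                  # column dtypes and non-null counts
print(df.describe())       # summary statistics for each column
print(df.isnull().sum())   # number of missing values per column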
Visualize correlation:
plt.figure(figsize=(10,6))
sns.heatmap(df.corr(), annot=True, cmap='coolwarm')
plt.title("Feature Correlation Heatmap")
plt.show()
Step 4: Split the Data
Split into training and testing sets:
X = df.drop('Price', axis=1)
y = df['Price']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Step 5: Train the Linear Regression Model
model = LinearRegression()
model.fit(X_train, y_train)
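Once the model is fitted, it can help to look at what it actually learned. The sketch below is one possible way to do that, not part of the original walkthrough: the helper name coefficient_table and the synthetic demo data are assumptions made for illustration. In this project you would pass model and X_train.columns instead.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def coefficient_table(fitted_model, feature_names):
    """Return the fitted coefficients as a Series, largest absolute value first.

    Hypothetical helper for illustration; it only reads the coef_ attribute
    that a fitted LinearRegression exposes.
    """
    coefs = pd.Series(fitted_model.coef_, index=list(feature_names), name="coefficient")
    order = coefs.abs().sort_values(ascending=False).index
    return coefs.reindex(order)


# Synthetic stand-in data (an assumption, so the example runs without a download):
# y depends strongly on "a", moderately on "b", and not at all on "c".
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(200, 3)), columns=["a", "b", "c"])
y_demo = 3.0 * X_demo["a"] - 1.5 * X_demo["b"] + rng.normal(scale=0.1, size=200)

demo_model = LinearRegression().fit(X_demo, y_demo)
table = coefficient_table(demo_model, X_demo.columns)
print(table)
print("Intercept:", demo_model.intercept_)
```

Keep in mind that raw coefficients depend on each feature's units, so a large coefficient does not by itself mean a feature is more important.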
Step 6: Make Predictions
y_pred = model.predict(X_test)
Step 7: Evaluate the Model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Mean Squared Error:", mse)
print("R2 Score:", r2)
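To make the two numbers above less of a black box, here is a minimal pure-Python sketch of what they measure. The function name regression_metrics and the toy values are assumptions for illustration; in practice you would keep using scikit-learn's mean_squared_error and r2_score. The sketch also returns the root mean squared error (RMSE), which is in the same units as the target. Since the California Housing target is expressed in units of $100,000, an RMSE of 0.7 would correspond to roughly $70,000.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE and R2 by hand for two equal-length sequences."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual sum of squares: squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    return {"mse": mse, "rmse": math.sqrt(mse), "r2": 1 - ss_res / ss_tot}


# Toy values chosen so the arithmetic is easy to follow by hand.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
metrics = regression_metrics(actual, predicted)
print(metrics)  # mse is 0.125 and r2 is 0.9 for these values
```

An R2 of 1.0 means perfect predictions, while 0.0 means the model does no better than always predicting the mean of the target.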
Plot predictions vs actual values:
plt.scatter(y_test, y_pred)
plt.xlabel("Actual Prices")
plt.ylabel("Predicted Prices")
plt.title("Actual vs Predicted House Prices")
plt.show()
Step 8: Conclusion
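You have built a simple baseline regression model for predicting housing prices. Note that scaling the features does not change the predictions of plain LinearRegression, although it does matter for regularized models such as Ridge and Lasso. To improve accuracy, try a more flexible regressor such as RandomForestRegressor.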
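The idea of swapping in RandomForestRegressor can be sketched as a small side-by-side comparison. This is only an illustration under stated assumptions: compare_regressors is a hypothetical helper, and the synthetic data is deliberately non-linear so the difference is visible without downloading anything. For this project you would call it with X and y from Step 4, and the gap may well be smaller there.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def compare_regressors(X, y, random_state=42):
    """Fit both models on the same split and return their test-set R2 scores."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )
    candidates = {
        "linear": LinearRegression(),
        "random_forest": RandomForestRegressor(
            n_estimators=100, random_state=random_state
        ),
    }
    scores = {}
    for name, estimator in candidates.items():
        estimator.fit(X_tr, y_tr)
        scores[name] = r2_score(y_te, estimator.predict(X_te))
    return scores


# Synthetic, non-linear stand-in data (an assumption for the demo).
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(500, 2))
y_demo = np.sin(X_demo[:, 0]) * X_demo[:, 1] + rng.normal(scale=0.1, size=500)

demo_scores = compare_regressors(X_demo, y_demo)
print(demo_scores)
```

Using the same train/test split for both models keeps the comparison fair.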
What's Next?
- Try using Ridge or Lasso regression.
- Experiment with feature selection and polynomial regression.
- Visualize residuals to better understand the error.
- Deploy your model using Streamlit!
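As a starting point for the first three ideas above, the following sketch chains polynomial features, scaling and Ridge regression in one pipeline, then computes the residuals you would plot. Everything here is an assumption made for illustration: the helper name fit_poly_ridge, the degree and alpha values, and the synthetic data. None of it is tuned for the California Housing data.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler


def fit_poly_ridge(X_train, y_train, degree=2, alpha=1.0):
    """Fit polynomial features -> standard scaling -> Ridge as one estimator."""
    pipeline = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        StandardScaler(),  # Ridge penalizes coefficients, so feature scale matters
        Ridge(alpha=alpha),
    )
    return pipeline.fit(X_train, y_train)


# Synthetic stand-in data with a quadratic term (an assumption for the demo).
rng = np.random.default_rng(1)
X_demo = rng.uniform(-2, 2, size=(300, 2))
y_demo = X_demo[:, 0] ** 2 + 0.5 * X_demo[:, 1] + rng.normal(scale=0.1, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
poly_model = fit_poly_ridge(X_tr, y_tr)

# Residuals are actual minus predicted; they should scatter around zero.
residuals = y_te - poly_model.predict(X_te)
test_r2 = poly_model.score(X_te, y_te)
print("Test R2:", test_r2)
print("Mean residual:", residuals.mean())
```

To visualize the residuals, you could pass poly_model.predict(X_te) and residuals to plt.scatter, as in the plot from Step 7. A visible curve or funnel shape in that plot suggests the model is missing some structure in the data.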