Customer Segmentation Using K-Means Clustering – A Medium-Level Data Science Project

This project applies K-Means clustering, an unsupervised learning method, to segment customers by annual income and spending score. The resulting groups give a business concrete audiences to target with its marketing strategies.

Tools Required: Python, pandas, matplotlib, seaborn, scikit-learn

Step 1: Import Libraries

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

Step 2: Load the Dataset

Use the popular Mall Customer Segmentation Data from Kaggle:

df = pd.read_csv("Mall_Customers.csv")
df.head()

Step 3: Explore and Clean the Data

df.info()
df.describe()
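
The step title mentions cleaning, but the two calls above only inspect the data. Here is a minimal sketch of a cleaning check. It uses a small made-up frame in place of Mall_Customers.csv, so the values and the names `sample`, `features`, and `cleaned` are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Mall Customers data; values are invented.
sample = pd.DataFrame({
    "CustomerID": [1, 2, 2, 3, 4],
    "Annual Income (k$)": [15, 16, 16, np.nan, 40],
    "Spending Score (1-100)": [39, 81, 81, 6, 77],
})

# Count missing values per column and fully duplicated rows.
missing_per_column = sample.isnull().sum()
duplicate_rows = int(sample.duplicated().sum())

# Drop duplicated rows, then rows missing either clustering feature.
features = ["Annual Income (k$)", "Spending Score (1-100)"]
cleaned = sample.drop_duplicates().dropna(subset=features)

print(missing_per_column)
print("duplicate rows:", duplicate_rows)
print("rows after cleaning:", len(cleaned))
```

On the real file, the same three calls (`isnull().sum()`, `duplicated().sum()`, `dropna(subset=...)`) can be run on `df` directly.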

We'll use only relevant numeric columns for clustering:

X = df[['Annual Income (k$)', 'Spending Score (1-100)']]
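
StandardScaler is imported in Step 1 but is not used in the steps that follow. Income and spending score span similar ranges, so clustering on the raw values is workable here, but K-Means is distance-based and scaling starts to matter once features on other scales are added. Here is a sketch of how scaling could look. It uses made-up stand-in rows, and `X_demo` and `X_scaled` are illustrative names.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in rows; in the project, X comes from Mall_Customers.csv.
X_demo = pd.DataFrame({
    "Annual Income (k$)": [15, 40, 60, 90, 120],
    "Spending Score (1-100)": [39, 81, 50, 10, 90],
})

# Rescale each column to zero mean and unit variance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_demo)

print(X_scaled.mean(axis=0).round(6))  # approximately 0 for each column
print(X_scaled.std(axis=0).round(6))   # approximately 1 for each column
```

If you cluster on the scaled array, `scaler.inverse_transform(kmeans.cluster_centers_)` converts the cluster centers back to the original units for interpretation.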

Step 4: Visualize the Customer Distribution

sns.scatterplot(x='Annual Income (k$)', y='Spending Score (1-100)', data=df)
plt.title("Customer Distribution")
plt.show()

Step 5: Use the Elbow Method to Find Optimal Clusters

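wcss = []  # within-cluster sum of squares for each candidate k
for i in range(1, 11):
    # n_init is set explicitly because its default differs across scikit-learn versions
    kmeans = KMeans(n_clusters=i, n_init=10, random_state=42)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)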

plt.plot(range(1, 11), wcss, marker='o')
plt.title('Elbow Method')
plt.xlabel('Number of Clusters')
plt.ylabel('WCSS')
plt.show()

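WCSS always decreases as clusters are added, so look for the "elbow": the point where the curve flattens and extra clusters stop giving a meaningful reduction. For this dataset the bend is usually read at 5 clusters, which is the value used in Step 6.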
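
Reading an elbow by eye can be ambiguous, so a second check is useful. One common option, not part of the original walkthrough, is the silhouette score, where higher values mean tighter and better-separated clusters. The sketch below runs on synthetic data from `make_blobs` rather than the mall data, and `X_demo`, `scores`, and `best_k` are illustrative names.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in data with four well-separated groups (not the mall data).
centers = [[0, 0], [0, 10], [10, 0], [10, 10]]
X_demo, _ = make_blobs(n_samples=200, centers=centers,
                       cluster_std=0.5, random_state=42)

# The silhouette score needs at least 2 clusters, so the range starts at 2.
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_demo)
    scores[k] = silhouette_score(X_demo, labels)

# Pick the k with the highest average silhouette.
best_k = max(scores, key=scores.get)
print(scores)
print("best k by silhouette:", best_k)
```

To apply it to the project, replace `X_demo` with `X` and compare the best-scoring k against the elbow plot.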

Step 6: Apply K-Means Clustering

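# n_init is set explicitly because its default differs across scikit-learn versions
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
df['Cluster'] = kmeans.fit_predict(X)  # assign each customer a cluster label 0-4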

Step 7: Visualize the Clusters

plt.figure(figsize=(8,6))
sns.scatterplot(x='Annual Income (k$)', y='Spending Score (1-100)', hue='Cluster', data=df, palette='Set2')
plt.title("Customer Segments")
plt.show()

Step 8: Interpret the Clusters

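K-Means assigns cluster numbers arbitrarily, so the numbering on your run may differ from the list below. Check kmeans.cluster_centers_ (or the plot from Step 7) before attaching a description to a cluster number. The five segments generally look like this:

  • 💸 Cluster 0: High income, low spending – potential for premium promotions
  • 🛍️ Cluster 1: High income, high spending – VIP customers
  • 💡 Cluster 2: Average income/spending – target with balanced offers
  • 🧍 Cluster 3: Low income, low spending – low priority group
  • 🔥 Cluster 4: Low income, high spending – highly engaged customers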
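
To match each cluster number to one of these descriptions, it helps to tabulate the average income and spending score per cluster. Here is a minimal sketch of that. It uses a made-up six-customer frame with two clusters, and the values and the names `demo` and `profile` are assumptions for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in customers; values are invented for illustration.
demo = pd.DataFrame({
    "Annual Income (k$)": [15, 18, 21, 85, 90, 95],
    "Spending Score (1-100)": [80, 85, 90, 10, 15, 12],
})
features = ["Annual Income (k$)", "Spending Score (1-100)"]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
demo["Cluster"] = kmeans.fit_predict(demo[features])

# Mean income and spending score per cluster, plus the cluster size.
profile = demo.groupby("Cluster")[features].mean()
profile["Customers"] = demo.groupby("Cluster").size()
print(profile)
```

On the project data, the same `groupby("Cluster")` call on `df` gives one row per segment, which can then be labeled as high or low income and spending.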

Step 9: Conclusion

K-Means clustering let us group customers with similar income and spending behavior. These segments give a starting point for targeted marketing strategies, such as separate offers for high-spending and low-spending groups.

What's Next?

  • Use more features like age and gender for better segmentation.
  • Try other clustering techniques like DBSCAN or Hierarchical Clustering.
  • Deploy this as a dashboard using Streamlit or Dash.
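
As a starting point for the second suggestion, hierarchical clustering is available in scikit-learn as AgglomerativeClustering. The sketch below runs on synthetic data from `make_blobs` rather than the mall data. The choice of Ward linkage and three clusters is an assumption made for the example.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

# Synthetic stand-in data with three well-separated groups (not the mall data).
X_demo, _ = make_blobs(n_samples=150, centers=[[0, 0], [8, 8], [0, 8]],
                       cluster_std=0.6, random_state=42)

# Ward linkage merges, at each step, the two clusters whose union
# increases within-cluster variance the least.
model = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = model.fit_predict(X_demo)

print(sorted(set(labels)))
```

Unlike K-Means, this model has no `predict` method for new points, so it suits one-off segmentation better than scoring incoming customers.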
