Customer Segmentation Using K-Means Clustering – A Medium-Level Data Science Project

July 05, 2025

This project demonstrates how to apply unsupervised learning (K-Means clustering) to segment customers by their annual income and spending score. Segmentation like this helps a business target specific customer groups with tailored marketing strategies.

Tools Required: Python, pandas, matplotlib, seaborn, scikit-learn

Step 1: Import Libraries

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

Step 2: Load the Dataset

Download the popular Mall Customer Segmentation Data from Kaggle and load Mall_Customers.csv from your working directory:

df = pd.read_csv("Mall_Customers.csv")
df.head()

Step 3: Explore and Clean the Data

df.info()
df.describe()
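The step title mentions cleaning, so here is a minimal sketch of a data-quality check. It assumes the two column names used in this post; the small `df_demo` frame is a hypothetical stand-in for the Kaggle file, with one duplicate row and one missing value added on purpose.

```python
import pandas as pd

# Hypothetical stand-in rows with the column names used in this post;
# in the project, the frame would come from pd.read_csv("Mall_Customers.csv").
features = ["Annual Income (k$)", "Spending Score (1-100)"]
df_demo = pd.DataFrame({
    "Annual Income (k$)": [15, 16, 16, None, 40],
    "Spending Score (1-100)": [39, 81, 81, 6, 77],
})

# Count missing values per column and fully duplicated rows.
missing_per_column = df_demo.isnull().sum()
duplicate_rows = int(df_demo.duplicated().sum())

# Drop duplicates first, then any row missing a clustering feature.
df_clean = (
    df_demo.drop_duplicates()
    .dropna(subset=features)
    .reset_index(drop=True)
)

print(missing_per_column)
print("Duplicate rows:", duplicate_rows)
print("Rows kept:", len(df_clean))
```

Dropping incomplete rows is only one option. If many rows were affected, imputing values might be a better fit.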

We'll cluster on two numeric columns only, annual income and spending score:

X = df[['Annual Income (k$)', 'Spending Score (1-100)']]
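Step 1 imports `StandardScaler`, but the walkthrough clusters on the raw values. That is workable here because both columns span roughly similar ranges. As a hedged sketch of how scaling could be applied, the example below standardizes a hypothetical `X_demo` frame. The values are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical sample values standing in for the two clustering columns.
X_demo = pd.DataFrame({
    "Annual Income (k$)": [15, 40, 60, 85, 120],
    "Spending Score (1-100)": [39, 81, 50, 10, 90],
})

# Standardize each column to zero mean and unit variance so that
# neither feature dominates the Euclidean distances K-Means relies on.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_demo)  # NumPy array, same shape as X_demo

print(np.round(X_scaled.mean(axis=0), 6))
print(np.round(X_scaled.std(axis=0), 6))
```

Scaling is likely to matter more once features with very different units, such as age, are added. Note that centroids fitted on scaled data are in scaled units; `scaler.inverse_transform` converts them back.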

Step 4: Visualize the Customer Distribution

sns.scatterplot(x='Annual Income (k$)', y='Spending Score (1-100)', data=df)
plt.title("Customer Distribution")
plt.show()

Step 5: Use the Elbow Method to Find Optimal Clusters

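wcss = []  # within-cluster sum of squares for each candidate k
for i in range(1, 11):
    # n_init is set explicitly because its default differs across scikit-learn versions
    kmeans = KMeans(n_clusters=i, n_init=10, random_state=42)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)  # inertia_ is the WCSS of the fitted model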

plt.plot(range(1, 11), wcss, marker='o')
plt.title('Elbow Method')
plt.xlabel('Number of Clusters')
plt.ylabel('WCSS')
plt.show()

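The "elbow" is the point after which adding another cluster no longer reduces WCSS by much. It is a visual heuristic rather than an exact rule, so treat it as a starting point. Step 6 uses 5 clusters.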
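Because the elbow is read by eye, a numeric check can help. One common complement, not part of the original walkthrough, is the silhouette score from `sklearn.metrics`. The sketch below runs it on synthetic blobs (`X_demo` is hypothetical data, not the mall dataset) and picks the k with the highest score.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in data: five well-separated blobs in an
# income/spending-like plane. This is not the Kaggle dataset.
rng = np.random.default_rng(42)
centers = np.array([[20, 20], [20, 80], [55, 50], [90, 20], [90, 80]])
X_demo = np.vstack([c + rng.normal(scale=4.0, size=(40, 2)) for c in centers])

# The silhouette score needs at least 2 clusters, so start at k=2.
scores = {}
for k in range(2, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = model.fit_predict(X_demo)
    scores[k] = silhouette_score(X_demo, labels)

# Higher is better; values range from -1 to 1.
best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(k, round(score, 3))
print("Best k by silhouette:", best_k)
```

On real data the two methods can disagree, in which case the more interpretable segmentation is usually the better choice.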

Step 6: Apply K-Means Clustering

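# n_init is set explicitly so results do not depend on the scikit-learn version's default
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
# fit_predict fits the model and returns one cluster label (0 to 4) per customer
df['Cluster'] = kmeans.fit_predict(X)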
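Once fitted, the model can also report its centroids and assign new customers to an existing segment. The following is a minimal sketch on a hypothetical 12-row frame with four obvious groups (not the real dataset, and 4 clusters rather than the 5 used above), using `cluster_centers_` and `predict`.

```python
import pandas as pd
from sklearn.cluster import KMeans

features = ["Annual Income (k$)", "Spending Score (1-100)"]

# Hypothetical training rows forming four clearly separated groups.
train = pd.DataFrame({
    "Annual Income (k$)": [15, 18, 20, 17, 19, 21, 85, 90, 88, 87, 92, 86],
    "Spending Score (1-100)": [10, 14, 12, 85, 90, 88, 15, 12, 18, 80, 92, 86],
})

model = KMeans(n_clusters=4, n_init=10, random_state=42).fit(train[features])

# One centroid per cluster, expressed in the original feature units.
centroids = pd.DataFrame(model.cluster_centers_, columns=features)
print(centroids.round(1))

# Assign previously unseen (made-up) customers to the nearest centroid.
new_customers = pd.DataFrame({
    "Annual Income (k$)": [16, 89],
    "Spending Score (1-100)": [87, 13],
})
new_labels = model.predict(new_customers[features])
print(new_labels)
```

Each new customer receives the label of the closest centroid, so the model does not need to be refitted for every new record.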

Step 7: Visualize the Clusters

plt.figure(figsize=(8,6))
sns.scatterplot(x='Annual Income (k$)', y='Spending Score (1-100)', hue='Cluster', data=df, palette='Set2')
plt.title("Customer Segments")
plt.show()

Step 8: Interpret the Clusters

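K-Means assigns cluster numbers arbitrarily, so the numbering below is one possible outcome. Match each description to the cluster whose position in the Step 7 plot fits it.

💸 Cluster 0: High income, low spending – potential for premium promotions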
🛍️ Cluster 1: High income, high spending – VIP customers
💡 Cluster 2: Average income/spending – target with balanced offers
🧍 Cluster 3: Low income, low spending – low priority group
🔥 Cluster 4: Low income, high spending – highly engaged customers
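Since the label numbers carry no meaning on their own, descriptions like the ones above can be grounded in per-cluster averages. The sketch below builds such a profile table with `groupby` on a hypothetical 12-row frame, using 4 clusters for brevity. The data is made up for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans

features = ["Annual Income (k$)", "Spending Score (1-100)"]

# Hypothetical rows forming four clearly separated groups.
df_demo = pd.DataFrame({
    "Annual Income (k$)": [15, 18, 20, 17, 19, 21, 85, 90, 88, 87, 92, 86],
    "Spending Score (1-100)": [10, 14, 12, 85, 90, 88, 15, 12, 18, 80, 92, 86],
})

model = KMeans(n_clusters=4, n_init=10, random_state=42)
df_demo["Cluster"] = model.fit_predict(df_demo[features])

# Mean income and spending per cluster, plus the number of customers in each.
profile = df_demo.groupby("Cluster")[features].mean().round(1)
profile["Customers"] = df_demo["Cluster"].value_counts().sort_index()
print(profile)
```

Reading the table row by row shows which label corresponds to, say, high income and low spending in a given run.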

Step 9: Conclusion

K-Means clustering let us group customers with similar income and spending behavior. These segments can guide targeted marketing, for example by treating high-income, low-spending customers differently from low-income, high-spending ones.

What's Next?

Use more features like age and gender for better segmentation.
Try other clustering techniques like DBSCAN or Hierarchical Clustering.
Deploy this as a dashboard using Streamlit or Dash.
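As a starting point for the second suggestion, here is a hedged sketch of `AgglomerativeClustering` (hierarchical) and `DBSCAN` from scikit-learn. It runs on synthetic blobs rather than the mall data, and the `eps` and `min_samples` values are illustrative assumptions that would need tuning on a real dataset.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: five blobs in an income/spending-like plane.
rng = np.random.default_rng(0)
centers = np.array([[20, 20], [20, 80], [55, 50], [90, 20], [90, 80]])
X_demo = np.vstack([c + rng.normal(scale=4.0, size=(40, 2)) for c in centers])

# Both methods are distance-based, so the features are standardized first.
X_scaled = StandardScaler().fit_transform(X_demo)

# Hierarchical clustering: the number of clusters is chosen up front.
agg_labels = AgglomerativeClustering(n_clusters=5, linkage="ward").fit_predict(X_scaled)

# DBSCAN: no cluster count is given; density parameters decide it.
# Points labeled -1 are treated as noise.
db_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X_scaled)
n_db_clusters = len(set(db_labels) - {-1})
n_noise = int((db_labels == -1).sum())

print("Agglomerative clusters:", len(set(agg_labels)))
print("DBSCAN clusters:", n_db_clusters, "noise points:", n_noise)
```

Unlike K-Means, DBSCAN can leave outlying customers unassigned, which may or may not suit a marketing use case.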

My Teaching Desk - By LokeshDev