Customer Segmentation Using K-Means Clustering – A Medium-Level Data Science Project
This project shows how to apply K-Means clustering, an unsupervised learning method, to segment customers by annual income and spending score. The resulting groups give a business concrete segments to target with different marketing strategies.
Tools Required: Python, pandas, matplotlib, seaborn, scikit-learn
Step 1: Import Libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
Step 2: Load the Dataset
Use the popular Mall Customer Segmentation Data from Kaggle:
df = pd.read_csv("Mall_Customers.csv")
df.head()
Step 3: Explore and Clean the Data
df.info()
df.describe()
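The calls above only inspect the data; they do not clean it. A minimal sketch of a basic cleaning pass follows, assuming the checks of interest are missing values and duplicate rows. The small DataFrame is made-up stand-in data, not the Kaggle file, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: one missing value and one duplicated row
sample = pd.DataFrame({
    "Annual Income (k$)": [15, 16, 16, 78, np.nan],
    "Spending Score (1-100)": [39, 81, 81, 17, 60],
})

missing_per_column = sample.isnull().sum()  # count of NaN values in each column
duplicate_rows = sample.duplicated().sum()  # rows identical to an earlier row

# Drop incomplete and repeated rows, then renumber the index
cleaned = sample.dropna().drop_duplicates().reset_index(drop=True)

print(missing_per_column)
print("duplicate rows:", duplicate_rows)
print(cleaned)
```

On the real dataset the same three calls can be run on df directly; if both counts are zero, no rows need to be dropped.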
We'll use only relevant numeric columns for clustering:
X = df[['Annual Income (k$)', 'Spending Score (1-100)']]
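StandardScaler is imported in Step 1 but never used in the walkthrough. K-Means relies on Euclidean distance, so a feature with a wider numeric range can dominate the result. The sketch below shows how standardization could be applied before clustering; it is an optional variation, not the method the steps here follow, and X_demo is made-up stand-in data.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for X (made-up values, same column names as the tutorial)
X_demo = pd.DataFrame({
    "Annual Income (k$)": [15, 40, 60, 90, 120],
    "Spending Score (1-100)": [39, 81, 50, 17, 75],
})

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_demo)  # NumPy array, one standardized column per feature

# After scaling, each column has mean 0 and standard deviation 1
print(np.round(X_scaled.mean(axis=0), 6))
print(np.round(X_scaled.std(axis=0), 6))
```

If you cluster on the scaled array, the fitted cluster centers are in scaled units; scaler.inverse_transform converts them back to income and score values.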
Step 4: Visualize the Customer Distribution
sns.scatterplot(x='Annual Income (k$)', y='Spending Score (1-100)', data=df)
plt.title("Customer Distribution")
plt.show()
Step 5: Use the Elbow Method to Find Optimal Clusters
wcss = []  # within-cluster sum of squares for each candidate k
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, random_state=42)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)  # inertia_ is the WCSS of the fitted model
plt.plot(range(1, 11), wcss, marker='o')
plt.title('Elbow Method')
plt.xlabel('Number of Clusters')
plt.ylabel('WCSS')
plt.show()
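The "elbow" is the point where adding another cluster stops reducing WCSS (the within-cluster sum of squares) sharply. The number of clusters at that bend is a reasonable choice; for this dataset the curve is usually read as bending at 5, which is the value used in the next step.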
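Reading the bend off a plot can be subjective. One complementary check, which is not part of the original walkthrough, is the silhouette score: it measures how well each point fits its own cluster compared with the nearest other cluster, and higher is better. The sketch below runs it on synthetic data laid out as five blobs (made-up values, not the mall dataset), with n_init=10 set explicitly as an assumption so results do not depend on the scikit-learn version's default.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for X: five well-separated blobs (made-up data)
rng = np.random.default_rng(42)
centers = np.array([[20, 20], [20, 80], [50, 50], [80, 20], [80, 80]])
X_demo = np.vstack([c + rng.normal(scale=5.0, size=(40, 2)) for c in centers])

scores = {}
for k in range(2, 11):  # the silhouette score needs at least two clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_demo)
    scores[k] = silhouette_score(X_demo, labels)

# Pick the k with the highest average silhouette score
best_k = max(scores, key=scores.get)
print({k: round(float(s), 3) for k, s in scores.items()})
print("best k by silhouette:", best_k)
```

When the elbow plot and the silhouette score point to the same k, the choice is easier to defend; when they disagree, it is worth plotting the clusters for both values.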
Step 6: Apply K-Means Clustering
kmeans = KMeans(n_clusters=5, random_state=42)
df['Cluster'] = kmeans.fit_predict(X)
Step 7: Visualize the Clusters
plt.figure(figsize=(8,6))
sns.scatterplot(x='Annual Income (k$)', y='Spending Score (1-100)', hue='Cluster', data=df, palette='Set2')
plt.title("Customer Segments")
plt.show()
Step 8: Interpret the Clusters
K-Means assigns cluster numbers arbitrarily, so the numbering below is illustrative. Check each cluster's position in the plot from Step 7 before matching a number to a description.
- 💸 Cluster 0: High income, low spending – potential for premium promotions
- 🛍️ Cluster 1: High income, high spending – VIP customers
- 💡 Cluster 2: Average income/spending – target with balanced offers
- 🧍 Cluster 3: Low income, low spending – low priority group
- 🔥 Cluster 4: Low income, high spending – highly engaged customers
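To confirm which cluster number corresponds to which description above, it helps to look at each cluster's average income and spending score. The sketch below does this on synthetic data (made-up values with the tutorial's column names); the names demo and profile are illustrative, and n_init=10 is an assumption rather than a setting from the steps above.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

features = ["Annual Income (k$)", "Spending Score (1-100)"]

# Synthetic stand-in for df: five blobs of 40 customers each (made-up data)
rng = np.random.default_rng(0)
centers = np.array([[20, 20], [20, 80], [50, 50], [80, 20], [80, 80]])
points = np.vstack([c + rng.normal(scale=5.0, size=(40, 2)) for c in centers])
demo = pd.DataFrame(points, columns=features)

kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
demo["Cluster"] = kmeans.fit_predict(demo[features])

# Average income and spending score per cluster, plus the cluster size
profile = demo.groupby("Cluster").agg(
    income_mean=("Annual Income (k$)", "mean"),
    spending_mean=("Spending Score (1-100)", "mean"),
    customers=("Annual Income (k$)", "size"),
)
print(profile.round(1))
```

A row with a high income_mean and a low spending_mean is the "high income, low spending" group, whatever number K-Means happened to give it.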
Step 9: Conclusion
K-Means clustering let us group customers with similar income and spending patterns. These segments give a starting point for targeted marketing strategies and for improving customer satisfaction.
What's Next?
- Use more features like age and gender for better segmentation.
- Try other clustering techniques like DBSCAN or Hierarchical Clustering.
- Deploy this as a dashboard using Streamlit or Dash.
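As a starting point for the second suggestion, the sketch below runs hierarchical (agglomerative) clustering and DBSCAN on the same kind of two-feature data. It uses synthetic values rather than the mall dataset, and the DBSCAN settings (eps=0.3, min_samples=5) are illustrative guesses, not tuned values.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X: five blobs of 40 points each (made-up data)
rng = np.random.default_rng(1)
centers = np.array([[20, 20], [20, 80], [50, 50], [80, 20], [80, 80]])
X_demo = np.vstack([c + rng.normal(scale=5.0, size=(40, 2)) for c in centers])

# Scale first so DBSCAN's eps is expressed in standard deviations
X_scaled = StandardScaler().fit_transform(X_demo)

# Hierarchical clustering: the number of clusters is chosen up front
hier_labels = AgglomerativeClustering(n_clusters=5, linkage="ward").fit_predict(X_scaled)

# DBSCAN: density-based, no cluster count needed; label -1 marks noise points
db_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X_scaled)
n_db_clusters = len(set(db_labels.tolist()) - {-1})

print("hierarchical clusters:", len(set(hier_labels.tolist())))
print("DBSCAN clusters:", n_db_clusters, "| noise points:", int((db_labels == -1).sum()))
```

Unlike K-Means, DBSCAN can leave some customers unassigned as noise, so its output depends heavily on eps and min_samples and usually needs tuning per dataset.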