DEVELOP A CUSTOMER SEGMENTATION TO DEFINE MARKET STRATEGY USING DATA SCIENCE FOR BUSINESSES.

NEELESH SINHA

Jr. Data Scientist

Libraries used – 

  • NumPy 
  • pandas 
  • matplotlib 
  • statsmodels 
  • scikit-learn (sklearn) 

Notebook – Google Colab 

Link – Customer segmentation notebook 

ADVANCED DATA PREPARATION 

  • Monthly average purchase and cash advance amount 
  • Purchases by type (one-off, installments) 
  • Average amount per purchase and per cash advance transaction 
  • Limit usage (balance to credit limit ratio) 
  • Payments to minimum payments ratio 
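The derived features listed above can be sketched in pandas as follows. This is a minimal illustration, not the notebook's actual code: the column names (PURCHASES, TENURE, CASH_ADVANCE_TRX and so on), the helper add_kpis, and the toy values are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Toy data. Column names are assumed for illustration and are not
# taken from the original notebook.
df = pd.DataFrame({
    "PURCHASES": [1200.0, 0.0, 300.0],
    "ONEOFF_PURCHASES": [1200.0, 0.0, 0.0],
    "INSTALLMENTS_PURCHASES": [0.0, 0.0, 300.0],
    "PURCHASES_TRX": [12, 0, 6],
    "CASH_ADVANCE": [0.0, 600.0, 120.0],
    "CASH_ADVANCE_TRX": [0, 3, 2],
    "BALANCE": [500.0, 1500.0, 250.0],
    "CREDIT_LIMIT": [2000.0, 3000.0, 1000.0],
    "PAYMENTS": [800.0, 400.0, 300.0],
    "MINIMUM_PAYMENTS": [200.0, 400.0, 100.0],
    "TENURE": [12, 12, 6],  # months on book (assumed meaning)
})


def add_kpis(data):
    """Return a copy of `data` with the derived features added (hypothetical helper)."""
    out = data.copy()

    # Monthly average purchase and cash advance amount
    out["MONTHLY_AVG_PURCHASE"] = out["PURCHASES"] / out["TENURE"]
    out["MONTHLY_AVG_CASH_ADVANCE"] = out["CASH_ADVANCE"] / out["TENURE"]

    # Purchases by type (one-off, installments, both, or none)
    one_off = out["ONEOFF_PURCHASES"] > 0
    installments = out["INSTALLMENTS_PURCHASES"] > 0
    out["PURCHASE_TYPE"] = np.select(
        [one_off & installments, one_off, installments],
        ["both", "one_off", "installments"],
        default="none",
    )

    # Average amount per transaction; customers with no transactions get 0
    out["AVG_PURCHASE_AMOUNT"] = (
        out["PURCHASES"] / out["PURCHASES_TRX"].replace(0, np.nan)
    ).fillna(0.0)
    out["AVG_CASH_ADVANCE_AMOUNT"] = (
        out["CASH_ADVANCE"] / out["CASH_ADVANCE_TRX"].replace(0, np.nan)
    ).fillna(0.0)

    # Limit usage and payments to minimum payments ratio
    out["LIMIT_USAGE"] = out["BALANCE"] / out["CREDIT_LIMIT"]
    out["PAYMENT_MIN_RATIO"] = out["PAYMENTS"] / out["MINIMUM_PAYMENTS"]
    return out


kpis = add_kpis(df)
print(kpis[["MONTHLY_AVG_PURCHASE", "PURCHASE_TYPE", "LIMIT_USAGE"]])
```

Replacing zero transaction counts with NaN before dividing avoids infinite values for customers who never made a purchase or took a cash advance.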

DATA CLEANING 

First, I inspected the data for null values and dropped the customer ID column, since this feature is not needed for training the model. I then filled the missing values in the minimum purchase and credit columns with each column's mean. 
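A minimal sketch of these cleaning steps is shown below. The column names (CUST_ID, MINIMUM_PAYMENTS, CREDIT_LIMIT) and the toy values are assumptions for illustration; substitute whichever columns contain nulls in the real data.

```python
import numpy as np
import pandas as pd

# Toy data with two missing values. Column names are assumed.
raw = pd.DataFrame({
    "CUST_ID": ["C1", "C2", "C3", "C4"],
    "BALANCE": [100.0, 250.0, 400.0, 50.0],
    "MINIMUM_PAYMENTS": [20.0, np.nan, 40.0, 30.0],
    "CREDIT_LIMIT": [1000.0, 2000.0, np.nan, 3000.0],
})

# Inspect how many nulls each column has
null_counts = raw.isnull().sum()
print(null_counts)

# The customer identifier carries no signal for clustering, so drop it
clean = raw.drop(columns=["CUST_ID"])

# Fill the remaining gaps with the mean of each affected column
for col in ["MINIMUM_PAYMENTS", "CREDIT_LIMIT"]:
    clean[col] = clean[col].fillna(clean[col].mean())

print(clean)
```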

FEATURE SCALING 

I took two metrics, balance and purchase frequency, and normalized them with a min-max scaler. After scaling the features, we use the K-means clustering algorithm, which takes a predetermined number of clusters and assigns each data point to its nearest centroid based on Euclidean distance, the length of the straight line between two points. Once the model is trained, the cluster of a new, unseen data point is identified the same way, by finding the centroid closest to it in Euclidean distance. 
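The sketch below illustrates both ideas on synthetic data: scaling two features with scikit-learn's MinMaxScaler, then checking that the cluster K-means predicts for a new point is the centroid with the smallest Euclidean distance. The data, the choice of 3 clusters, and the variable names are assumptions made only for this example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins for balance (0 to 5000) and purchase frequency (0 to 1)
X = np.column_stack([rng.uniform(0, 5000, 200), rng.uniform(0, 1, 200)])

# Min-max scaling maps each feature to the [0, 1] range
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# Fit K-means with a predetermined number of clusters (3 here, for illustration)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)

# New, unseen points must go through the same fitted scaler
new_points = scaler.transform(np.array([[1200.0, 0.9], [4500.0, 0.1]]))

# Euclidean distance from every new point to every centroid
dists = np.linalg.norm(
    new_points[:, None, :] - kmeans.cluster_centers_[None, :, :], axis=2
)
manual_labels = dists.argmin(axis=1)

# The nearest centroid matches what the fitted model predicts
predicted = kmeans.predict(new_points)
print(manual_labels, predicted)
```

Scaling matters here because Euclidean distance would otherwise be dominated by the feature with the larger numeric range.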

Clusters for K-means 

The elbow method computes the sum of squared distances from each point to its cluster centroid for every candidate number of clusters k. As more clusters are used, this within-cluster variance falls. As we can see, beyond about 8 clusters, adding more has minimal effect on the model. 
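A minimal sketch of the elbow computation follows. It uses synthetic blobs from scikit-learn rather than the customer data, so the position of the elbow here says nothing about the real dataset; the range of k values is also an assumption.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 4 well-separated groups, for illustration only
X, _ = make_blobs(n_samples=300, centers=4, n_features=2, random_state=42)

ks = list(range(1, 11))
inertias = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    # inertia_ is the sum of squared distances of samples to their closest centroid
    inertias.append(model.inertia_)

for k, inertia in zip(ks, inertias):
    print(k, round(inertia, 1))
```

Plotting inertias against ks with matplotlib gives the elbow curve; the elbow is the point after which the curve flattens.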

Training the model 

Divide the data into train and test sets and convert them into NumPy arrays. Fit K-means with 8 clusters, as derived from the elbow method, on the training set. Then predict the clusters for the test set and attach the predictions to the test data. 
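These steps can be sketched as below, again on synthetic data. The feature names, the 80/20 split, and the random seeds are assumptions for the example; only the choice of 8 clusters comes from the text.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Synthetic, already-scaled features. Column names are assumed.
features = pd.DataFrame({
    "BALANCE": rng.uniform(0, 1, 400),
    "PURCHASES_FREQUENCY": rng.uniform(0, 1, 400),
})

# Split into train and test sets, then convert to NumPy arrays
train_df, test_df = train_test_split(features, test_size=0.2, random_state=1)
X_train = train_df.to_numpy()
X_test = test_df.to_numpy()

# Fit K-means with 8 clusters on the training data only
kmeans = KMeans(n_clusters=8, n_init=10, random_state=1).fit(X_train)

# Predict a cluster for each test row and attach it to the test data
test_labels = kmeans.predict(X_test)
test_with_clusters = test_df.assign(CLUSTER=test_labels)
print(test_with_clusters.head())
```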

CONCLUSION 

We can see that cluster 6 has a higher credit limit, a higher total number of purchases, and more frequent one-off purchases, so we can segment these customers for targeted marketing. 
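One way to reach this kind of conclusion is to compare the average of each feature per cluster. The sketch below uses a small hand-made table, so its numbers and cluster labels are purely illustrative and do not reproduce the result above; the column names are assumptions.

```python
import pandas as pd

# Hand-made labeled data for illustration. Values and column names are assumed.
labeled = pd.DataFrame({
    "CLUSTER": [0, 0, 1, 1, 2, 2],
    "CREDIT_LIMIT": [1000.0, 1500.0, 9000.0, 11000.0, 3000.0, 3500.0],
    "PURCHASES_TRX": [5, 7, 60, 80, 20, 22],
    "ONEOFF_PURCHASES_FREQUENCY": [0.1, 0.2, 0.8, 0.9, 0.3, 0.4],
})

# Mean of every feature per cluster, plus the number of customers in each
profile = labeled.groupby("CLUSTER").mean()
sizes = labeled["CLUSTER"].value_counts().sort_index()

# Cluster with the highest average credit limit
top_cluster = profile["CREDIT_LIMIT"].idxmax()

print(profile)
print(sizes)
print("Highest average credit limit: cluster", top_cluster)
```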
