Libraries used:
Notebook: Google Colab
ADVANCED DATA PREPARATION
- Monthly average purchase and cash advance amounts
- Purchases by type (one-off, installments)
- Average amount per purchase and per cash advance transaction
- Limit usage (balance-to-credit-limit ratio)
- Payments-to-minimum-payments ratio
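The derived features listed above could be computed roughly as follows. This is a minimal sketch that assumes pandas and NumPy, a four-row toy table, and hypothetical column names (PURCHASES, TENURE, CASH_ADVANCE_TRX and so on); it also assumes TENURE counts months of history. The notebook's actual column names and formulas may differ. Per-transaction averages divide by a transaction count, so customers with zero transactions are mapped to 0 instead of NaN.

```python
import numpy as np
import pandas as pd

# Toy data; every column name here is a hypothetical stand-in for the real dataset.
df = pd.DataFrame({
    "PURCHASES":              [1200.0,    0.0,  300.0,  950.0],
    "ONEOFF_PURCHASES":       [1200.0,    0.0,    0.0,  400.0],
    "INSTALLMENTS_PURCHASES": [   0.0,    0.0,  300.0,  550.0],
    "PURCHASES_TRX":          [    12,      0,      6,     10],
    "CASH_ADVANCE":           [   0.0,  600.0,    0.0,  200.0],
    "CASH_ADVANCE_TRX":       [     0,      3,      0,      2],
    "BALANCE":                [ 800.0, 1500.0,  100.0, 2000.0],
    "CREDIT_LIMIT":           [4000.0, 3000.0, 1000.0, 5000.0],
    "PAYMENTS":               [1000.0,  300.0,  250.0,  900.0],
    "MINIMUM_PAYMENTS":       [ 200.0,  150.0,   50.0,  300.0],
    "TENURE":                 [    12,     12,      6,     12],  # assumed: months of history
})

# Monthly average purchase and cash advance amounts.
df["monthly_avg_purchase"] = df["PURCHASES"] / df["TENURE"]
df["monthly_cash_advance"] = df["CASH_ADVANCE"] / df["TENURE"]

# Purchase type: one-off only, installments only, both, or none.
oneoff = df["ONEOFF_PURCHASES"] > 0
installments = df["INSTALLMENTS_PURCHASES"] > 0
df["purchase_type"] = np.select(
    [oneoff & installments, oneoff, installments],
    ["both", "one_off", "installments"],
    default="none",
)

# Average amount per transaction; a zero count becomes NaN, and the result is filled with 0.
df["avg_purchase_amount"] = (
    df["PURCHASES"] / df["PURCHASES_TRX"].replace(0, np.nan)
).fillna(0.0)
df["avg_cash_advance_amount"] = (
    df["CASH_ADVANCE"] / df["CASH_ADVANCE_TRX"].replace(0, np.nan)
).fillna(0.0)

# Limit usage (balance / credit limit) and payments-to-minimum-payments ratio.
df["limit_usage"] = df["BALANCE"] / df["CREDIT_LIMIT"]
df["payment_min_ratio"] = df["PAYMENTS"] / df["MINIMUM_PAYMENTS"]

print(df[["purchase_type", "monthly_avg_purchase", "avg_purchase_amount", "limit_usage"]])
```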
First, I inspected the data for null values and dropped the customer ID column, since it is not needed to train the model. I then filled the missing values in the minimum payments and credit limit columns with each column's mean.
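The cleaning steps above might look like this in pandas. This is a minimal sketch: the column names CUST_ID, CREDIT_LIMIT and MINIMUM_PAYMENTS and the toy values are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Toy data with hypothetical column names; the real table has more columns.
raw = pd.DataFrame({
    "CUST_ID": ["C001", "C002", "C003", "C004"],
    "BALANCE": [800.0, 1500.0, 100.0, 2000.0],
    "CREDIT_LIMIT": [4000.0, np.nan, 1000.0, 5000.0],
    "MINIMUM_PAYMENTS": [200.0, 150.0, np.nan, 310.0],
})

# 1. Inspect: count missing values per column.
print(raw.isnull().sum())

# 2. Drop the customer identifier; it carries no information for clustering.
data = raw.drop(columns=["CUST_ID"])

# 3. Fill the missing values in each affected column with that column's mean.
for col in ["MINIMUM_PAYMENTS", "CREDIT_LIMIT"]:
    data[col] = data[col].fillna(data[col].mean())

print(data)
```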
I took two features, balance and purchase frequency, and normalized them with a min-max scaler. After scaling, we apply the K-means clustering algorithm. K-means takes a predetermined number of clusters k and assigns each data point to its nearest centroid by Euclidean distance, that is, the length of the straight line between two points. Once the clusters are trained, new, unseen data points can be assigned to a cluster the same way, by their Euclidean distance to the learned centroids.
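The scaling and assignment steps can be sketched with scikit-learn's MinMaxScaler and KMeans (assumed here, since the library list is empty). The two synthetic columns stand in for balance and purchase frequency, and k = 4 is only a placeholder. Min-max scaling maps each column to [0, 1] via x' = (x - min) / (max - min); the final check shows that KMeans.predict assigns a new point to the centroid at the smallest Euclidean distance.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
# Synthetic stand-ins for the two features (hypothetical values).
X = np.column_stack([
    rng.uniform(0, 10_000, size=200),  # balance
    rng.uniform(0, 1, size=200),       # purchase frequency
])

# Min-max scaling, per column: x' = (x - min) / (max - min).
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
assert np.allclose(X_scaled, manual)

# K-means with a predetermined number of clusters (k = 4 is a placeholder).
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X_scaled)

# A new, unseen customer: scale it with the already-fitted scaler, then
# assign it to the centroid at the smallest Euclidean distance.
x_new = scaler.transform(np.array([[2_500.0, 0.8]]))
distances = np.linalg.norm(km.cluster_centers_ - x_new, axis=1)
nearest = int(np.argmin(distances))
assert km.predict(x_new)[0] == nearest
print("assigned cluster:", nearest)
```

Reusing the scaler fitted on the training data matters: scaling a new point with its own min and max would place it on a different scale from the centroids.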
Number of clusters for K-means
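The elbow method computes the within-cluster sum of squared distances for each candidate number of clusters k. This value keeps decreasing as more clusters are added, so we look for the point where adding clusters stops producing a large reduction. Here, that point is around 8 clusters; beyond 8, additional clusters have minimal effect on the model.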
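A sketch of the elbow method, assuming scikit-learn. Its KMeans.inertia_ attribute is the within-cluster sum of squared distances, WCSS(k) = sum over clusters j of sum over points x in C_j of ||x - mu_j||^2, where mu_j is the centroid of cluster C_j. The data here is synthetic and built from three well-separated blobs, so its elbow falls at k = 3, not at the 8 found on the real data.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Synthetic scaled data: three well-separated blobs of 100 points each.
centers = np.array([[0.2, 0.2], [0.8, 0.2], [0.5, 0.8]])
X_scaled = np.vstack([c + rng.normal(scale=0.05, size=(100, 2)) for c in centers])

# WCSS (KMeans.inertia_) for k = 1..10.
ks = list(range(1, 11))
wcss = [
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled).inertia_
    for k in ks
]

for k, w in zip(ks, wcss):
    print(f"k={k:2d}  WCSS={w:.3f}")
```

Plotting wcss against ks (for example with matplotlib) gives the elbow curve. On this toy data the drop from k = 2 to k = 3 is large, and later drops are much smaller.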
Training the model
Split the data into training and test sets and convert them to NumPy arrays. Then fit K-means with 8 clusters, as chosen by the elbow method, predict the clusters for the test set, and attach these predictions to the test data.
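The training step might be sketched as follows, assuming scikit-learn's train_test_split and KMeans. The feature table is random stand-in data with hypothetical column names, and the 80/20 split ratio is an assumption; only k = 8 comes from the text.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Stand-in for the scaled feature table (hypothetical column names).
features = pd.DataFrame(
    rng.uniform(size=(500, 2)), columns=["BALANCE", "PURCHASES_FREQUENCY"]
)

# Split into training and test sets, then convert to NumPy arrays.
train_df, test_df = train_test_split(features, test_size=0.2, random_state=0)
X_train, X_test = train_df.to_numpy(), test_df.to_numpy()

# Fit on the training set with k = 8, the value read off the elbow curve.
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X_train)

# Predict clusters for the unseen test points and attach them to the test data.
test_df = test_df.assign(cluster=kmeans.predict(X_test))
print(test_df["cluster"].value_counts().sort_index())
```

If the scaler is fitted before splitting, test-set minimums and maximums leak into preprocessing; fitting it on the training split only and reusing it on the test split avoids that.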
Cluster 6 stands out with a higher credit limit, a higher total number of purchases, and more frequent one-off purchases, so these customers can be segmented out as a marketing target.
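One way to produce a per-cluster comparison like this is to average each attribute by cluster label. This is a sketch under assumptions: pandas and scikit-learn, hypothetical column names, and random data. The cluster numbering and values here will therefore not reproduce the cluster 6 result.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(7)
n = 300
# Hypothetical customer table containing only the columns used below.
customers = pd.DataFrame({
    "BALANCE": rng.gamma(2.0, 800.0, n),
    "PURCHASES_FREQUENCY": rng.uniform(0, 1, n),
    "CREDIT_LIMIT": rng.gamma(3.0, 1500.0, n),
    "PURCHASES_TRX": rng.poisson(15, n),
    "ONEOFF_PURCHASES_FREQUENCY": rng.uniform(0, 1, n),
})

# Cluster on the two scaled features, as in the text.
X = MinMaxScaler().fit_transform(customers[["BALANCE", "PURCHASES_FREQUENCY"]])
customers["cluster"] = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(X)

# Cluster size plus the mean of each attribute per cluster.
profile = customers.groupby("cluster").agg(
    n_customers=("BALANCE", "size"),
    credit_limit=("CREDIT_LIMIT", "mean"),
    purchases_trx=("PURCHASES_TRX", "mean"),
    oneoff_freq=("ONEOFF_PURCHASES_FREQUENCY", "mean"),
)

# Rank clusters by average credit limit to spot a candidate marketing segment.
print(profile.sort_values("credit_limit", ascending=False).round(2))
```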