Jan 11, 2010

Cluster Analysis

Cluster analysis is an exploratory data analysis tool for solving classification problems.  Its object is to sort cases (people, things, events, etc) into groups, or clusters, so that the degree of association is strong between members of the same cluster and weak between members of different clusters.  Each cluster thus describes, in terms of the data collected, the class to which its members belong; and this description may be abstracted through use from the particular to the general class or type.

When Do We Do Cluster Analysis

Cluster analysis simply discovers structures in data without explaining why they exist. We deal with clustering in almost every aspect of daily life. For example, a group of diners sharing the same table in a restaurant may be regarded as a cluster of people. In food stores, items of a similar nature, such as different types of meat or vegetables, are displayed in the same or nearby locations.

Cluster analysis was carried out for a process with three metrics that had no correlation with each other. The client believed that the advisor population, as ranked on the three metrics, remains the same week on week, so that outliers can easily be identified. The objective of the exercise was to identify a common rank for the entire agent population of the process, based on cumulative performance across the three metrics.


The three metrics considered for the process improvement were:

  • Average Handle Time (A) – lower is better
  • Resolution scores (R) – higher is better
  • Case Logging (C) – higher is better

The need for a cluster analysis arose from the following issues:

  • None of the three metrics above bore any correlation with the others.
  • The client perceived that agents with high Resolution scores also have high Case Logging and low AHT.
  • The client wanted an outlier management program based on weekly quartile-wise outliers, because they perceived that the Quartile 4 agents were constant week on week and hence that the same set of agents was responsible for lowering the performance of the site.
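The "no correlation" check in the first point can be sketched as a pairwise Pearson correlation across the three metrics. This is only an illustration: the post does not say how the correlation was tested, and the five agents below are made-up values, not the study data.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical values for five agents (NOT the data from this exercise).
metrics = {
    "A": [900, 1100, 750, 1300, 1000],         # Average Handle Time, sec
    "R": [82.0, 75.0, 90.0, 80.0, 85.0],       # Resolution score, %
    "C": [55.0, 60.0, 40.0, 45.0, 70.0],       # Case Logging, %
}

# One coefficient per pair of metrics: (A, R), (A, C), (R, C).
pairwise = {
    (a, b): pearson(metrics[a], metrics[b])
    for a, b in combinations(metrics, 2)
}

for pair, r in pairwise.items():
    print(pair, round(r, 2))
```

Coefficients close to zero for every pair would support the statement that the metrics do not move together, which is what makes a single combined rank worth building.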

The analysis was done with the hypothesis: the agent population keeps changing every week.

Phase I – Data Collection

The Resolution score (R) is a metric with a 4-week lag. Case Logging scores (C) were sent across by the client from Jan’09 onwards. Hence the baseline data (C, R and A details) is taken for WE 2nd Jan’09. Since Resolution data is lagged by 4 weeks, the Resolution data has been taken from WE 30th Jan’09.

From the baseline week, the movement of the agents is tracked for the next 7 weeks.

The data set covers Resolution, Average Handle Time and Case Logging for 233 agents who have 8 weeks of data, from WE 2nd Jan’09 to WE 20th Feb’09.

Data points considered are:

Resolution scores – WE 30th Jan’09

Case Logging – WE 2nd Jan’09

Average Handle Time – WE 2nd Jan’09

Phase II – Creation of Grid

A grid of buckets was defined for each of Case Logging, Resolution scores and Average Handle Time.

The agent population across all 8 weeks (WE 2nd Jan’09 – WE 20th Feb’09) was ranked on each of the parameters as per the grid.

Ranks 1 – 6 are formed from the three metric groups and, with the baseline rank kept constant, the movement of the same set of agents is tracked week on week.

Both the movement of the overall rank and the movement of each metric group are tracked.

The grid considered for grouping Case Logging, Resolution scores and Average Handle Time is:


Based on the 5 groups formed, ranks from 1 to 6 are assigned.

Rank: 1 – Outliers

Rank: 2 – Extremely Poor performance

Rank: 3 – Poor Performance

Rank: 4 – Average Performance

Rank: 5 – Good Performance

Rank: 6 – Excellent performance

Problems/Challenges Faced

The Challenges were:

  • Handling a huge database for 233 agents who had rolling 8 weeks of data on all three metrics.
  • Creating a grid for the entire population based on the three metrics, week on week.
  • Creating a rank for the baseline week and tracking the same agents against that rank for the next 7 weeks.

Movement of Ranks tracked for each week (Overall and individual).

Details of the proposed solution

Creation of a grid and ranking based on individual metrics:

The data points on all three metrics for the 233 agents were taken for the baseline week (WE 2nd Jan’09). With the help of Minitab, the ranks for the individual metrics were obtained. The steps followed in Minitab are:

Data > Code > Numeric to Text


The snapshot above shows the rank creation for C. The same was used for rank creation for R and A.
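For readers without Minitab, the recoding step can be approximated in Python as below. This is a sketch under assumptions: only the 12.28% – 28.89% and 62.41% – 79.12% Case Logging bands are quoted in the examples in this post; the remaining edges in C_UPPER_EDGES are placeholders, and group_of is an illustrative helper, not a Minitab function.

```python
from bisect import bisect_left

# Upper edge of each Case Logging group, in percent.
# 28.89 and 79.12 are quoted in this post; 45.00 is an ASSUMED placeholder,
# 62.40 is inferred from the quoted 62.41 lower bound of group 4,
# and 100.00 is the natural ceiling of a percentage score.
C_UPPER_EDGES = [28.89, 45.00, 62.40, 79.12, 100.00]


def group_of(value, upper_edges):
    """Return the 1-based group whose upper edge is the first one >= value.

    Values above the last edge are clamped into the last group.
    """
    index = bisect_left(upper_edges, value)
    return min(index + 1, len(upper_edges))


# Hypothetical Case Logging scores for four agents.
scores = {"agent_01": 20.0, "agent_02": 70.0, "agent_03": 50.0, "agent_04": 95.5}
groups = {agent: group_of(score, C_UPPER_EDGES) for agent, score in scores.items()}
print(groups)
```

The same function can be reused for R and A by passing a different list of upper edges, since each metric has its own grid.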

The snapshot of the agent-wise, metric-wise ranks created is:


To obtain the overall rank based on the three metrics, the ranks of the three metrics were concatenated. Snapshots of the step and of the result obtained are attached.


The column Total holds the concatenated value. The entire column C10-T was copied into Excel to find the number of possible combinations of the ranks after concatenation.
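The concatenate-and-count step can also be done without Excel. The sketch below assumes the ranks are joined in the order C, R, A (the post does not state the order) and uses made-up agents.

```python
from collections import Counter

# Hypothetical per-metric ranks for a few agents, as (C, R, A).
agent_ranks = {
    "agent_01": (1, 1, 1),
    "agent_02": (4, 5, 1),
    "agent_03": (3, 4, 3),
    "agent_04": (4, 5, 1),
}


def concat_key(ranks):
    """Join the per-metric ranks into one string, e.g. (4, 5, 1) -> '451'."""
    return "".join(str(rank) for rank in ranks)


# Equivalent of the "Total" column: one concatenated value per agent.
totals = {agent: concat_key(ranks) for agent, ranks in agent_ranks.items()}

# Number of agents sharing each combination.
combo_counts = Counter(totals.values())

print(len(combo_counts), "distinct combinations")
print(combo_counts.most_common())
```

With single-digit ranks the concatenated string is unambiguous; if a grid ever had ten or more groups, keeping the tuple itself as the key would be safer.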

Creation of an overall rank.

A total of 53 combinations were found in the “Total” field. For each individual combination, an overall rank was decided after discussion with the team.

For example:

  • An agent is in rank 1 in Case Logging (12.28% – 28.89%), rank 1 in Resolution (51.28% – 61.03%) and rank 1 in Average Handle Time (412 sec – 706 sec). The agent is good in Average Handle Time but an outlier in Resolution and Case Logging, hence the overall performance is rank 1 (Outlier).
  • An agent is in rank 4 in Case Logging (62.41% – 79.12%), rank 1 in Average Handle Time (412 sec – 706 sec) and rank 5 in Resolution (90.26% – 100.00%). This agent is an excellent performer, hence the overall rank is rank 6.

Based on these data points, an overall rank (1 – 6) was defined for all 53 combinations. A snapshot of the overall ranks is shown below.
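Because the overall rank was agreed by discussion rather than computed, the natural representation is a lookup table. The sketch below holds only the two combinations spelled out in the examples above; the other 51 are not given in this post and are deliberately left out. The (C, R, A) key order and the function name are assumptions.

```python
# Labels as listed earlier in this post.
RANK_LABELS = {
    1: "Outliers",
    2: "Extremely Poor performance",
    3: "Poor Performance",
    4: "Average Performance",
    5: "Good Performance",
    6: "Excellent performance",
}

# (C rank, R rank, A rank) -> overall rank.
# Only the two combinations described in the post are filled in.
OVERALL_RANK = {
    (1, 1, 1): 1,  # outlier in C and R despite a good handle time
    (4, 5, 1): 6,  # strong on all three metrics
}


def overall_rank(c_rank, r_rank, a_rank):
    """Return the agreed overall rank, or None if the combination is not mapped."""
    return OVERALL_RANK.get((c_rank, r_rank, a_rank))


for combo in [(1, 1, 1), (4, 5, 1), (2, 2, 2)]:
    rank = overall_rank(*combo)
    label = RANK_LABELS.get(rank, "not yet mapped")
    print(combo, rank, label)
```

Returning None for an unmapped combination makes gaps in the table visible instead of silently assigning a default rank.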


A matrix plot was then made to see the movement of the agents in each cluster; the observations are below. Once the matrix plot was in place for the baseline week, the movement of these ranks was mapped for the next 7 weeks.
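The tracking idea (freeze each agent's baseline rank, then follow that cohort week by week) can be sketched as below. The weekly ranks are made up and cover only three follow-up weeks for brevity; averaging the cohort's rank per week is one possible summary, not necessarily what the matrix plot showed.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical overall ranks per agent: index 0 is the baseline week,
# the remaining entries are the following weeks.
weekly_ranks = {
    "agent_01": [1, 2, 3, 3],
    "agent_02": [1, 1, 2, 4],
    "agent_03": [6, 5, 4, 5],
    "agent_04": [4, 4, 4, 3],
}


def movement_by_baseline(ranks_by_agent):
    """Group agents by baseline rank and average each cohort's rank per week."""
    cohorts = defaultdict(list)
    for ranks in ranks_by_agent.values():
        cohorts[ranks[0]].append(ranks)
    return {
        baseline: [mean(week) for week in zip(*rows)]
        for baseline, rows in sorted(cohorts.items())
    }


movement = movement_by_baseline(weekly_ranks)
for baseline, averages in movement.items():
    print("baseline rank", baseline, "->", averages)
```

A cohort whose weekly average drifts away from its baseline rank is a sign that the same agents are not staying in the same band.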


Movement of Rank 1 and Rank 2 agents in each of the metrics


The observations for Rank 1 & Rank 2 agents were:

  • Average Handle Time for Rank 1 and Rank 2 agents has moved to group 2 on average. For A, the lower the group, the better the performance.
  • Rank 1 and Rank 2 agents have moved to groups 2 & 3 for Case Logging.
  • Resolution scores for Rank 1 and Rank 2 agents have moved to group 4.

Observations: Overall rank – 1

Resolution scores: Baseline Resolution was at 73.03%, ranging from 58% to 85%. Over 7 weeks the mean has improved, and so has the variation within the same set of agents.

Case Logging: Baseline was at 44%, with huge variation between agent sets. The variation has reduced and the mean has improved by 10.00%.

Average Handle Time: The baseline mean was at 1491 sec; it has shown a reducing trend for all 7 weeks and has come down to 1023.31 sec.

Scope: Reducing the variation of this population set in R and C would help improve the overall scores.

Observations: Overall rank – 2

Resolution scores: Resolution scores have improved from 73% to 85% over the weeks. The population has been trending at 81% RFT for three weeks.

Case Logging: The baseline agents had a few outliers among them, though the variation within the agents was small. Although Case Logging has improved by 13% for the entire population, there is still scope for improvement for this population.

Average Handle Time: The Average Handle Time of Rank 2 agents has been at an optimum level over the weeks, and the reduction in Average Handle Time has been steady.

Scope: Improvement opportunity in Case Logging.

The same exercise was done for Rank 3, Rank 4, Rank 5 and Rank 6 agents.

Rank 3 and Rank 4 agents:

  • Rank 3 agents have remained in group 3 over the weeks in Case Logging. Rank 4 agents have come down to group 3 in Case Logging.
  • Rank 3 agents have moved to group 4 in Resolution scores, while Rank 4 agents have stayed constant in group 4.
  • In Average Handle Time, Rank 3 and Rank 4 agents have stayed constant in group 3.

Observations: rank 3

For agents in rank 3 there has been no substantial improvement in Resolution scores and Case Logging.

Resolution scores have improved from 81% to 83%, Case Logging has improved from 49% to 55%, and Average Handle Time has improved by 200 sec from the baseline data.

Scope: Improvement of this population in R and C. Scrubbing the cases of this population to find areas of opportunity.

Observations: rank 4

The Resolution scores of these average performers have decreased over the weeks, from 87% to 83% on average.

Improvements were observed in Average Handle Time and Case Logging.

Scope: Improvement of this population in Resolution scores. Training and one-on-one sessions would help in improving the scores.

Rank 5 & Rank 6 agents:

  • Case Logging for Rank 5 agents has been in group 3, while Rank 6 agents have moved to group 4.
  • Average Handle Time of Rank 5 and Rank 6 agents has been in group 3.
  • Resolution scores for Rank 5 and Rank 6 agents have moved to groups 4 and 5.

Observations: rank 5

Rank 5 agents have maintained their scores in all three metrics.

Case Logging has been constant throughout the weeks.

Average Handle Time: Although the variation between agents is high, the mean Average Handle Time of the population has stayed constant between 900 and 1000 sec.

Resolution scores: There has been huge fluctuation in the Resolution scores of this population. For WE 9th and 16th Jan the population was trending below 83%, but thereafter the Resolution scores have been consistently above 85%.

Scope: Reduction in variation in Average Handle Time.

Maintaining the focus on Resolution scores among this agent set.

Observations: rank 6

The Case Logging score of this population has been above 60%, except for two weeks: WE 30th Jan’09 and WE 6th Feb’09.

The Average Handle Time of this population has been maintained between 800 and 900 sec.

The Resolution score for this population was at its lowest, 79%, for WE 23rd Jan.

Scope: Reduction in variation in Average Handle Time and Case Logging.

Weekly sharing of major call drivers with this set, to ensure no major fluctuation in Resolution scores for this group.

Movement of Ranks with Baseline Rank


Note:

The graph represents the movement of the baseline ranks from WE 9th Jan’09 to WE 20th Feb’09.
Inference: The overall population actually moves in Rank 1 status over the weeks, except for the best performers (Rank 6), who have been hovering around Rank 4 & 5.


% Population in each category

Observations:

  • The % population of agents in Rank 3 is on an increasing trend.
  • The % population of agents in Rank 4 (average performers) has increased.
  • The % population of outliers (Rank 1) and extremely poor performers (Rank 2) has reduced by 2% and 12% respectively.
  • The % of good performers has increased from 3.86% to 5.15%, while the % of excellent performers has remained constant.
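The percentages above are simple shares of the population per rank. As an illustration of the arithmetic only: with 233 agents, 9 agents in a rank would work out to 3.86% and 12 agents to 5.15%, which is consistent with the Good performer figures, although the head counts themselves are an assumption and not stated in the source data. The function name below is likewise illustrative.

```python
from collections import Counter


def rank_share(ranks):
    """Percentage of the population in each overall rank 1-6, to 2 decimals."""
    counts = Counter(ranks)
    total = len(ranks)
    return {rank: round(100 * counts[rank] / total, 2) for rank in range(1, 7)}


# Hypothetical population of 233 agents: 9 in rank 5, the rest in rank 4.
baseline_week = [5] * 9 + [4] * 224
# Hypothetical later week: 12 in rank 5, the rest in rank 4.
later_week = [5] * 12 + [4] * 221

print(rank_share(baseline_week)[5])  # share of Good performers, baseline
print(rank_share(later_week)[5])     # share of Good performers, later week
```

Computing the same dictionary for every week gives the series behind a "% population in each category" chart.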


Cluster analysis helps in identifying groups of agents in the same cluster based on multiple metrics when the CPMs do not have any correlation with each other. The details of this exercise were shared with the client. The inferences of the analysis were:

  • Top and bottom performers keep changing every week.
  • The average performers are consistent: 69.26% of the baseline-week population have remained constant over the last 8 weeks.
  • Resolution scores and Case Logging are positively correlated for top performers, while Resolution scores and Average Handle Time are negatively correlated for the same population.
  • The agents in Rank 3 & 4 (average performers) have been in the range of 81.00% Resolution scores, 50.00% Case Logging and 1100 sec Average Handle Time.
  • Training needs identified for this set of 98 advisors would help in increasing the performance of the overall set.
  • Movement of ranks for each individual metric shows improvement in each rank except for Ranks 3 and 4, who are the average performers.

A snapshot is attached showing the % of agents who repeated in the same bucket each week as per the baseline week.

Rank 1: Bottom performers and Rank 6: Top Performers.
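The "% repeated in the same bucket" figure can be computed per baseline rank as below. This is a sketch on made-up ranks; retention_by_rank is an illustrative name, and the snapshot itself may have been built differently.

```python
from collections import Counter

# Hypothetical overall ranks per agent: index 0 is the baseline week.
weekly_ranks = {
    "agent_01": [1, 1, 2],
    "agent_02": [1, 2, 2],
    "agent_03": [4, 4, 4],
    "agent_04": [6, 5, 6],
}


def retention_by_rank(ranks_by_agent, week):
    """% of each baseline-rank cohort that holds the same rank in `week`."""
    stayed, total = Counter(), Counter()
    for ranks in ranks_by_agent.values():
        baseline = ranks[0]
        total[baseline] += 1
        if ranks[week] == baseline:
            stayed[baseline] += 1
    return {
        rank: round(100 * stayed[rank] / total[rank], 2)
        for rank in sorted(total)
    }


for week in (1, 2):
    print("week", week, retention_by_rank(weekly_ranks, week))
```

Low retention in the top and bottom ranks alongside high retention in the middle ranks is the pattern the inferences above describe.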


This helped in showcasing to the client that the top and bottom performers are not the same week on week; hence agent variation management based on rank 3 would not yield a benefit of more than 1%.

This activity has been showcased to the client on numerous platforms. It has also helped in deciding on the correct structure for setting targets for agent variation management at process level.


Cluster analysis thus helps in identifying the focus area and the movement of advisors and of the process when there is more than one CPM. It helps in showcasing the interaction of different CPMs and the level at which the majority can perform considering multiple metrics.



clé usb said...

Cluster analysis is a tool of discovery. It may reveal associations and structure in data which, though not previously evident, nevertheless are sensible and useful once found. The results of cluster analysis may contribute to the definition of a formal classification scheme, such as a taxonomy for related animals, insects or plants; or suggest statistical models with which to describe populations; or indicate rules for assigning new cases to classes for identification and diagnostic purposes; or provide measures of definition, size and change in what previously were only broad concepts or find exemplars to represent classes.

Copyright © Vinay's Blog | Powered by Blogger