Supplementary Website for “Weighted Rank Aggregation of Cluster Validation Measures: A Monte Carlo Cross-Entropy Approach”

ABSTRACT

Biologists often employ clustering techniques in the explorative phase of microarray data analysis to discover relevant biological groupings. Given the availability of numerous clustering algorithms in the machine learning literature, a user might want to select the one that performs best for his or her data set or application. Various validation measures have been proposed over the years to judge the quality of clusters produced by a given clustering algorithm, including their biological relevance. Unfortunately, a given clustering algorithm can perform poorly under one validation measure while outperforming many other algorithms under another. A manual synthesis of results from multiple validation measures is nearly impossible in practice, especially when a large number of clustering algorithms are to be compared using several measures. An automated and objective way of reconciling the rankings is needed.

Using a Monte Carlo cross-entropy algorithm, we successfully combine the ranks of a set of clustering algorithms under consideration via a weighted aggregation that optimizes a distance criterion. The proposed weighted rank aggregation allows for a far more objective and automated assessment of clustering results than a simple visual inspection. We illustrate our procedure using one simulated and three real gene expression data sets from various platforms, where we rank a total of eleven clustering algorithms using a combined examination of ten different validation measures. The aggregate rankings were found for a given number of clusters k and also for an entire range of k. Generally speaking, UPGMA and SOTA, each with the right choice of dissimilarity measure, emerge as the overall top performers.
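
The idea of weighted rank aggregation by Monte Carlo cross-entropy can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a weighted Spearman footrule distance as the criterion (the paper's actual distance may differ), and the item names, rankings, weights, and all function and parameter names below are hypothetical toy values chosen only for illustration.

```python
import random


def footrule(candidate, ranked_list):
    """Spearman footrule: sum of absolute position differences."""
    pos = {item: i for i, item in enumerate(ranked_list)}
    return sum(abs(i - pos[item]) for i, item in enumerate(candidate))


def objective(candidate, ranked_lists, weights):
    """Weighted sum of distances from the candidate to each input ranking."""
    return sum(w * footrule(candidate, lst)
               for lst, w in zip(ranked_lists, weights))


def sample_ordering(items, probs, rng):
    """Draw an ordering position by position, without replacement."""
    remaining = list(items)
    ordering = []
    for position in range(len(items)):
        w = [probs[item][position] for item in remaining]
        choice = rng.choices(remaining, weights=w, k=1)[0]
        ordering.append(choice)
        remaining.remove(choice)
    return tuple(ordering)


def cross_entropy_aggregate(ranked_lists, weights, n_samples=200,
                            elite_frac=0.1, smoothing=0.7, n_iter=30, seed=0):
    """Search for an ordering with a small weighted distance to all lists."""
    rng = random.Random(seed)
    items = sorted(ranked_lists[0])
    n = len(items)
    # probs[item][position]: sampling weight for placing item at position.
    probs = {item: [1.0 / n] * n for item in items}
    n_elite = max(1, int(elite_frac * n_samples))
    best, best_score = None, float("inf")
    for _ in range(n_iter):
        samples = [sample_ordering(items, probs, rng)
                   for _ in range(n_samples)]
        samples.sort(key=lambda s: objective(s, ranked_lists, weights))
        elite = samples[:n_elite]
        top_score = objective(elite[0], ranked_lists, weights)
        if top_score < best_score:
            best, best_score = elite[0], top_score
        # Move the sampling weights toward the elite samples (smoothed update).
        for item in items:
            for position in range(n):
                freq = sum(s[position] == item for s in elite) / n_elite
                probs[item][position] = ((1 - smoothing) * probs[item][position]
                                         + smoothing * freq)
    return best, best_score


# Toy input: three hypothetical validation measures ranking four algorithms.
toy_lists = [("A", "B", "C", "D"), ("B", "A", "C", "D"), ("A", "C", "B", "D")]
toy_weights = [0.5, 0.3, 0.2]  # hypothetical importance of each measure
best, score = cross_entropy_aggregate(toy_lists, toy_weights)
print(best, score)
```

With only four items the optimum could be found by enumerating all orderings. The stochastic search becomes relevant when the number of algorithms being ranked makes enumeration impractical.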

Last Updated: Feb 21, 2007