Paper title

“Selecting and Evaluating Hierarchical Cluster Representative”

Authors: Yaser A. M. Hasan, Muhammad A. Hassan, and M. J. Ridley
Affiliation: University of Bradford, West Yorkshire, BD7 1DP, UK

Zarqa Private University, P. O. Box 2000, Zarqa 13110, Jordan

Abstract – Cluster retrieval was proposed to improve retrieval efficiency, since user needs are compared with a cluster representative, or centroid, instead of with every document. It is important to select the centroid so that it strongly represents the semantics of the cluster members. In this paper we propose a method for forming the centroid when hierarchical clustering is used. The method depends on the index terms of the parent documents in the hierarchy and combines these terms into a virtual document vector whose entries are the accumulated weights of each index term. The centroids were evaluated using two variables: distance to other centroids, and connectivity with the members of the same cluster. The method proved efficient even when only a subset of the most important terms, called the top-n% of terms, was used to represent the centroid.
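The centroid construction summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state how weights are accumulated, how the top-n% cut-off is rounded, or which measure underlies "distance" and "connectivity", so the sketch assumes that weights are summed, that the cut-off is rounded up, and that cosine similarity is used. The names `build_centroid`, `cosine_similarity`, and `top_percent` are hypothetical.

```python
import math
from collections import defaultdict


def build_centroid(parent_docs, top_percent=100.0):
    """Form a virtual document vector from parent documents.

    parent_docs: list of {term: weight} dicts (assumed representation).
    top_percent: share of highest-weighted terms to keep (assumed rounding: ceil).
    """
    if not 0 < top_percent <= 100:
        raise ValueError("top_percent must be in (0, 100]")
    accumulated = defaultdict(float)
    for doc in parent_docs:
        for term, weight in doc.items():
            accumulated[term] += weight  # assumption: accumulate by summing
    # Rank by accumulated weight; break ties alphabetically for determinism.
    ranked = sorted(accumulated.items(), key=lambda item: (-item[1], item[0]))
    keep = max(1, math.ceil(len(ranked) * top_percent / 100)) if ranked else 0
    return dict(ranked[:keep])


def cosine_similarity(a, b):
    """Cosine similarity of two sparse {term: weight} vectors (assumed measure)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy data, invented purely for illustration.
parents = [
    {"cluster": 0.5, "retrieval": 0.25},
    {"cluster": 0.25, "centroid": 0.5, "index": 0.125},
]
centroid = build_centroid(parents, top_percent=50)
# centroid == {"cluster": 0.75, "centroid": 0.5}
```

Under these assumptions, connectivity could be estimated as the similarity between a centroid and each member of its own cluster, and distance as one minus the similarity between two centroids. Both readings are interpretations of the abstract rather than definitions taken from it.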