Paper title “Selecting and Evaluating Hierarchical Cluster Representative”
Authors:
Yaser A. M. Hasan, Muhammad A. Hassan, and M. J. Ridley
Abstract:
Cluster retrieval was proposed to improve retrieval efficiency, since user needs are compared with a cluster representative, or centroid, instead of with all documents. It is important to select the centroid so that it strongly represents the semantics of the cluster members. In this paper we propose a method to form the centroid when hierarchical clustering is used. The method depends on the index terms of the parent documents in the hierarchy and combines these terms into a virtual document vector whose entries are the accumulated weights of each index term. The centroids were evaluated using two variables: distance to other centroids, and connectivity with the members of the same cluster. The method proved efficient even when only a subset of the most important terms, called the top-n% of terms, was used to represent the centroid.
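The centroid construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each document is a dictionary mapping index terms to weights (for example tf-idf), that the caller supplies the documents that count as parents in the hierarchy, and that cosine similarity stands in for the distance and connectivity measures, which the abstract does not name. The function names `build_centroid` and `cosine`, and the parameter `top_fraction`, are hypothetical.

```python
import math
from collections import defaultdict


def build_centroid(doc_vectors, top_fraction=1.0):
    """Accumulate per-term weights over the given documents and keep
    the top fraction of terms (the "top-n%" idea) as the centroid."""
    accumulated = defaultdict(float)
    for vec in doc_vectors:
        for term, weight in vec.items():
            accumulated[term] += weight
    # Rank by accumulated weight, breaking ties alphabetically.
    ranked = sorted(accumulated.items(), key=lambda kv: (-kv[1], kv[0]))
    keep = max(1, math.ceil(len(ranked) * top_fraction))
    return dict(ranked[:keep])


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors
    (an assumed measure, used here for both evaluation variables)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy cluster: two documents with illustrative weights.
docs = [
    {"cluster": 0.5, "retrieval": 0.2},
    {"cluster": 0.4, "index": 0.3},
]
centroid = build_centroid(docs, top_fraction=0.5)

# Connectivity: average similarity between the centroid and its members.
connectivity = sum(cosine(centroid, d) for d in docs) / len(docs)
```

With `top_fraction=0.5`, the three accumulated terms are cut to the two heaviest ones. Distance to other centroids could be computed the same way, by calling `cosine` on pairs of centroids from different clusters.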