Analysis of Heuristic Measures for Cluster Split in Bisecting K-Means

Y. Sri Lalitha; A. Govardhan

Affiliations
1 Department of CSE, Gokaraju Rangaraju Institute of Engineering and Technology, India
2 Department of CSE, Jawaharlal Nehru Technological University, Hyderabad, India

With the ever-increasing number of documents on the web and in other repositories, manually organizing and categorizing these documents to meet the diverse needs of users is a complicated job; hence a machine learning technique named clustering is very useful. The proposed work is based on shared neighbors: two documents are said to be neighbors of each other when their similarity is greater than a threshold. We choose to work with bisecting k-means, in which cluster quality depends on which cluster is chosen to be split at each step until k clusters are formed. Automatically selecting the cluster to split is difficult and time consuming for text documents because of their high dimensionality. This paper implements bisecting k-means, a text document clustering technique, to analyze the best criteria for selecting the cluster to be split. We compared our results with those proposed in the literature and observed that our approach gave promising results when tested on real-life data sets.
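
To make the two ideas in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes documents are already given as dense term-weight vectors and that similarity is cosine similarity; the abstract specifies neither. The function names, the deterministic seeding inside `bisect`, and the two selection heuristics `largest` and `least_cohesive` are all hypothetical placeholders, since the abstract does not say which split criteria are compared.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def neighbor_counts(docs, threshold):
    """For each document, count the other documents whose similarity exceeds threshold."""
    n = len(docs)
    return [sum(1 for j in range(n)
                if j != i and cosine(docs[i], docs[j]) > threshold)
            for i in range(n)]

def centroid(docs, members):
    """Component-wise mean of the member vectors."""
    dim = len(docs[0])
    return [sum(docs[i][d] for i in members) / len(members) for d in range(dim)]

def cohesion(docs, members):
    """Average cosine similarity of the members to their centroid."""
    c = centroid(docs, members)
    return sum(cosine(docs[i], c) for i in members) / len(members)

def bisect(docs, members, iters=10):
    """Split one cluster (at least two members) in two with 2-means under cosine similarity."""
    # Assumed seeding: the first member and the member least similar to it.
    s0 = members[0]
    s1 = min(members[1:], key=lambda i: cosine(docs[s0], docs[i]))
    c0, c1 = docs[s0], docs[s1]
    left, right = [], []
    for _ in range(iters):
        new_left = [i for i in members
                    if cosine(docs[i], c0) >= cosine(docs[i], c1)]
        new_right = [i for i in members if i not in new_left]
        if not new_left or not new_right:
            if left and right:
                break  # keep the last valid split
            # Degenerate case (all members alike): peel off the second seed.
            return [i for i in members if i != s1], [s1]
        if (new_left, new_right) == (left, right):
            break  # assignments stopped changing
        left, right = new_left, new_right
        c0, c1 = centroid(docs, left), centroid(docs, right)
    return left, right

# Two illustrative (hypothetical) criteria for choosing the cluster to split.
def largest(docs, clusters):
    return max(clusters, key=len)

def least_cohesive(docs, clusters):
    return min(clusters, key=lambda c: cohesion(docs, c))

def bisecting_kmeans(docs, k, choose):
    """Start with one cluster and split the chosen cluster until k clusters exist."""
    clusters = [list(range(len(docs)))]
    while len(clusters) < k:
        splittable = [c for c in clusters if len(c) > 1]
        if not splittable:
            break  # nothing left to split
        target = choose(docs, splittable)
        clusters.remove(target)
        clusters.extend(bisect(docs, target))
    return clusters

# Toy term-weight vectors, invented for illustration only.
docs = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [0.1, 1.0, 0.1],
        [0.0, 0.9, 0.2], [0.0, 0.1, 1.0], [0.1, 0.0, 0.9]]
clusters = bisecting_kmeans(docs, 3, largest)
```

The selection criterion is passed in as the `choose` argument, so swapping `largest` for `least_cohesive` (or any other heuristic) changes only which cluster is split next; the splitting step itself stays the same. That separation is what allows different split criteria to be compared on equal footing.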