
An Improved Bisecting K-Means Algorithm for Text Document Clustering


Affiliations
1 Bharathiar University, Coimbatore, Tamil Nadu, India
     



Cluster analysis is an unsupervised learning approach that groups objects into clusters so that each cluster contains objects that are similar with respect to a predefined condition. Text document clustering is an important text mining technique for efficiently organizing a large volume of documents into a small number of significant clusters. The main objective of this research work is to cluster a collection of documents into related groups based on their contents. To perform this task, the work uses two existing algorithms, K-means and Bisecting K-means, and proposes a new clustering algorithm, Enhanced Bisecting K-means. The experimental results show that the proposed algorithm gives better clustering accuracy than the other two algorithms.

Keywords

Text Mining, Text Document Clustering, K-Means, Bisecting K-Means, Enhanced Bisecting K-Means.
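
For orientation, the sketch below illustrates the standard Bisecting K-means baseline named in the abstract, not the proposed Enhanced Bisecting K-means, whose details are not given on this page. It rests on several assumptions of ours: documents are represented as length-normalised term-frequency vectors, distance is squared Euclidean, the cluster with the largest sum of squared errors is the one split next, and the two seeds for each split are chosen deterministically. All function names (tf_vectors, two_means, bisecting_kmeans) are hypothetical.

```python
import math
from collections import Counter


def tf_vectors(docs):
    """Map each document to a length-normalised term-frequency vector."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    vectors = []
    for d in docs:
        counts = Counter(d.lower().split())
        v = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def sse(points):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def two_means(points, iters=20):
    """Plain 2-means. Assumed seeding: the first point and the point farthest from it."""
    c0 = points[0]
    c1 = max(points, key=lambda p: sq_dist(p, c0))
    assign = None
    for _ in range(iters):
        new_assign = [0 if sq_dist(p, c0) <= sq_dist(p, c1) else 1 for p in points]
        # Stop on convergence, or keep the last valid split if one side empties.
        if new_assign == assign or len(set(new_assign)) < 2:
            break
        assign = new_assign
        c0 = centroid([p for p, a in zip(points, assign) if a == 0])
        c1 = centroid([p for p, a in zip(points, assign) if a == 1])
    if assign is None:  # all points identical, nothing to split
        return list(range(len(points))), []
    left = [i for i, a in enumerate(assign) if a == 0]
    right = [i for i, a in enumerate(assign) if a == 1]
    return left, right


def bisecting_kmeans(vectors, k):
    """Start with one cluster and bisect the worst one until k clusters exist."""
    clusters = [list(range(len(vectors)))]
    while len(clusters) < k:
        splittable = [c for c in clusters if len(c) > 1]
        if not splittable:
            break
        # Assumed selection rule: split the cluster with the largest SSE.
        target = max(splittable, key=lambda c: sse([vectors[i] for i in c]))
        left, right = two_means([vectors[i] for i in target])
        if not left or not right:
            break
        clusters.remove(target)
        clusters.append([target[i] for i in left])
        clusters.append([target[i] for i in right])
    return clusters


docs = [
    "cat dog pet",
    "dog cat cat pet",
    "stock market trade",
    "market stock price trade",
]
print(bisecting_kmeans(tf_vectors(docs), k=2))
```

On this toy corpus the pet-related and finance-related documents share no vocabulary, so a single bisection should separate them. A realistic pipeline would typically add TF-IDF weighting, stop-word removal, and several random restarts per split, none of which are shown here.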




Authors

Janani Balakumar
Bharathiar University, Coimbatore, Tamil Nadu, India
S. Vijayarani
Bharathiar University, Coimbatore, Tamil Nadu, India

