
Clustering Categorical Data Using K-Modes Based on Cuckoo Search Optimization Algorithm


Affiliations
1 Department of Computer Applications, Kongu Engineering College, India
2 Department of Computer Science, NKR Government Arts College for Women, India
3 Department of Computer Technology, Kongu Engineering College, India
     



Cluster analysis is an unsupervised learning technique that finds interesting patterns in data objects without knowing their class labels. Most real-world datasets contain categorical data. For example, a social media analysis may include a categorical attribute such as gender, with the values male or female. The k-modes clustering algorithm is the most widely used method for grouping categorical data because it is easy to implement and handles large amounts of data efficiently. However, because it selects its initial centroids at random, it tends to converge to a local optimum. A number of optimization algorithms have been developed to obtain a global optimum solution. The Cuckoo Search algorithm is a population-based metaheuristic optimization algorithm designed to find a global optimum. Methods: In this paper, the k-modes clustering algorithm is combined with the Cuckoo Search algorithm to obtain a global optimum solution. Results: Experiments are conducted on benchmark datasets, and the results are compared with those of k-modes and of Particle Swarm Optimization with k-modes to demonstrate the efficiency of the proposed algorithm.

Keywords

Cluster Analysis, k-Modes, Cuckoo Search Optimization, Local Optima, Initial Centroids.
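
The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the encoding, the parameters, or the way Levy flights are applied to categorical values, so everything below is an assumption made for illustration. In particular, the function names, the defaults (`n_nests`, `pa`, `generations`), and the discrete perturbation (a heavy-tailed number of attribute values resampled from the data, with the step length drawn by Mantegna's method) are hypothetical choices.

```python
import math
import random
from collections import Counter

def mismatch(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))

def cost(data, modes):
    """Total dissimilarity of every object to its nearest mode."""
    return sum(min(mismatch(row, m) for m in modes) for row in data)

def k_modes(data, modes, max_iter=20):
    """Plain k-modes refinement starting from the given initial modes."""
    modes = [tuple(m) for m in modes]
    for _ in range(max_iter):
        labels = [min(range(len(modes)), key=lambda j: mismatch(row, modes[j]))
                  for row in data]
        new_modes = []
        for j, old in enumerate(modes):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if not members:
                new_modes.append(old)  # an empty cluster keeps its old mode
                continue
            # the new mode takes the most frequent category of each attribute
            new_modes.append(tuple(Counter(col).most_common(1)[0][0]
                                   for col in zip(*members)))
        if new_modes == modes:
            break
        modes = new_modes
    return modes

def levy_step(rng, beta=1.5):
    """Heavy-tailed step length drawn with Mantegna's method."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0, sigma)
    v = rng.gauss(0, 1) or 1e-12  # guard against an exact zero
    return abs(u) / abs(v) ** (1 / beta)

def perturb(nest, data, rng):
    """Assumed discrete stand-in for a Levy flight on categorical modes."""
    k, d = len(nest), len(nest[0])
    new = [list(m) for m in nest]
    changes = 1 + int(min(levy_step(rng), k * d - 1))
    for _ in range(changes):
        j, a = rng.randrange(k), rng.randrange(d)
        new[j][a] = rng.choice(data)[a]  # resample a value seen in the data
    return [tuple(m) for m in new]

def random_nest(data, k, rng):
    """A candidate solution: k objects picked at random as initial modes."""
    return [tuple(row) for row in rng.sample(data, k)]

def cuckoo_k_modes(data, k, n_nests=10, pa=0.25, generations=30, seed=0):
    """Cuckoo-style search over sets of modes, each refined by k-modes."""
    rng = random.Random(seed)
    nests = [k_modes(data, random_nest(data, k, rng)) for _ in range(n_nests)]
    for _ in range(generations):
        # a cuckoo lays an egg: perturb a random nest, then refine it
        egg = k_modes(data, perturb(rng.choice(nests), data, rng))
        target = rng.randrange(n_nests)
        if cost(data, egg) < cost(data, nests[target]):
            nests[target] = egg
        # abandon the worst fraction pa of the nests and rebuild them at random
        nests.sort(key=lambda n: cost(data, n))
        for i in range(n_nests - int(pa * n_nests), n_nests):
            nests[i] = k_modes(data, random_nest(data, k, rng))
    return min(nests, key=lambda n: cost(data, n))

if __name__ == "__main__":
    toy = [("a", "x", "p"), ("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
           ("b", "y", "q"), ("b", "y", "q"), ("b", "y", "p"), ("b", "x", "q")]
    best = cuckoo_k_modes(toy, k=2)
    print(best, cost(toy, best))
```

In this sketch a single k-modes run depends entirely on the modes it starts from, while the surrounding search keeps a population of candidate mode sets and replaces the poorest ones, which is one way to reduce the sensitivity to initial centroids that the abstract points out.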



Authors

K. Lakshmi
Department of Computer Applications, Kongu Engineering College, India
N. Karthikeyani Visalakshi
Department of Computer Science, NKR Government Arts College for Women, India
S. Shanthi
Department of Computer Applications, Kongu Engineering College, India
S. Parvathavarthini
Department of Computer Technology, Kongu Engineering College, India
