
Implementation of K-Modes Algorithm to Cluster Very Large Categorical Data Sets in Data Mining


Affiliations
1 Malineni Lakshmaiah Engineering College, Singarayakonda, Prakasam Dist., A.P., India
     



Abstract

This paper concerns clustering in data mining. Partitioning a large set of objects into homogeneous groups is a fundamental operation in data mining, and this process of grouping objects is called clustering. The K-Means algorithm is commonly used to cluster large data sets and works efficiently with numerical values, but its use in data mining is limited because data sets often contain categorical values. In this paper we present the K-Modes algorithm, which extends the K-Means paradigm to categorical domains. We introduce new dissimilarity measures to deal with categorical objects, replace the means of clusters with modes, and use a frequency-based method to update the modes during the clustering process. The WEKA tool is used to implement the K-Modes algorithm.
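The three ingredients named above (a dissimilarity measure for categorical objects, modes in place of means, and a frequency-based mode update) can be illustrated with a minimal Python sketch. It is not the paper's WEKA implementation. It assumes the simple matching dissimilarity (the number of attributes on which two records differ) and a batch update in which all modes are recomputed after each full assignment pass. The function names and the `initial_modes` parameter are hypothetical and chosen only for this illustration.

```python
from collections import Counter
import random


def matching_dissimilarity(a, b):
    """Count the attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(records):
    """Frequency-based update: most frequent category of each attribute."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def k_modes(data, k, initial_modes=None, max_iter=100, seed=0):
    """Cluster tuples of categorical values into k groups (batch variant).

    Assumption: modes are recomputed once per pass over the data, which is
    a simplification chosen for this sketch.
    """
    if initial_modes is None:
        # Pick k distinct records as starting modes.
        rng = random.Random(seed)
        modes = rng.sample(sorted(set(data)), k)
    else:
        modes = list(initial_modes)

    labels = None
    for _ in range(max_iter):
        # Assignment step: each record goes to its nearest mode.
        new_labels = [
            min(range(k), key=lambda j: matching_dissimilarity(rec, modes[j]))
            for rec in data
        ]
        if new_labels == labels:
            break  # no record changed cluster, so the modes are stable
        labels = new_labels
        # Update step: replace each mode by the per-attribute mode of its members.
        for j in range(k):
            members = [rec for rec, lab in zip(data, labels) if lab == j]
            if members:
                modes[j] = column_modes(members)
    return labels, modes


if __name__ == "__main__":
    data = [
        ("red", "small", "round"),
        ("red", "small", "oval"),
        ("red", "medium", "round"),
        ("blue", "large", "square"),
        ("blue", "large", "flat"),
        ("green", "large", "square"),
    ]
    labels, modes = k_modes(data, k=2, initial_modes=[data[0], data[3]])
    print(labels)
    print(modes)
```

Like K-Means, this procedure converges to a local optimum that depends on the starting modes, so in practice it is usually run from several initializations.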

Keywords

Categorical Data, Clustering, Data Mining, Dissimilarity Measures, K-Means, K-Modes, Weka Tool.

Authors

K. Sujatha
Malineni Lakshmaiah Engineering College, Singarayakonda, Prakasam Dist., A.P., India
