
Clustering and Classifying Diabetic Data Sets Using K-means Algorithm


Affiliations
1 Department of Computer Applications, Bannari Amman Institute of Technology, Sathyamangalam, Tamil Nadu, India
2 Department of Computer Science and Engineering, Bannari Amman Institute of Technology, Sathyamangalam, Tamil Nadu, India


The k-means algorithm is well known for its efficiency in clustering large data sets. However, because it works only on numeric values, it cannot be used directly to cluster real-world data containing categorical values. In this paper we present the classification of a diabetic data set and the use of the k-means algorithm on categorical domains. Before classification, the data set is preprocessed to remove noise. We use a missing-value algorithm to replace the null values in the data set. This algorithm is also used to improve the classification rate and to cluster the data set using two attributes, namely plasma and pregnancy.
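
The preprocessing and clustering steps described above can be illustrated with a minimal sketch. The abstract does not say which missing-value algorithm the authors use, so this example assumes simple column-mean replacement. The sample rows are invented for illustration and are not taken from the paper's data set. The function names (`impute_mean`, `kmeans`) and the choice of the first k rows as initial centroids are likewise assumptions made to keep the run deterministic.

```python
from statistics import mean

def impute_mean(rows):
    """Replace None entries with the mean of the observed values in that column.
    Assumption: mean replacement stands in for the paper's unspecified algorithm."""
    columns = list(zip(*rows))
    col_means = [mean(v for v in col if v is not None) for col in columns]
    return [[m if v is None else v for v, m in zip(row, col_means)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100):
    """Plain k-means (Lloyd's iteration) on numeric points.
    Assumption: the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [mean(dim) for dim in zip(*members)]
    return labels, centroids

# Hypothetical rows: (number of pregnancies, plasma glucose); None marks a null value.
rows = [[1, 85], [0, 90], [2, None], [1, 95],
        [8, 180], [None, 170], [7, 190], [9, 175]]

filled = impute_mean(rows)
labels, centroids = kmeans(filled, k=2)
print(labels)
```

In this sketch the two attributes are left unscaled, so the plasma values dominate the distance. Whether the authors normalize the attributes is not stated in the abstract.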

Keywords

Classification, Cluster Analysis, Clustering Algorithms, Categorical Data, Pre-processing




Authors

M. Kothainayaki
Department of Computer Applications, Bannari Amman Institute of Technology, Sathyamangalam, Tamil Nadu, India
P. Thangaraj
Department of Computer Science and Engineering, Bannari Amman Institute of Technology, Sathyamangalam, Tamil Nadu, India

