
Performance Analysis of Data Mining Classification Algorithm to Predict Diabetes


Affiliations
1 Department of Computer Science & Engineering, Kathmandu University, Dhulikhel, Nepal
 

In data mining, classification and prediction are two essential forms of data analysis, widely used to extract models that describe important data classes. This paper designs classifier models based on five classification algorithms, namely Decision Tree, K-Nearest Neighbors (KNN), Naïve Bayes, Random Forest and Support Vector Machines (SVM), to classify and predict patients with diabetes. The classifiers are evaluated with 10-fold cross-validation, and their performance is measured by accuracy, precision, recall, F-score and ROC. The experiments show that the models built with Decision Tree, KNN, Naïve Bayes, SVM and Random Forest achieve accuracies of 73.82%, 71.65%, 76.30%, 65.10% and 68.74%, respectively. In the same order, their precisions are 0.705, 0.552, 0.759, 0.424 and 0.538, and their recalls are 0.738, 0.763, 0.82, 0.651 and 0.804. The study therefore finds that Naïve Bayes gives the highest accuracy of the five techniques in predicting diabetes. The data set used is the “Pima Indian Diabetic Dataset”, taken from the University of California, Irvine (UCI) Repository of Machine Learning Databases.
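As a rough illustration of the evaluation described above, the following Python sketch shows one way to form 10-fold cross-validation splits and to compute accuracy, precision, recall and F-score from predicted labels. It is not the authors' code: the function names `k_fold_indices` and `binary_metrics`, the toy labels, and the choice to score only the positive (diabetic) class are assumptions made for illustration, and the abstract does not say how the reported precision and recall were averaged.

```python
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into k (train, test) pairs of near-equal size.

    No shuffling or stratification is done here; a real experiment would
    normally shuffle (and often stratify) before splitting.
    """
    indices = list(range(n_samples))
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F-score for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_score = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


if __name__ == "__main__":
    # Toy labels (hypothetical, not taken from the Pima data set).
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
    print(binary_metrics(y_true, y_pred))

    # Ten folds over 25 hypothetical samples: five folds of 3, then five of 2.
    for train_idx, test_idx in k_fold_indices(25, k=10):
        print(len(train_idx), len(test_idx))
```

In a full experiment, each classifier would be trained on the `train` indices of every fold and scored on the matching `test` indices, with the per-fold metrics averaged to give a single figure per algorithm.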

Keywords

Data Mining, Diabetes, Classification, Prediction, KNN, Naive Bayes, Random Forest, SVM, Accuracy, Precision, F-Measure, Recall.


Authors

Gajendra Sharma
Department of Computer Science & Engineering, Kathmandu University, Dhulikhel, Nepal
Umesh Hengaju
Department of Computer Science & Engineering, Kathmandu University, Dhulikhel, Nepal

