
A Study on Impact of Dimensionality Reduction on Naïve Bayes Classifier


Affiliations
1 Department of Computer Science, Bharathiar University, Coimbatore – 641046, Tamilnadu, India
2 School of Computing Science & Engineering, VIT University, Vellore – 632014, Tamilnadu, India
 

Objectives: The time complexity of a machine learning algorithm grows in proportion to the dimensionality of the dataset. This paper evaluates the impact of dataset dimensionality on the Naïve Bayes classifier by running it on all feature subsets and analyzing whether its performance varies.
Methods/Statistical Analysis: The Naïve Bayes classifier is evaluated in terms of correctly and incorrectly classified instances. The Pima Indian Type II diabetes dataset is used for the experimental study. For each run, a confusion matrix is computed for the classifier using 10-fold cross-validation. The study thereby examines the impact of dimensionality on the performance of the Naïve Bayes classifier.
Findings: The Naïve Bayes classifier takes a probabilistic approach, using the values of the feature set to assign each patient record to one of two classes: diabetes or non-diabetes. The dimensionality of the feature set is found to affect the classifier's performance in terms of classification accuracy and the number of true positives, true negatives, false positives and false negatives. Incorrect classification is dangerous: a valid classification helps healthcare systems plan an effective course of treatment that can save the patient's life, whereas an invalid classification leads to a wrong diagnosis when the treatment plan is formulated and can lead to loss of life. Hence, invalid classification, in particular the false negative rate, must be viewed very seriously. The study shows that higher dimensionality of the dataset affects the performance of the Naïve Bayes classifier.
Application/Improvements: The findings can be used in medical informatics for quality diagnosis and effective treatment planning. Focusing on the false positive rate of the Naïve Bayes classifier will help healthcare systems diagnose patients accurately and save lives.
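
The evaluation protocol described in the abstract (Naïve Bayes run on every feature subset, scored by a 10-fold cross-validated confusion matrix) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a Gaussian Naïve Bayes variant, it uses synthetic three-feature data in place of the Pima Indian diabetes dataset, and every function name (fit_gaussian_nb, predict, cross_validated_confusion, evaluate_all_subsets, make_toy_data) is hypothetical.

```python
import math
import random
from itertools import combinations


def fit_gaussian_nb(X, y):
    """Estimate the class prior and per-feature mean/variance for each class."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        stats = []
        for col in zip(*rows):
            mean = sum(col) / len(col)
            # Small constant avoids a zero variance (assumption, for stability).
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mean, var))
        model[label] = (len(rows) / len(y), stats)
    return model


def predict(model, x):
    """Return the class with the highest log-posterior under the NB assumption."""
    def log_posterior(label):
        prior, stats = model[label]
        score = math.log(prior)
        for v, (mean, var) in zip(x, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


def cross_validated_confusion(X, y, k=10):
    """Accumulate TP, TN, FP, FN over k folds; label 1 is the positive class."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        model = fit_gaussian_nb([X[i] for i in train], [y[i] for i in train])
        for i in test:
            if predict(model, X[i]) == 1:
                counts["TP" if y[i] == 1 else "FP"] += 1
            else:
                counts["FN" if y[i] == 1 else "TN"] += 1
    return counts


def evaluate_all_subsets(X, y, k=10):
    """Score every non-empty feature subset by accuracy and false negative rate."""
    n_features = len(X[0])
    results = {}
    for size in range(1, n_features + 1):
        for subset in combinations(range(n_features), size):
            X_sub = [[row[j] for j in subset] for row in X]
            cm = cross_validated_confusion(X_sub, y, k)
            accuracy = (cm["TP"] + cm["TN"]) / len(y)
            positives = cm["TP"] + cm["FN"]
            fnr = cm["FN"] / positives if positives else 0.0
            results[subset] = (accuracy, fnr, cm)
    return results


def make_toy_data(n=200, seed=0):
    """Synthetic stand-in data (NOT the Pima dataset): 3 features, binary label."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([
            rng.gauss(2.0 * label, 1.0),  # informative feature
            rng.gauss(1.0 * label, 1.0),  # weakly informative feature
            rng.gauss(0.0, 1.0),          # pure noise feature
        ])
        y.append(label)
    return X, y


X, y = make_toy_data()
results = evaluate_all_subsets(X, y)
for subset, (accuracy, fnr, cm) in sorted(results.items()):
    print(subset, f"accuracy={accuracy:.3f}", f"FNR={fnr:.3f}", cm)
```

Reporting the false negative rate next to accuracy reflects the abstract's point that a missed diabetes case is the costliest error. With d features there are 2^d - 1 non-empty subsets, so an exhaustive sweep like this is only practical for small feature sets.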

Keywords

Classification Accuracy, Dimensionality Reduction, Machine Learning, Naïve-Bayes Classifier.

Authors

Priya Mohan
Department of Computer Science, Bharathiar University, Coimbatore – 641046, Tamilnadu, India
Ilango Paramasivam
School of Computing Science & Engineering, VIT University, Vellore – 632014, Tamilnadu, India




DOI: https://doi.org/10.17485/ijst%2F2017%2Fv10i20%2F156940