
Performance Comparison of Classifiers using Clinical Observations in Reject Option Scenarios to Detect Cancerous Subjects


Affiliations
1 Department of CS and IT, University of Azad Jammu and Kashmir, Muzaffarabad, AJK, Pakistan
2 Department of Mathematics, University of Azad Jammu and Kashmir, Muzaffarabad, AJK, Pakistan
 

Objective: To compare the performance of classifiers in different reject option scenarios using Accuracy Rejection Curves (ARCs) with clinical cancer datasets instead of microarray datasets, which suffer from the curse of dimensionality. Methods/Statistical Analysis: Six publicly available clinical cancer datasets were used to build predictive classifiers, and 10 times 10-fold cross-validation was used to obtain generalized models. Accuracy Rejection Curves (ARCs) were used to compare the performance of the classifiers. Findings: In the literature, the performance of machine learning algorithms (classifiers) has been compared using microarray data in reject option scenarios for cancer-related problems. Microarray data is costly, not commonly available, and suffers from the curse of dimensionality, which reduces classifier performance; it contains thousands of gene expressions that require considerable time and effort to preprocess before predictive analysis. Clinical data, by contrast, is cheap, commonly available, and requires no further preprocessing. In this work, clinical cancer datasets were used instead of microarray data, and the performance of classifiers in reject option scenarios was analyzed using ARCs. Classifiers perform differently at different rejection levels, producing crossing-over or divergent ARCs, which show that a classifier A that is less accurate than another classifier B at zero rejection can outperform B once a reject option is introduced. The majority of crossing-over ARCs show improved classifier performance due to the inclusion of the reject option. Empirical results show that clinical data can be used effectively, as compared with microarray data, for classifier comparison and for the selection of suitable classifiers in reject option scenarios.
Improvements/Applications: Empirical results show that clinical parameters also give promising results in the prediction of cancer using reject option classifiers, comparable to results obtained from gene expression microarray data with reject option classifiers.
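The reject option and ARC methodology described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a classifier that outputs class posterior probabilities and applies Chow's rule, rejecting samples whose maximum posterior falls below a confidence threshold, then measuring accuracy on the accepted samples only. The function name and toy data are invented for illustration.

```python
# Minimal sketch of an Accuracy-Rejection Curve (ARC), assuming a classifier
# that outputs posterior probabilities. Samples whose maximum posterior falls
# below a confidence threshold are rejected (Chow's rule); accuracy is then
# computed on the accepted samples only.
import numpy as np

def accuracy_rejection_curve(probs, y_true, thresholds):
    """Return (rejection_rate, accuracy) pairs for each confidence threshold."""
    y_pred = probs.argmax(axis=1)       # predicted class per sample
    confidence = probs.max(axis=1)      # confidence = max posterior
    curve = []
    for t in thresholds:
        accepted = confidence >= t
        rejection_rate = 1.0 - accepted.mean()
        if accepted.any():
            accuracy = (y_pred[accepted] == y_true[accepted]).mean()
        else:
            accuracy = np.nan           # every sample was rejected
        curve.append((rejection_rate, accuracy))
    return curve

# Toy illustration with made-up posteriors for a two-class problem.
probs = np.array([[0.9, 0.1], [0.55, 0.45], [0.2, 0.8], [0.51, 0.49]])
y_true = np.array([0, 1, 1, 0])
for r, a in accuracy_rejection_curve(probs, y_true, [0.5, 0.6, 0.85]):
    print(f"rejection={r:.2f} accuracy={a:.2f}")
```

Plotting accuracy against rejection rate for two classifiers yields their ARCs; a crossing of the two curves corresponds to the crossing-over behavior the abstract describes, where the classifier that is less accurate at zero rejection becomes preferable at higher rejection levels.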







Authors

Muhammad Rehan Abbas
Department of CS and IT, University of Azad Jammu and Kashmir, Muzaffarabad, AJK, Pakistan
Malik Sajjad Ahmed Nadeem
Department of CS and IT, University of Azad Jammu and Kashmir, Muzaffarabad, AJK, Pakistan
Wajid Aziz
Department of CS and IT, University of Azad Jammu and Kashmir, Muzaffarabad, AJK, Pakistan
Aliya Shaheen
Department of Mathematics, University of Azad Jammu and Kashmir, Muzaffarabad, AJK, Pakistan
Sharjeel Saeed
Department of CS and IT, University of Azad Jammu and Kashmir, Muzaffarabad, AJK, Pakistan







DOI: https://doi.org/10.17485/ijst/2018/v11i39/111033