
Ensemble Classification based Microarray Gene Retrieval System


Affiliations
1 Department of Computer Science, St. Pius X College, India
2 Department of Information Technology, Government Arts College, Coimbatore, India
     



Data mining plays an important role in distinguishing normal from cancerous samples using microarray gene data. Because this classification bears directly on human lives, high sensitivity and specificity are essential. To address this challenge, this work presents a technique that classifies normal and cancerous samples by means of efficient feature selection and classification. Feature selection is performed with the Information Gain Ratio (IGR), and the selected features are passed to an ensemble classifier composed of k-Nearest Neighbour (k-NN), Support Vector Machine (SVM) and Extreme Learning Machine (ELM) classifiers. The performance of the proposed approach is analysed on three datasets (Leukemia, Colon and Breast cancer) in terms of accuracy, sensitivity and specificity. The experimental results show that the proposed approach outperforms the existing techniques it is compared with.
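As a rough illustration of the pipeline the abstract outlines, the sketch below ranks genes by information gain ratio, combines the outputs of three classifiers, and computes sensitivity and specificity. It is not the authors' implementation. Several details are assumptions the abstract does not state: expression values are binarized at the median before scoring, the ensemble combines labels by simple majority vote, and the three prediction lists are hard-coded stand-ins for trained k-NN, SVM and ELM models. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(feature, labels):
    """Information gain of a discrete feature divided by its split information."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    return info_gain / split_info if split_info > 0 else 0.0


def binarize(values):
    """Assumed discretization: 1 if the value is above the median, else 0."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        median = ordered[mid]
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2
    return [1 if v > median else 0 for v in values]


def top_genes(samples, labels, k):
    """Indices of the k genes (columns) with the highest gain ratio."""
    n_genes = len(samples[0])
    scores = [
        gain_ratio(binarize([row[j] for row in samples]), labels)
        for j in range(n_genes)
    ]
    return sorted(range(n_genes), key=lambda j: scores[j], reverse=True)[:k]


def majority_vote(predictions):
    """Combine one label list per classifier into one label per sample."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


# Toy data: 4 samples x 3 genes; label 1 = cancerous, 0 = normal (made up).
samples = [
    [5.1, 0.2, 3.3],
    [4.8, 0.9, 3.1],
    [1.2, 0.4, 3.4],
    [0.9, 0.8, 3.0],
]
labels = [1, 1, 0, 0]

selected = top_genes(samples, labels, k=1)

# Hard-coded stand-ins for the outputs of trained k-NN, SVM and ELM models.
knn_pred = [1, 1, 0, 1]
svm_pred = [1, 0, 0, 0]
elm_pred = [1, 1, 1, 0]
ensemble_pred = majority_vote([knn_pred, svm_pred, elm_pred])

print(selected, ensemble_pred, sensitivity_specificity(labels, ensemble_pred))
```

With three voters and two classes a majority vote cannot tie, which is one reason an odd number of base classifiers is convenient. Weighted or probability-based combination rules are equally plausible readings of "ensemble classification" here.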

Keywords

Data Mining, Classification, Feature Selection.



Authors

Thomas Scaria
Department of Computer Science, St. Pius X College, India
T. Christopher
Department of Information Technology, Government Arts College, Coimbatore, India

