A Web Mining Application to Classify Bioinformatics Datasets

Rabie Ahmed; Munir Amin; Mohammed Al-Shomrani

doi:10.17485/ijst/2018/v11i4/169720

Rabie Ahmed ¹, Munir Amin ², Mohammed Al-Shomrani ³

Affiliations
1 Department of Computer Science, Faculty of Computing & IT, Northern Border University, Egypt
2 Department of Information Technology, Faculty of Computing & IT, Northern Border University, Rafha, Saudi Arabia
3 Department of Mathematics, Faculty of Science, King Abdulaziz University, P. O. Box 80203, Jeddah 21589, Saudi Arabia

Abstract

We present a web application for classifying bioinformatics datasets. The application provides an easy-to-use, interactive visual interface suitable for both technical and non-technical users. It is intended mainly for classifying bioinformatics datasets, especially large multi-class datasets, using sequential and parallel classification algorithms, and we hope it will be widely accepted and adopted in both academia and business. Biological datasets are classified using both serial and parallel support vector machines (SVMs). The proposed application, which has been rewritten entirely from scratch, provides a general framework for data preprocessing, classification, and prediction. These three main tasks are applied to datasets of different sizes, such as Leukemia, Colon-cancer, Breast-cancer, DNA, and Protein. In the preprocessing phase, data preprocessing techniques such as data cleaning, data transformation, data reduction, and data discretization are used to resolve incomplete and/or inconsistent values in the raw data. Then, in the classification phase, a classifier is trained on the preprocessed training datasets using algorithms such as serial SVM, parallel SVM, clustering, decision trees, genetic programming, and Bayesian networks to produce a trained model. Finally, in the prediction phase, the trained model is used to predict the class value of a new instance in a given dataset. To obtain an efficient and effective prediction model, we require that it satisfy the following criteria: accuracy, speed, robustness, and scalability. The proposed application shows much promise owing to its robust classification capabilities, producing prediction models with accuracy ranging from 70.32% to 97.33%.
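The three phases described above (preprocessing, classification, prediction) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the choice of mean imputation for data cleaning, min-max scaling for data transformation, and a linear SVM trained by stochastic subgradient descent on the hinge loss are assumptions made for illustration, and all function names, hyperparameters, and the toy two-feature dataset are hypothetical.

```python
import random

def column_means(rows):
    """Data cleaning, step 1: per-column mean over the values that are present."""
    means = []
    for col in zip(*rows):
        present = [v for v in col if v is not None]
        means.append(sum(present) / len(present))
    return means

def impute(rows, means):
    """Data cleaning, step 2: replace missing values (None) with the column mean."""
    return [[m if v is None else v for v, m in zip(row, means)] for row in rows]

def column_bounds(rows):
    """Data transformation, step 1: per-column (min, max) learned from training data."""
    return [(min(col), max(col)) for col in zip(*rows)]

def scale(rows, bounds):
    """Data transformation, step 2: min-max scaling (training rows land in [0, 1])."""
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, (lo, hi) in zip(row, bounds)] for row in rows]

def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=200, seed=0):
    """Serial linear SVM: stochastic subgradient descent on the regularized
    hinge loss. Labels must be +1 or -1. Hyperparameters are illustrative."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1 - eta * lam) for wj in w]      # L2 regularization step
            if margin < 1:                              # hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def train(raw_rows, labels):
    """Preprocessing phase followed by classification phase."""
    means = column_means(raw_rows)
    rows = impute(raw_rows, means)
    bounds = column_bounds(rows)
    w, b = train_linear_svm(scale(rows, bounds), labels)
    return {"means": means, "bounds": bounds, "w": w, "b": b}

def predict(model, raw_row):
    """Prediction phase: preprocess a new instance exactly as the training data
    was preprocessed, then return the sign of the SVM decision function."""
    row = scale(impute([raw_row], model["means"]), model["bounds"])[0]
    score = sum(wj * xj for wj, xj in zip(model["w"], row)) + model["b"]
    return 1 if score >= 0 else -1

# Hypothetical toy data: two features, two classes, two missing values.
raw = [[2.0, 1.0], [1.0, None], [1.5, 2.0], [8.0, 9.0], [9.0, 8.5], [None, 9.5]]
labels = [-1, -1, -1, 1, 1, 1]
model = train(raw, labels)
print(predict(model, [1.2, 1.5]), predict(model, [8.5, 9.0]))
```

The model stores the imputation means and scaling bounds alongside the SVM weights so that a new instance passes through the same preprocessing as the training data. A multi-class or parallel variant would need additional machinery, such as one binary classifier per class trained concurrently, which is outside the scope of this sketch.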