Mining the Amino Acid Dominance in Gene Sequences


Affiliations
1 Department of Information Technology, AMET University, Chennai - 603112, Tamil Nadu, India
2 Manonmaniam Sundaranar University, Tirunelveli - 627012, Tamil Nadu, India
 

Abstract

In recent years, classification techniques have been widely applied in bioinformatics. The proposed Amino Acid Component based Classification algorithm adopts the Iterative Dichotomiser 3 (ID3) classifier. The algorithm consists of two phases: attribute selection and component-based classification. The attribute selection phase identifies the dominant amino acids and the amino acid deficiencies that cause disease. The second phase finds the amino acid components that spread the disease in the specified sequence. Experiments were carried out on dengue virus gene sequences available from the NCBI online biological database, and the proposed algorithm achieved an accuracy of 90.744%. The proposed algorithm is compared with traditional benchmark algorithms, namely Naive Bayes, ID3, Random Forest, Multilayer Perceptron, and J48. The results of this work can be used by drug designers to predict new viral diseases.
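The abstract does not spell out the attribute selection phase, so the following is only a minimal sketch of the standard ID3 criterion (entropy and information gain) applied to amino acid composition, not the authors' implementation. The toy sequences, the class labels, the 0.25 composition threshold, and every function name are assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting labels on a discrete attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def composition(seq):
    """Fraction of each amino acid in a protein sequence."""
    counts = Counter(seq)
    return {aa: counts.get(aa, 0) / len(seq) for aa in AMINO_ACIDS}


def rank_amino_acids(sequences, labels, threshold=0.25):
    """Rank amino acids by the information gain of a high/low composition split.

    The threshold is a hypothetical cut-off: a residue counts as "high"
    in a sequence when its fraction is at least this value.
    """
    comps = [composition(s) for s in sequences]
    gains = {}
    for aa in AMINO_ACIDS:
        values = ["high" if c[aa] >= threshold else "low" for c in comps]
        gains[aa] = information_gain(values, labels)
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data, not taken from the paper or from NCBI.
toy_sequences = ["LLLLAKGE", "LLLAVKGD", "AKGEDSTV", "GKAEDSTN"]
toy_labels = ["disease", "disease", "normal", "normal"]

ranking = rank_amino_acids(toy_sequences, toy_labels)
print(ranking[0])  # the amino acid whose composition best separates the classes
```

In this toy data leucine (L) is the only residue that is over-represented in one class, so it is the attribute an ID3 tree would split on first. A real study would use labelled sequences and a tuned discretisation.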

Keywords

Amino Acid Components, Classification, Entropy, Information Gain

Authors

V. Balamurugan
Department of Information Technology, AMET University, Chennai - 603112, Tamil Nadu, India
T. Marimuthu
Manonmaniam Sundaranar University, Tirunelveli - 627012, Tamil Nadu, India


DOI: https://doi.org/10.17485/ijst/2015/v8i14/75241