An Approach to overcome Imbalance Datasets of Eukaryotic Genomes during the Analysis by Machine Learning Technique (SVM)

doi:10.17485/ijst/2011/v4i5/30053

Mohd. Faheem Khan¹, Gaurav Chauhan², A. K. Jaitly¹

Affiliations
1 Department of Plant Sciences, MJP Rohilkhand University Bareilly (U.P), India
2 Bioinformatics Centre, IVRI, Izatnagar, Bareilly-243122 (U.P), India

Abstract

In biology, the Support Vector Machine (SVM) is one of the most frequently used tools for the analysis of gene expression, microarray experiments and other biological applications. In a human genome dataset, only a small proportion of the DNA sequences represent genes, and the rest do not, so the dataset is heavily imbalanced. In this work, we highlight the reasons why the SVM in particular fails on such data and what can be done to overcome this.

Keywords

Imbalanced Dataset, BioSVM
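
The imbalance problem described in the abstract can be illustrated with a small sketch. It is not the authors' method, and the 5-to-95 class split is a hypothetical toy example rather than data from the paper. It shows that a classifier which always predicts the majority class ("non-gene") scores high accuracy while detecting no genes at all, and it computes inverse-frequency class weights, one common heuristic for penalising minority-class errors more heavily. The function names are assumptions made for illustration.

```python
import math


def confusion(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp, fn, tn, fp


def summarize(y_true, y_pred):
    """Return accuracy, sensitivity, specificity and their geometric mean."""
    tp, fn, tn, fp = confusion(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": math.sqrt(sensitivity * specificity),
    }


def balanced_class_weights(y_true):
    """Inverse-frequency weights: n_samples / (n_classes * n_samples_in_class)."""
    counts = {}
    for label in y_true:
        counts[label] = counts.get(label, 0) + 1
    n, k = len(y_true), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


# Hypothetical toy labels: 5 "gene" sequences (1) and 95 "non-gene" sequences (0).
y_true = [1] * 5 + [0] * 95

# A degenerate classifier that always predicts the majority class.
y_majority = [0] * len(y_true)

report = summarize(y_true, y_majority)
weights = balanced_class_weights(y_true)

print(report)   # high accuracy, but sensitivity and g_mean are zero
print(weights)  # the minority class receives the larger weight
```

Weights of this kind can be supplied as per-class misclassification costs to an SVM implementation that supports them, for example through the `class_weight` parameter of scikit-learn's `SVC`. Reporting sensitivity and the geometric mean alongside accuracy makes the failure on the minority class visible.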

References

Akbani R, Kwek S and Japkowicz N (2004) Applying support vector machines to imbalanced datasets. Proc. 15th Eur. Conf. on Machine Learning (ECML). Pisa, Italy, Sept., Springer-Verlag, Germany. pp: 39-50.

Chawla N, Bowyer K, Hall L and Kegelmeyer W (2002) SMOTE: Synthetic Minority Over-sampling Technique. J. Artificial Intelligence Res. 16, 321-357.

Cristianini N and Shawe-Taylor J (2000) An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press, Cambridge, UK. ISBN 0521780195.

Joachims T (1998) Text categorization with SVM: Learning with many relevant features. Proc. 10th Eur. Conf. on Machine Learning (ECML).

Kozak M (1996) Interpreting cDNA sequences: Some insights from studies on translation. Mammalian Genome. 7, 563-574.

Vapnik V (1995) The nature of statistical learning theory. Springer, NY. ISBN 0387987800.

Veropoulos K, Campbell C and Cristianini N (1999) Controlling the sensitivity of support vector machines. Proc. Intl. Joint Conf. on AI. pp: 55-60.

Wu G and Chang E (2003) Class-Boundary alignment for imbalanced dataset learning. Proc. ICML 2003 Workshop on Learning from Imbalanced Data Sets II, Washington DC, USA.

Zeng F, Yap HC and Wong L (2002) Using feature generation and feature selection for accurate prediction of translation initiation sites. Proc. of 13th Workshop on Genome Informatics, Universal Academy Press. pp: 192-200.

Indian Journal of Science and Technology
