
An Approach to overcome Imbalance Datasets of Eukaryotic Genomes during the Analysis by Machine Learning Technique (SVM)


Affiliations
1 Department of Plant Sciences, MJP Rohilkhand University Bareilly (U.P), India
2 Bioinformatics Centre, IVRI, Izatnagar, Bareilly-243122 (U.P), India
 

In biology, the Support Vector Machine (SVM) is one of the most frequently used tools for the analysis of gene expression, microarray experiments and other biological applications. In a human genome dataset, only a small proportion of the DNA sequences represent genes; the rest do not. In this work, we highlight the reasons why SVMs in particular fail on such imbalanced datasets and what can be done to overcome this.
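
The imbalance described above can be made concrete with a small arithmetic sketch. It does not reproduce the authors' analysis or their remedy. It only illustrates the general problem: a classifier that labels every sequence as "non-gene" can score a high accuracy while finding no genes at all. The 20-in-1,000 class split and the helper name `confusion_counts` are assumptions chosen for illustration; the source gives no class proportions.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 = gene, 0 = non-gene."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


# Assumed, illustrative dataset: 20 gene sequences among 1,000 (2% positives).
y_true = [1] * 20 + [0] * 980

# A degenerate classifier that assigns every sequence to the majority class.
y_pred = [0] * len(y_true)

tp, tn, fp, fn = confusion_counts(y_true, y_pred)

accuracy = (tp + tn) / len(y_true)                   # fraction of all labels correct
sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # fraction of genes recovered
specificity = tn / (tn + fp) if (tn + fp) else 0.0   # fraction of non-genes recovered
balanced_accuracy = (sensitivity + specificity) / 2  # weights both classes equally

print(f"accuracy={accuracy:.2f} sensitivity={sensitivity:.2f} "
      f"balanced_accuracy={balanced_accuracy:.2f}")
```

Under these assumed counts, accuracy alone hides the fact that no gene is detected, which is why sensitivity or balanced accuracy is usually reported alongside it for imbalanced data.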

Keywords

Imbalanced Dataset, BioSVM



Authors

Mohd. Faheem Khan
Department of Plant Sciences, MJP Rohilkhand University Bareilly (U.P), India
Gaurav Chauhan
Bioinformatics Centre, IVRI, Izatnagar, Bareilly-243122 (U.P), India
A. K. Jaitly
Department of Plant Sciences, MJP Rohilkhand University Bareilly (U.P), India







DOI: https://doi.org/10.17485/ijst%2F2011%2Fv4i5%2F30053