Gene Selection Using Information Theory and Statistical Approach


Affiliations
1 Computer Applications, Institute of Technical Education and Research, Siksha O Anusandhan University, Bhubaneswar, Odisha, India
2 Computer Science & Engineering, Institute of Technical Education and Research, Siksha O Anusandhan University, Bhubaneswar, Odisha, India
 

Abstract

This paper focuses on a methodological framework for gene selection that uses two approaches: a statistical approach and an information-based approach. The statistical measures are univariate: the relevance score of each gene is calculated without considering its correlation (positive or negative) with other genes. The statistical approach uses Euclidean distance and Pearson correlation. In probability theory, mutual information measures the mutual dependence between two random variables; the information-based approach uses information gain and dynamic relevance. These gene selection methods are applied to four publicly available data sets (breast cancer, leukemia, hepatitis and dermatology) to generate gene subsets. Each resulting subset is then fed to two classifiers, Naive Bayes and a Support Vector Machine (SVM). For comparison, the data sets are also given directly to the classifiers without gene selection. The comparison shows that, on all four data sets, the classifiers achieve better accuracy after gene selection, which highlights the importance of gene selection.
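As an illustration of the univariate statistical scores named above, the following Python sketch scores each gene independently against the class labels. It assumes a two-class problem with labels coded 0/1 and a list of gene expression profiles over the same samples. The helper names (`euclidean_score`, `pearson_score`, `select_top_k`), the min-max rescaling, and the rule that a smaller distance means a more relevant gene are hypothetical choices for illustration, not the authors' exact formulation.

```python
import math
from typing import Callable, List, Sequence


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant profile maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def euclidean_score(gene: Sequence[float], labels: Sequence[int]) -> float:
    """Euclidean distance between the rescaled gene profile and the 0/1 labels.

    Assumption (hypothetical): a smaller distance means the gene follows the
    class more closely.
    """
    g = min_max(gene)
    return math.sqrt(sum((gi - yi) ** 2 for gi, yi in zip(g, labels)))


def pearson_score(gene: Sequence[float], labels: Sequence[int]) -> float:
    """Absolute Pearson correlation between a gene and the class labels."""
    n = len(gene)
    mean_g = sum(gene) / n
    mean_y = sum(labels) / n
    cov = sum((g - mean_g) * (y - mean_y) for g, y in zip(gene, labels))
    sd_g = math.sqrt(sum((g - mean_g) ** 2 for g in gene))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in labels))
    if sd_g == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no class information
    return abs(cov / (sd_g * sd_y))


def select_top_k(
    genes: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int,
    score: Callable[[Sequence[float], Sequence[int]], float],
    higher_is_better: bool,
) -> List[int]:
    """Score every gene on its own (univariate) and return the indices of the best k."""
    scores = [score(g, labels) for g in genes]
    order = sorted(range(len(genes)), key=lambda i: scores[i], reverse=higher_is_better)
    return order[:k]


# Toy data (hypothetical): 3 genes x 4 samples, two samples per class.
genes = [
    [1.0, 2.0, 8.0, 9.0],  # rises with the class label
    [5.0, 1.0, 5.0, 1.0],  # unrelated to the class label
    [9.0, 8.0, 2.0, 1.0],  # falls as the class label rises
]
labels = [0, 0, 1, 1]

if __name__ == "__main__":
    print(select_top_k(genes, labels, 2, pearson_score, higher_is_better=True))     # → [0, 2]
    print(select_top_k(genes, labels, 2, euclidean_score, higher_is_better=False))  # → [0, 1]
```

On this toy matrix, the absolute correlation puts genes 0 and 2 first (both about 0.99), while the distance score ranks gene 2 last because its profile runs opposite to the labels. Taking the absolute value of r is one way to treat positive and negative association alike; the abstract does not say how the paper handles this.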
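For the information-based approach, the information gain of a gene with respect to the class equals the mutual information between the discretized gene G and the class C: IG(C; G) = H(C) - H(C | G) = I(C; G). The Python sketch below assumes a hypothetical median split that discretizes each gene into two bins; the function names and the binning rule are illustrative assumptions, not the authors' procedure. Dynamic relevance is not sketched because the abstract does not define it.

```python
import math
from collections import Counter
from statistics import median
from typing import Dict, List, Sequence


def entropy(symbols: Sequence) -> float:
    """Shannon entropy H(X), in bits, of a sample of discrete symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def conditional_entropy(target: Sequence, given: Sequence) -> float:
    """H(target | given) = sum over g of p(g) * H(target | given = g)."""
    n = len(target)
    groups: Dict[object, List] = {}
    for t, g in zip(target, given):
        groups.setdefault(g, []).append(t)
    return sum(len(ts) / n * entropy(ts) for ts in groups.values())


def mutual_information(x: Sequence, y: Sequence) -> float:
    """I(X; Y) = H(X) + H(Y) - H(X, Y), from the empirical joint distribution."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def discretize_at_median(gene: Sequence[float]) -> List[int]:
    """Hypothetical two-bin discretization: 1 above the gene's median, else 0."""
    m = median(gene)
    return [1 if v > m else 0 for v in gene]


def information_gain(gene: Sequence[float], labels: Sequence[int]) -> float:
    """IG(C; G) = H(C) - H(C | G) for the discretized gene G and class C."""
    return entropy(labels) - conditional_entropy(labels, discretize_at_median(gene))


def rank_by_information_gain(
    genes: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> List[int]:
    """Return the indices of the k genes with the highest information gain."""
    scores = [information_gain(g, labels) for g in genes]
    return sorted(range(len(genes)), key=lambda i: scores[i], reverse=True)[:k]


# Toy data (hypothetical): 3 genes x 4 samples, two samples per class.
genes = [
    [1.0, 2.0, 8.0, 9.0],
    [5.0, 1.0, 5.0, 1.0],
    [9.0, 8.0, 2.0, 1.0],
]
labels = [0, 0, 1, 1]

if __name__ == "__main__":
    for i, g in enumerate(genes):
        print(i, information_gain(g, labels))
    print(rank_by_information_gain(genes, labels, 2))  # → [0, 2]
```

Genes 0 and 2 each yield 1 bit, the full class entropy, because the median split separates the two classes perfectly, while gene 1 yields 0 bits. Comparing `information_gain` with `mutual_information` on the discretized profiles shows that the two quantities coincide.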

Keywords

Information Based Approach, Naive Bayes, Statistical Approach, Support Vector Machine (SVM).

Authors

Kaberi Das
Computer Applications, Institute of Technical Education and Research, Siksha O Anusandhan University, Bhubaneswar, Odisha, India
Jagannath Ray
Computer Applications, Institute of Technical Education and Research, Siksha O Anusandhan University, Bhubaneswar, Odisha, India
Debahuti Mishra
Computer Science & Engineering, Institute of Technical Education and Research, Siksha O Anusandhan University, Bhubaneswar, Odisha, India

DOI: https://doi.org/10.17485/ijst%2F2015%2Fv8i8%2F67430