
Optimizing Classification of High Dimensional Data by Hybrid Approach of Feature Selection with Wrapper Evaluators


     




Authors

Sanjay Garg
Department of Computer Science & Engineering, Nirma Institute of Technology, Nirma University, Gujarat, India
Mahesh Panchal
Department of Computer Engineering, Kalol Institute of Technology & Research Centre, Gujarat, India

Abstract


High dimensional data contain a large number of features (predictor attributes) relative to the number of samples. Because many of these features are irrelevant to the class label, a classification algorithm applied directly to such a dataset yields a less accurate model that takes longer to build, test, and apply to unseen data. Feature selection methods retain only those features that are relevant to the class label. During feature selection, candidate sets of features are generated and evaluated for their relevance to the class. Several methods have been proposed in the literature for generating and evaluating features, each with its own characteristics. In this paper, experiments are carried out on three types of cancer gene expression datasets using different feature selection methods. Features are generated by ranker, heuristic, and random search methods, and are evaluated by information gain, attreval, and wrapper methods. A hybrid approach that combines ranker and subset-based feature generation is also proposed. The results show that the hybrid approach with a wrapper evaluator gives the best classification accuracy.

Keywords


Data Mining, Classification, Feature Selection, Wrapper Evaluators.
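
The hybrid idea summarized in the abstract (a ranker narrows the feature pool, then a subset search scored by a wrapper evaluator picks the final features) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy data, the function names (`rank_features`, `loo_accuracy`, `hybrid_select`), the greedy forward search, and the 1-nearest-neighbour classifier with leave-one-out accuracy as the wrapper are all assumptions chosen to keep the example self-contained.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(column, labels):
    """Information gain of one discrete feature with respect to the class."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def rank_features(X, y):
    """Ranker step: feature indices sorted by decreasing information gain."""
    n_features = len(X[0])
    gains = [information_gain([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: gains[j], reverse=True)

def loo_accuracy(X, y, features):
    """Wrapper evaluator (assumed): leave-one-out accuracy of a 1-nearest-
    neighbour classifier using Hamming distance on the chosen features."""
    correct = 0
    for i, row in enumerate(X):
        best_dist, best_label = None, None
        for k, other in enumerate(X):
            if k == i:
                continue  # hold out the sample being classified
            dist = sum(row[j] != other[j] for j in features)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, y[k]
        correct += best_label == y[i]
    return correct / len(X)

def hybrid_select(X, y, top_k):
    """Keep the top_k ranked features, then grow a subset greedily while
    the wrapper score strictly improves."""
    candidates = rank_features(X, y)[:top_k]
    selected, best_score = [], 0.0
    improved = True
    while improved:
        improved = False
        for j in candidates:
            if j in selected:
                continue
            score = loo_accuracy(X, y, selected + [j])
            if score > best_score:
                best_score, best_feature, improved = score, j, True
        if improved:
            selected.append(best_feature)
    return selected, best_score

# Hypothetical toy data: 6 samples, 4 discretized "gene" features.
# Feature 0 separates the classes; the others are mostly noise.
X = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
]
y = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]

selected, score = hybrid_select(X, y, top_k=2)
print(selected, score)
```

On this toy data the ranker places feature 0 first, and the wrapper search stops after selecting it because adding feature 1 does not raise the leave-one-out accuracy. On real gene expression data the ranker step matters mainly for cost: the wrapper retrains a classifier for every candidate subset, so shrinking the pool first keeps the search tractable.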