Feature Selection for Text Clustering and Classification

Kamlesh Dhayal ¹, Sudesh Kumar ¹, Shalini Batra ²

Affiliations
1 Thapar University, Patiala, India
2 CSED, Thapar University, Patiala, India

Abstract

The quality of the data is one of the most important factors influencing the performance of any classification or clustering algorithm. The attributes defining the feature space of a given data set can often be inadequate, which makes it difficult to discover useful information or obtain the desired output. However, even when the original attributes are individually inadequate, it is often possible to combine them to construct new attributes with greater predictive power. Feature selection, as a preprocessing step for machine learning, has been very effective in reducing dimensionality, removing irrelevant and noisy data, and improving the comprehensibility of results. This paper addresses the task of feature selection for clustering and classification. We present a comparative study of a variety of classification methods, including Naive Bayes and J48, among others.
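
As an illustration of feature selection as a preprocessing step for text, the following is a minimal sketch of one common filter approach: scoring each term by a chi-square statistic against the class labels and keeping the top k terms. The abstract does not state which selection criterion the paper uses, so the chi-square criterion, the toy corpus, and the function names `chi_square_scores` and `select_top_k` are all assumptions made for this example.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Score each term by its largest one-vs-rest chi-square statistic.

    Illustrative assumption: terms are whitespace-separated tokens and
    only term presence (not frequency) in a document is counted.
    """
    n = len(docs)
    token_sets = [set(doc.lower().split()) for doc in docs]
    class_counts = Counter(labels)
    vocab = set().union(*token_sets)
    scores = {}
    for term in vocab:
        best = 0.0
        for cls, n_cls in class_counts.items():
            # 2x2 contingency table for (term present?, document in cls?)
            a = sum(1 for t, y in zip(token_sets, labels) if term in t and y == cls)
            b = sum(1 for t, y in zip(token_sets, labels) if term in t and y != cls)
            c = n_cls - a          # in class, term absent
            d = n - n_cls - b      # not in class, term absent
            denom = (a + b) * (c + d) * (a + c) * (b + d)
            if denom:
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[term] = best
    return scores


def select_top_k(docs, labels, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    scores = chi_square_scores(docs, labels)
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]


# Toy corpus, invented purely for illustration.
docs = [
    "cheap pills buy now",
    "buy cheap watches now",
    "meeting agenda for monday",
    "monday project meeting notes",
]
labels = ["spam", "spam", "ham", "ham"]

selected = select_top_k(docs, labels, 5)
print(selected)
```

The reduced vocabulary returned by `select_top_k` could then be used to build the document vectors passed to a classifier such as Naive Bayes or a decision tree, so that terms appearing in only one document (and carrying little class information) are dropped before training.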