
High Dimensional Classification - An Overview


Affiliations
1 VIT University, Vellore – 632014, Tamil Nadu, India
 

Objective: This paper presents a comprehensive overview of high-dimensional data classification techniques for researchers, scientists and data engineers in the government and private sectors who work with high-dimensional data. Methods/Statistical Analysis: The literature from 1969 to 2016 was systematically reviewed and reported. Findings: High-dimensional data classification is challenging because the data may not fit into main memory, as conventional classification methods require. Many of the features may be irrelevant, and as dimensionality increases while the number of samples remains limited, any conventional supervised learning algorithm may overfit to noise. The study describes methods for generating artificial samples that enlarge the training set and thereby improve classification performance. It also notes that reducing dimensionality not only lowers storage space and computation time but also improves understandability. Applications: Text Classification, Email Classification, Pattern Classification, Information Retrieval, Gene Expression Analysis, Health Care Analysis, Predictive Modelling.

Keywords

Dimensionality Reduction, Feature Selection, GA, LSA, PCA, Synthetic Pattern Generation.
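
As a rough illustration of the artificial-sample idea mentioned in the abstract, the sketch below creates new training points by interpolating between pairs of existing samples from the same class. This is only one simple possibility (loosely SMOTE-like, without a nearest-neighbour search); the abstract does not say which generation methods the paper covers, and the function name `synthesize` and the toy data are assumptions made for this example.

```python
import random

def synthesize(samples, n_new, seed=0):
    """Create n_new artificial points by interpolating between random
    pairs of samples that belong to the same class (hypothetical helper)."""
    rng = random.Random(seed)
    new_points = []
    for _ in range(n_new):
        a, b = rng.sample(samples, 2)   # two distinct training samples
        t = rng.random()                # position along the segment a -> b
        new_points.append([x + t * (y - x) for x, y in zip(a, b)])
    return new_points

# Assumed toy data: three samples of one class in a 4-dimensional space.
class_a = [[0.0, 1.0, 2.0, 3.0],
           [1.0, 1.5, 2.5, 2.0],
           [0.5, 0.8, 2.2, 2.6]]

# Enlarge the training set with five interpolated samples.
augmented = class_a + synthesize(class_a, n_new=5)
print(len(augmented))
```

Because every artificial point lies on a segment between two real samples, it stays inside the range already spanned by the class, which keeps the sketch conservative when only a few samples are available.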

Authors

Seetha Hari
VIT University, Vellore – 632014, Tamil Nadu, India
Lydia Jane Gnanasigamani
VIT University, Vellore – 632014, Tamil Nadu, India
Lijo Vellaplakkal Paulose
VIT University, Vellore – 632014, Tamil Nadu, India


DOI: https://doi.org/10.17485/ijst%2F2017%2Fv10i25%2F156511