
An Empirical Analysis on Effect of Data Expansion for Clustering Low Dimensional Data


Affiliations
1 Computer Science and Information Technology, Institute of Technical Education and Research, Siksha O Anusandhan University, Bhubaneswar – 751003, Odisha, India
2 Computer Science and Engineering, Institute of Technical Education and Research, Siksha O Anusandhan University, Bhubaneswar – 751003, Odisha, India
 

Researchers in the data mining domain often presume that the study of traditional clustering techniques is approaching saturation. However, a closer look at those techniques reveals many facets that could lead to further applications in diverse domains. In clustering, the attributes of the data provide the information needed to segregate the data. Some real-world data sets have few attributes yet carry substantial information, and may be of interest for some applications. Because they have so few attributes, such data may not be well separated by any of the clustering techniques. Data expansion techniques are methods for constructing a larger number of attributes from a smaller number of attributes. By applying these techniques during data preprocessing, an expanded data set can be constructed from a given data set. The current work shows that the expanded data at times yield better clustering results than the original data. This paper empirically evaluates and analyzes the effects of data expansion on clustering results, where the validity of the results is established through internal indexing techniques and probabilistic validation measures.

Keywords

Cluster Analysis, Cluster Validity, Data Expansion, Internal Indexing, Probabilistic Measures
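
The data expansion idea described in the abstract can be illustrated with a minimal sketch. The abstract does not name the expansion techniques or the internal indices used, so everything here is an assumption made for illustration: the expansion is a degree-2 polynomial expansion (appending all pairwise products of attributes), the internal index is the mean silhouette coefficient, and the function names `expand_quadratic` and `mean_silhouette`, the toy data, and the labels are hypothetical.

```python
from itertools import combinations_with_replacement
from math import dist  # Python 3.8+


def expand_quadratic(rows):
    """Append all degree-2 products x_i * x_j (i <= j) to each row.

    A row with d attributes becomes a row with d + d*(d+1)/2 attributes.
    Assumption: this polynomial expansion is an illustrative choice, not
    necessarily the expansion technique evaluated in the paper.
    """
    expanded = []
    for row in rows:
        products = [a * b for a, b in combinations_with_replacement(row, 2)]
        expanded.append(list(row) + products)
    return expanded


def mean_silhouette(rows, labels):
    """Mean silhouette coefficient, one common internal validity index.

    Requires at least two distinct labels; `labels` must be a list.
    """
    scores = []
    for i, (p, lab) in enumerate(zip(rows, labels)):
        # Distances to the other members of the point's own cluster.
        same = [dist(p, q) for j, (q, l) in enumerate(zip(rows, labels))
                if l == lab and j != i]
        if not same:  # singleton cluster: silhouette is taken as 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        # Smallest mean distance to any other cluster.
        b = min(
            sum(dist(p, q) for q, l in zip(rows, labels) if l == other)
            / labels.count(other)
            for other in set(labels) if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data with two attributes and a fixed labeling.
data = [[1.0, 2.0], [1.5, 1.8], [8.0, 8.0], [9.0, 9.5]]
labels = [0, 0, 1, 1]

wide = expand_quadratic(data)
print(len(data[0]), "->", len(wide[0]))  # 2 -> 5
print(mean_silhouette(data, labels), mean_silhouette(wide, labels))
```

Comparing the index on `data` and on `wide` for the same labeling mirrors the kind of comparison the abstract describes. Whether the expanded data score better depends on the data set and the expansion chosen, which is what the paper sets out to evaluate empirically.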

Authors

Smita Prava Mishra
Computer Science and Information Technology, Institute of Technical Education and Research, Siksha O Anusandhan University, Bhubaneswar – 751003, Odisha, India
Debahuti Mishra
Computer Science and Engineering, Institute of Technical Education and Research, Siksha O Anusandhan University, Bhubaneswar – 751003, Odisha, India
Srikanta Patnaik
Computer Science and Engineering, Institute of Technical Education and Research, Siksha O Anusandhan University, Bhubaneswar – 751003, Odisha, India




DOI: https://doi.org/10.17485/ijst%2F2016%2Fv9i3%2F130223