
High Dimensional Data Mining Using Clustering


Affiliations
1 Anna University, Coimbatore, India
2 Bannari Amman Institute of Technology, Sathyamangalam, India

Abstract

Clustering is one of the major tasks in data mining. Clustering algorithms are based on a criterion that maximizes inter-cluster distance and minimizes intra-cluster distance. In high-dimensional feature spaces, the performance and efficiency of these algorithms deteriorate considerably. As the number of dimensions grows, the distances between data points become almost the same, which makes it difficult for clustering algorithms to group similar points; this is usually called the “curse of dimensionality”. One common remedy is feature selection: finding a subset of dimensions, by removing irrelevant and redundant ones, on which clustering is then performed. Dimensionality reduction techniques such as Principal Component Analysis (PCA) are also used for feature reduction. However, if different subsets of the points cluster well on different subspaces of the feature space, a global dimensionality reduction will fail. To overcome these problems, recent research has proposed computing subspace clusters. Existing subspace clustering algorithms have two common limitations. First, they usually have problems with subspace clusters of different dimensionality. Second, they often fail to discover clusters of different shapes. The goal of this project is to develop new, efficient, and effective methods for high-dimensional clustering.
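The claim that distances between points "become almost the same" in high dimensions can be checked empirically. The sketch below is an illustration only, not the authors' method: it assumes points drawn uniformly at random from the unit hypercube, and the function name `relative_contrast` is a hypothetical choice. It measures the gap between the farthest and nearest neighbor of a random query point, relative to the nearest distance; as this ratio shrinks, distance-based grouping loses its discriminating power.

```python
import math
import random


def relative_contrast(n_points: int, dim: int, seed: int = 0) -> float:
    """Return (d_max - d_min) / d_min, where d_max and d_min are the largest
    and smallest Euclidean distances from one random query point to n_points
    points drawn uniformly from the unit hypercube [0, 1]^dim.

    Assumption: uniform random data, chosen only to illustrate distance
    concentration; real data sets behave differently in detail.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # The contrast should drop sharply as the dimensionality grows.
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:5d}  relative contrast={relative_contrast(500, dim):.3f}")
```

Running the script should show the relative contrast falling by orders of magnitude between 2 and 1000 dimensions, which is the effect that motivates feature selection and subspace clustering.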

Keywords

Data Mining, High Dimensional Clustering, Distance Measure.

Authors

A. Bharathi
Anna University, Coimbatore, India
A. M. Natarajan
Bannari Amman Institute of Technology, Sathyamangalam, India