
Root Mapping based Neighbour Clustering in High-Dimensional Data


Affiliations
1 Department of Computer Science, Sri Ramakrishna College of Arts and Science for Women, India
2 School of Computing, Sri Ramakrishna College of Arts and Science, India
     



High-dimensional data arise naturally in many domains and have regularly presented a great challenge for conventional data mining techniques. This work takes a novel perspective on clustering high-dimensional data by focusing on the tendency of such data to contain prominent data points, that is, points that occur frequently among the nearest neighbours of other points. The proposed methodology, called root mapping based neighbour clustering, takes as input measures of similarity between pairs of data points. Real-valued messages are exchanged between data points until a high-quality set of prototypes and corresponding clusters gradually emerges. We validate this perspective by demonstrating that such prominence is a good measure of point centrality within a high-dimensional data cluster, and by proposing several clustering algorithms showing that the main data points can be used effectively as cluster prototypes or as guides during the search for centroid-based cluster configurations. Experimental results demonstrate good performance of the proposed algorithms in multiple settings, particularly in the presence of large quantities of overlapping noise. The proposed methods are tailored mostly to detecting approximately hyperspherical clusters and need to be extended to properly handle clusters of arbitrary shapes.
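The abstract describes a method that takes pairwise similarity measures as input and iteratively exchanges real-valued quantities between data points until prototypes and their clusters emerge. That general shape matches affinity propagation, so the sketch below is a minimal NumPy affinity propagation offered only to illustrate the message-passing idea. It is not the authors' root mapping method; the negative squared Euclidean similarity, the median preference, the damping factor and the function name `affinity_propagation` are all assumptions.

```python
import numpy as np


def affinity_propagation(S, damping=0.5, max_iter=200):
    """Minimal affinity propagation on an n x n similarity matrix S.

    S[i, k] is how well candidate k would represent point i; the diagonal
    S[k, k] is the 'preference' for k to become a prototype (exemplar).
    Returns, for every point, the index of the prototype it joins.
    """
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    rows = np.arange(n)
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = np.argmax(AS, axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        max_other = np.repeat(first[:, None], n, axis=1)
        max_other[rows, best] = second
        R = damping * R + (1 - damping) * (S - max_other)

        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())  # keep r(k,k) unclipped
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = A_new.diagonal().copy()
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, self_avail)
        A = damping * A + (1 - damping) * A_new

    # Points with positive self-evidence become prototypes; every other
    # point joins the prototype it is most similar to.
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:
        return np.full(n, -1)
    labels = exemplars[np.argmax(S[:, exemplars], axis=1)]
    labels[exemplars] = exemplars
    return labels


# Usage: two tight 1-D groups, negative squared distance as similarity,
# and the median off-diagonal similarity as the shared preference.
x = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
S = -(x[:, None] - x[None, :]) ** 2
preference = np.median(S[~np.eye(len(x), dtype=bool)])
np.fill_diagonal(S, preference)
labels = affinity_propagation(S)
print(labels)
```

The preference on the diagonal controls how many prototypes emerge: lowering it makes each extra prototype more expensive, so fewer clusters form.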

Keywords

Clustering, High-Dimensional, Nearest Neighbours, Data Points, Root Mapping.
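One way to read the abstract's claim that certain data points measure point centrality, given the keyword Nearest Neighbours, is to count how often each point appears in the k-nearest-neighbour lists of the other points and to use the highest-count points as cluster prototypes. The sketch below does this under stated assumptions: Euclidean distance, the neighbourhood size `k`, the greedy seeding rule (occurrence count multiplied by distance to the nearest existing seed) and the helper names `neighbour_occurrence` and `occurrence_prototype_clustering` are hypothetical, and the loop is a medoid-style alternation rather than the paper's own algorithm.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def neighbour_occurrence(D, k):
    """How often each point appears in the k-NN lists of the other points."""
    D = D.copy()
    np.fill_diagonal(D, np.inf)          # a point is not its own neighbour
    knn = np.argsort(D, axis=1)[:, :k]   # indices of each point's k nearest
    return np.bincount(knn.ravel(), minlength=D.shape[0])


def occurrence_prototype_clustering(X, n_clusters, k=5, n_iter=20):
    """Hypothetical scheme: high-occurrence data points act as prototypes."""
    D = pairwise_distances(X)
    occ = neighbour_occurrence(D, k)

    # Seeding (assumption): most frequent neighbour first, then points that
    # are both frequent neighbours and far from every seed chosen so far.
    protos = [int(np.argmax(occ))]
    while len(protos) < n_clusters:
        score = occ * D[:, protos].min(axis=1)
        protos.append(int(np.argmax(score)))
    protos = np.array(protos)

    for _ in range(n_iter):
        labels = np.argmin(D[:, protos], axis=1)  # join nearest prototype
        new = protos.copy()
        for c in range(n_clusters):
            members = np.flatnonzero(labels == c)
            if members.size:  # keep the old prototype if a cluster empties
                new[c] = members[np.argmax(occ[members])]
        if np.array_equal(new, protos):
            break
        protos = new
    labels = np.argmin(D[:, protos], axis=1)
    return labels, protos


# Usage: two 3 x 3 grids, 20 units apart.
grid = np.array([(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)], dtype=float)
X = np.vstack([grid, grid + np.array([20.0, 0.0])])
labels, protos = occurrence_prototype_clustering(X, n_clusters=2, k=4)
print(labels, protos)
```

In the usage example the centre of each grid appears in the 4-NN list of all eight other points in its grid, so the two centres (indices 4 and 13) become the prototypes. Using observed data points rather than means keeps every prototype an actual sample, which matches the role the abstract gives to the main data points.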



Authors

M. D. Dithy
Department of Computer Science, Sri Ramakrishna College of Arts and Science for Women, India
V. KrishnaPriya
School of Computing, Sri Ramakrishna College of Arts and Science, India

