Punithavalli, M.
- An Emerging Classification Method for Huge Dataset in Clustering
Authors
1 School of Computer Studies (PG), RVS College of Arts and Science, Coimbatore, IN
2 Department of Computer Science, SNS Raja Lakshmi College of Arts and Science, Coimbatore, IN
Source
Data Mining and Knowledge Engineering, Vol 3, No 10 (2011), Pagination: 599-601
Abstract
Clustering analysis is used to explore classification for large datasets, and the Canberra distance is generalized so that it can process data with categorical attributes. Based on the generalized Canberra distance, an instance of constraint-based clustering is introduced, and nearest neighbor classification is improved. Class-labeled clusters serve as classification models for labeling new data. The proposed classification method can detect data that differ greatly from the instances in the training data, which may indicate a new data type. In summary, the work generalizes the Canberra distance from data with continuous numerical attributes to data with mixed attributes, uses clustering analysis to condense the existing instances, and thereby improves the classical nearest neighbor classification method.
Keywords
ID3, C4.5, Canberra Distance, Clustering, Improved Nearest Neighbour.
- A Survey on Classification Methods Based on Decision Tree Algorithms in Data Mining
Authors
1 Bharathiar University, Coimbatore, IN
2 Department of Computer Science, SNS Raja Lakshmi College of Arts and Science, Coimbatore, IN
Source
Data Mining and Knowledge Engineering, Vol 3, No 4 (2011), Pagination: 207-210
Abstract
Data mining resides at the junction of traditional statistics and computer science. Unlike statistics, data mining is more about searching for hypotheses in data that happen to be available than about verifying research hypotheses by collecting data from designed experiments. Data mining is also characterized by problems with a large number of variables and/or samples, which makes scaling up algorithms important. This means developing algorithms with low computational complexity, using parallel computing, partitioning the data into subsets, or finding effective ways to use relational databases. The process- and utility-centered thinking of data mining and knowledge discovery is also manifested in reported commercial systems. Decision trees are considered one of the most popular approaches for representing classifiers. Researchers from various disciplines, such as statistics, machine learning, pattern recognition, and data mining, have considered the problem of growing a decision tree from available data. Building knowledge-based systems with decision tree algorithms has been demonstrated successfully in several practical applications. This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, and it describes three such systems: ID3, C4.5, and CART. Results from recent studies show ways in which the methodology can be modified to deal with information that is noisy and/or incomplete.
Keywords
Decision Tree, ID3, C4.5 and CART.
- An Enhanced Projected Clustering Algorithm for High Dimensional Space
Authors
1 Department of Computer Science, Sri Ramakrishna College of Arts and Science for Women, Coimbatore, IN
2 Department of Computer Science, Dr. SNS College of Arts and Science, Coimbatore, IN
3 Department of Computer Science and Engineering, Park College of Engineering & Technology, Coimbatore, IN
Source
Data Mining and Knowledge Engineering, Vol 3, No 2 (2011), Pagination: 104-109
Abstract
Clustering is a data mining technique for identifying groups in a data set based on some similarity measure. Clustering high-dimensional data has been a major challenge due to the inherent sparsity of the points. Most existing clustering algorithms become substantially inefficient if the similarity measure must be computed between data points in the full-dimensional space. A number of projected clustering algorithms have been proposed to overcome this issue. This paper develops a robust partitional, distance-based projected clustering algorithm built on the K-means algorithm, in which distance computation is restricted to subsets of attributes with dense object values. The algorithm can detect low-dimensional projected clusters embedded in a high-dimensional space and avoids computing distances in the full-dimensional space. The algorithm has been demonstrated on synthetic and real datasets.
Keywords
Clustering, High Dimensional Data, Projected Cluster, K-Means Clustering, Subspace Clustering.
- A Survey on Clustering Algorithms
Authors
1 Department of Computer Applications, Sri Ramakrishna Institute of Technology, Coimbatore, IN
2 Department of Computer Science, Sri Ramakrishna Arts College for Women, Coimbatore, IN
Source
Data Mining and Knowledge Engineering, Vol 2, No 2 (2010), Pagination: 28-32
Abstract
Clustering is a widely used technique for finding interesting, previously unknown patterns in a dataset. In general, clustering is a method of dividing data into groups of similar objects. One significant research area in data mining is developing methods that update knowledge by building on existing knowledge, since this can generally improve mining efficiency, especially for very large databases. Data mining uncovers hidden, previously unknown, and potentially useful information from large amounts of data. This paper presents a general survey of various clustering algorithms. It also describes the effectiveness of the Self-Organized Map (SOM) algorithm in enhancing mixed data clustering.
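The abstract does not specify which SOM variant is used for mixed data, so the following is only a minimal sketch of a standard Kohonen self-organizing map on purely numeric vectors. The function names `train_som` and `best_matching_unit`, the linear learning-rate decay, and the Gaussian grid neighbourhood are illustrative assumptions, not the surveyed method; categorical attributes would first need an encoding or a mixed-attribute distance, which this sketch does not attempt.

```python
import math
import random

# Hypothetical sketch: a plain Kohonen SOM for numeric vectors only,
# not the mixed-data SOM variant discussed in the survey.


def best_matching_unit(weights, x):
    """Return the (row, col) of the node whose weight vector is closest to x."""
    best, best_d2 = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d2 = sum((wi - xi) ** 2 for wi, xi in zip(w, x))  # squared Euclidean distance
            if d2 < best_d2:
                best, best_d2 = (r, c), d2
    return best


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols SOM on a list of numeric tuples; returns the weight grid."""
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2.0
    # Initialise every node with a copy of a randomly chosen training sample.
    weights = [[list(rng.choice(data)) for _ in range(cols)] for _ in range(rows)]
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Gaussian neighbourhood on the grid, centred on the winning node.
                    h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                    w = weights[r][c]
                    for i in range(len(w)):
                        w[i] += lr * h * (x[i] - w[i])  # move node toward the sample
            step += 1
    return weights


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    grid = train_som(data, rows=2, cols=2)
    for point in data:
        print(point, "->", best_matching_unit(grid, point))
```

After training, each input is assigned to the grid node returned by `best_matching_unit`, and nodes that attract many inputs act as cluster prototypes. Because the update is a convex step toward each sample, node weights stay inside the per-attribute range of the training data.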
Keywords
Data Clustering, Data Mining, Mixed Data Clustering, Self-Organized Map Algorithm.
- A Survey on Data Clustering Algorithms
Authors
1 Department of Computer Science, Erode Arts & Science College, Erode, Tamil Nadu, IN
2 Department of Computer Science, Sri Ramakrishna College of Arts and Science for Women, Coimbatore, IN
Source
Data Mining and Knowledge Engineering, Vol 1, No 8 (2009), Pagination: 421-425
Abstract
Clustering is an important technique with applications in a range of fields, including data mining, statistical data analysis, image compression, and vector quantization. Moreover, clustering has been formulated in different ways in the machine learning, pattern recognition, optimization, and statistics literature. The basic problem in clustering is grouping together data items that are similar to one another. A variety of algorithms have emerged that meet this requirement and have been successfully applied to real-life data clustering problems. This paper presents a general survey of various clustering algorithms proposed earlier in the literature. In addition, its future-enhancement section suggests modifications to previously proposed work to overcome its limitations.
Keywords
Clustering, Data Mining, Image Compression, Machine Learning, Optimization, Pattern Recognition, Statistical Data Analysis, Vector Quantization.
- Software Tool for Agent Based Distributed Data Mining
Authors
1 Computer Applications Department, Dr. SNS Rajalakshmi College of Arts and Science, Coimbatore, IN
2 Computer Science Department, Sri Ramakrishna College of Arts and Science for Women, Coimbatore, IN