Vivekanandan, K.
- Analysis of Different Similarity Functions with Fuzzy C-Means Clustering Approach Using Meeting Transcripts
Authors
1 Department of Computer Science and Engineering, Pondicherry Engineering College, Puducherry-605014, IN
Source
Data Mining and Knowledge Engineering, Vol 6, No 7 (2014), Pagination: 311-315
Abstract
Clustering is a technique for automatically grouping similar data into clusters. A wide variety of similarity measures and distance functions, such as Euclidean distance, Jaccard distance, Pearson correlation distance, cosine similarity and Kullback-Leibler divergence, have been implemented for clustering. The Fuzzy C-Means algorithm is implemented to assign each word point a membership in each cluster. In the same way, the membership for each cluster center is calculated on the basis of the distance between that cluster center and the word point. The proposed framework is used to evaluate the effectiveness of the five similarity measure functions with the Fuzzy C-Means clustering algorithm. Validity measures such as purity and entropy are implemented to estimate the optimal number of clusters. Finally, the results of the five similarity measure functions with the Fuzzy C-Means clustering algorithm are compared. The Euclidean similarity measure function provides better and more accurate results than the other distance functions.
Keywords
Clustering, Euclidean Distance, Fuzzy C Means Algorithm, Similarity Measure.
- Efficient Keyword Based Document Clustering Using Fuzzy C-Means Algorithm
Authors
1 Erode Arts and Science College, Erode-638009, Tamil Nadu, IN
Source
Data Mining and Knowledge Engineering, Vol 5, No 12 (2013), Pagination: 460-463
Abstract
Clustering is a useful technique in the field of textual data mining. Cluster analysis divides objects into meaningful groups based on the similarity between objects. Existing clustering approaches face issues such as limited practical applicability, low accuracy and long classification time. Recently, the inclusion of fuzzy logic in clustering has produced better clustering results. To further improve clustering performance, the Fuzzy C-Means Algorithm (FCMA) is used. Keywords are extracted from the documents using LSA-based document extraction. A fuzzy partition matrix is created for the clustering process, and keyword-based document clustering performs better than the existing K-Means clustering and EM algorithms. The proposed technique will be highly useful in the text mining process to increase the accuracy and performance of the text extraction process.
Keywords
Document Clustering, Fuzzy Cluster, Fuzzy C-Means, K-Means Clustering.
- An Ontology Mapping Maintenance Approach Using Change History Log in Ontology Based Data Integration
Authors
1 Department of Computer Science and Engineering, Pondicherry Engineering College, Puducherry, IN
2 Department of Computer Science and Engineering, Sri Manakula Vinayagar Engineering College, Puducherry, IN
Source
Data Mining and Knowledge Engineering, Vol 5, No 7 (2013), Pagination: 292-300
Abstract
Data integration systems aim to integrate data from multiple heterogeneous, distributed and autonomous systems and to provide a uniform access interface to end users. Today, ontologies are finding their way into a wide variety of applications, including data integration. When various ontologies are used to integrate data, mappings have to be produced. A mapping between two ontologies is used to achieve interoperability and to share information efficiently. The need to incorporate recent advances into a particular ontology leads to ontology evolution. Due to ontology evolution, the existing mappings between the ontologies become unreliable, invalid and outdated. In this paper we propose an ontology mapping maintenance approach using a Change History Log (CHL). Our approach computes matches between the changed entities and adapts the existing mappings accordingly. It also ensures that applying ontology changes results in a mapping document that conforms to the set of ontology mapping consistency constraints. The proposed approach can reduce the time needed to regenerate mappings each time changes occur in the ontology.
Keywords
Change History Log, Knowledge Sharing, Mapping Technique, Ontology Evolution, Ontology Mapping.
- Automatic Selection of Decision Tree Algorithm Based on Training Set Size
Authors
1 Bharathiar University, School of BSMED, Coimbatore, IN
2 SNS Rajalakshmi College of Science, Coimbatore, IN
3 Dr. G.R. Damodaran College of Science, Coimbatore, IN
Source
Data Mining and Knowledge Engineering, Vol 2, No 2 (2010), Pagination: 1-9
Abstract
In data mining applications, very large training data sets with several million records are common. Decision trees are a powerful and popular technique for both classification and prediction. Many decision tree construction algorithms have been proposed to handle large or small training data sets. Some algorithms are best suited for large data sets and some for small data sets. The decision tree algorithm C4.5 classifies categorical and continuous attributes very well, but it efficiently handles only smaller data sets. SLIQ (Supervised Learning In Quest) and SPRINT (Scalable Parallelizable Induction of Decision Trees) handle very large data sets. This paper deals with the automatic selection of a decision tree algorithm based on training set size. The proposed system first computes the training set size using a mathematical measure. The resulting training set size is then checked against the available memory. If memory is sufficient, the tree construction continues with one of the algorithms C4.5, SLIQ or SPRINT. After classifying the data set, the accuracy of the classifier is estimated. The major advantages of the proposed approach are that the system takes less time and avoids memory problems.
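The selection step described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's method: the abstract does not give the size measure or the decision rule, so the size estimate (records x attributes x bytes per value), the class-list estimate, and the names `TrainingSet`, `estimated_size_bytes` and `select_algorithm` are all hypothetical. The three-way rule relies on the general observation that C4.5 works on memory-resident data, that SLIQ keeps a per-record class list in memory, and that SPRINT avoids that requirement.

```python
from dataclasses import dataclass


@dataclass
class TrainingSet:
    """Hypothetical description of a training set (not from the paper)."""
    n_records: int
    n_attributes: int
    bytes_per_value: int = 8  # assumption: every value is stored in 8 bytes


def estimated_size_bytes(ts: TrainingSet) -> int:
    # Assumed size measure: a dense table of fixed-width values.
    return ts.n_records * ts.n_attributes * ts.bytes_per_value


def select_algorithm(ts: TrainingSet, available_memory_bytes: int) -> str:
    """Pick a decision tree algorithm from the estimated size (assumed rule)."""
    # Whole training set fits in memory: use the in-memory algorithm.
    if estimated_size_bytes(ts) <= available_memory_bytes:
        return "C4.5"
    # Assumption: SLIQ needs one in-memory class-list entry per record.
    class_list_bytes = ts.n_records * ts.bytes_per_value
    if class_list_bytes <= available_memory_bytes:
        return "SLIQ"
    # Otherwise fall back to the algorithm without that memory requirement.
    return "SPRINT"


if __name__ == "__main__":
    small = TrainingSet(n_records=1_000, n_attributes=10)
    large = TrainingSet(n_records=5_000_000, n_attributes=20)
    print(select_algorithm(small, 1_000_000))
    print(select_algorithm(large, 100_000_000))
    print(select_algorithm(large, 10_000_000))
```

A real system would measure the size from the actual attribute types and query the available memory at run time. The thresholds here only show where such a check would sit before tree construction begins.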