Author Details

Rosiline Jeetha, B.

A Comparative Study on Hierarchical Clustering in Data Mining

Abstract Views :408 | PDF Views:3

Authors

K. Kiruba ¹, B. Rosiline Jeetha ²

Affiliations
1 Department of Computer Science, RVS College of Arts & Science, Coimbatore, Tamil Nadu, IN
2 Department of Computer Applications (MCA), RVS College of Arts & Science, Coimbatore, Tamil Nadu, IN

Source

Data Mining and Knowledge Engineering, Vol 5, No 12 (2013), Pagination: 470-473

Abstract

Data mining is largely concerned with building models. A model is simply an algorithm or set of rules that connects a collection of data (the input) to a particular target or outcome. Data mining tasks include classification, estimation, prediction, clustering, affinity grouping, description, and profiling. The first three are examples of directed data mining, where the goal is to find the value of a particular target variable. Affinity grouping and clustering are undirected tasks, where the goal is to uncover structure in the data without respect to a particular target variable.
Profiling is a descriptive task that may be either directed or undirected. In this paper we review the main methods and approaches of clustering. Clustering is the task of segmenting a heterogeneous population into a number of more homogeneous subgroups, or clusters. The survey covers data mining, data mining issues, clusters, clustering, cluster analysis, clustering algorithms, clustering issues, a comparison of clustering algorithms, and the requirements of clustering in data mining.
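The agglomerative (bottom-up) form of hierarchical clustering named in the keywords can be sketched as follows. This is a minimal illustration, not the paper's own procedure: the single-linkage criterion, the Euclidean distance, the function names, and the sample points are all assumptions made for the example.

```python
from itertools import combinations
import math


def single_linkage(a, b):
    """Cluster distance = distance between the closest pair of points (assumed criterion)."""
    return min(math.dist(p, q) for p in a for q in b)


def agglomerative(points, k):
    """Repeatedly merge the two closest clusters until only k clusters remain."""
    clusters = [[p] for p in points]  # start with every point in its own cluster
    while len(clusters) > k:
        # Find the pair of cluster indices (i < j) with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected by the deletion
    return clusters


# Illustrative data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
result = agglomerative(points, 3)
```

A divisive method would run in the opposite direction, starting from one cluster that holds every point and splitting it step by step.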

Keywords

Data Mining, Clustering, Hierarchical Clustering Algorithm, Agglomerative, Divisive.

An Optimized Approach to Record Deduplication

Abstract Views :181 | PDF Views:2

Authors

V. Nirmala ¹, B. Rosiline Jeetha ¹

Affiliations
1 Department of Computer Science, R.V.S College of Arts and Science, Sulur, Coimbatore, IN

Source

Data Mining and Knowledge Engineering, Vol 5, No 3 (2013), Pagination: 85-90

Abstract

Record deduplication is a specialized technique for eliminating duplicate copies of repeated records. Duplicate record detection is important for data preprocessing and cleaning. The increasing volume of information available in digital media poses a challenging problem for data administrators, and the increased volume also introduces redundant data into databases. A system or method to control redundancy and duplication has therefore become essential. Databases are growing in size at an exponential rate and play an important role in every industry. Detecting duplicate records in the IT industry is necessary to obtain precise search results and to shrink storage requirements. This paper presents the problem of duplicate records and their detection. The proposed approach uses the BAT algorithm to generate the optimal similarity measure for deciding whether records are duplicates. The optimal similarity measure is generated by running the BAT algorithm on the training datasets. The system is initialized with a population of random solutions and searches for optima by updating bat generations. We used synthetic datasets to analyze the proposed algorithm, and its performance is compared against the genetic programming technique using evaluation metrics. Our approach frees the user from the burden of having to choose and tune this parameter.
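To make the idea of a similarity measure for duplicate detection concrete, here is a minimal sketch. It does not implement the BAT algorithm: the field weights and the threshold are fixed by hand, whereas the abstract describes learning them from training data. The function names, field names, weights, and threshold are hypothetical; only `difflib.SequenceMatcher` comes from the Python standard library.

```python
from difflib import SequenceMatcher


def field_similarity(a, b):
    """String similarity in [0, 1] from the standard library's SequenceMatcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_similarity(r1, r2, weights):
    """Weighted average of per-field similarities (weights are hand-picked placeholders)."""
    total = sum(weights.values())
    return sum(w * field_similarity(r1[f], r2[f]) for f, w in weights.items()) / total


def is_duplicate(r1, r2, weights, threshold):
    """Flag a pair as duplicate when its similarity reaches the threshold."""
    return record_similarity(r1, r2, weights) >= threshold


# Hypothetical records and parameters, for illustration only.
weights = {"name": 0.7, "city": 0.3}
a = {"name": "John Smith", "city": "Coimbatore"}
b = {"name": "Jon Smith", "city": "Coimbatore"}
c = {"name": "Mary Jones", "city": "Chennai"}

print(is_duplicate(a, b, weights, threshold=0.85))
print(is_duplicate(a, c, weights, threshold=0.85))
```

An optimizer such as the one described in the abstract would search over the weights and the threshold instead of leaving them to the user.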

Keywords

BAT Algorithm, Data Preprocessing, Duplicate Detection, Data Duplication, Genetic Programming.

An Emerging Classification Method for Huge Dataset in Clustering

Abstract Views :239 | PDF Views:2

Authors

B. Rosiline Jeetha ¹, M. Punithavalli ²

Affiliations
1 School of Computer Studies (PG), RVS College of Arts and Science, Coimbatore, IN
2 Department of Computer Science, SNS Raja Lakshmi College of Arts and Science, Coimbatore, IN

Source

Data Mining and Knowledge Engineering, Vol 3, No 10 (2011), Pagination: 599-601

Abstract

Clustering analysis is used to explore classification for large datasets, and the Canberra distance is generalized so that it can process data with categorical attributes. Based on the generalized Canberra distance definition, an instance of constraint-based clustering is introduced, and nearest neighbor classification is improved. Class-labeled clusters are regarded as classifying models used for classifying data. The proposed classification method can discover data that differ greatly from the instances in the training data, which may indicate a new data type. In summary, the method generalizes the Canberra distance from continuous numerical attributes to mixed attributes, uses cluster analysis to squash the existing instances, and improves the classical nearest neighbor classification method.
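One way such a generalization could look is sketched below. The abstract does not give the paper's definition, so the treatment of categorical attributes here (0 for a match, 1 for a mismatch) is an assumption, as are the function names, the prototype format, and the `max_distance` cutoff used to flag instances that are far from every class-labeled cluster.

```python
def generalized_canberra(x, y):
    """Canberra distance extended to mixed attributes (assumed definition).

    Numeric attributes contribute |a - b| / (|a| + |b|), or 0 when both are 0.
    Categorical attributes contribute 0 on a match and 1 on a mismatch.
    """
    total = 0.0
    for a, b in zip(x, y):
        if isinstance(a, (int, float)) and isinstance(b, (int, float)):
            denom = abs(a) + abs(b)
            total += abs(a - b) / denom if denom else 0.0
        else:
            total += 0.0 if a == b else 1.0
    return total


def classify(instance, prototypes, max_distance):
    """Nearest-prototype classification over class-labeled cluster representatives.

    Returns the label of the closest prototype, or None when even the closest
    one is farther than max_distance (a possible new data type).
    """
    label, rep = min(prototypes, key=lambda p: generalized_canberra(instance, p[1]))
    if generalized_canberra(instance, rep) > max_distance:
        return None
    return label


# Hypothetical cluster representatives: (label, (numeric, numeric, categorical)).
prototypes = [("A", (1.0, 2.0, "red")), ("B", (10.0, 20.0, "blue"))]
print(classify((1.0, 2.0, "red"), prototypes, max_distance=1.0))
print(classify((100.0, 0.5, "green"), prototypes, max_distance=1.0))
```

Comparing a new instance against a few cluster representatives rather than every stored training instance is what keeps the nearest neighbor step cheap on large datasets.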

Keywords

ID3, C4.5, Canberra Distance, Clustering, Improved Nearest Neighbour.

A Survey on Classification Methods Based on Decision Tree Algorithms in Data Mining

Abstract Views :178 | PDF Views:1

Authors

B. Rosiline Jeetha ¹, M. Punithavalli ²

Affiliations
1 Bharathiar University, Coimbatore, IN
2 Department of Computer Science, SNS Raja Lakshmi College of Arts and Science, Coimbatore, IN

Source

Data Mining and Knowledge Engineering, Vol 3, No 4 (2011), Pagination: 207-210

Abstract

Data mining resides at the junction of traditional statistics and computer science. As distinct from statistics, data mining is more about searching for hypotheses in data that happens to be available than about verifying research hypotheses by collecting data from designed experiments. Data mining is also characterized as being oriented toward problems with a large number of variables and/or samples, which makes scaling up algorithms important. This means developing algorithms with low computational complexity, using parallel computing, partitioning the data into subsets, or finding effective ways to use relational databases. The process- and utility-centered thinking in data mining and knowledge discovery is also manifested in the reported commercial systems. Decision trees are considered to be one of the most popular approaches for representing classifiers. Researchers from various disciplines such as statistics, machine learning, pattern recognition, and data mining have considered the issue of growing a decision tree from available data. The technology for building knowledge-based systems with decision tree algorithms has been demonstrated successfully in several practical applications. This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, and it describes three such systems: ID3, C4.5 and CART. Results from recent studies show ways in which the methodology can be modified to deal with information that is noisy and/or incomplete.
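ID3 grows a tree by choosing, at each node, the attribute with the highest information gain. The sketch below shows that single computation. The function names and the toy dataset are illustrative assumptions, and the code omits the recursive tree construction as well as the handling of noisy or incomplete data.

```python
from collections import Counter
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the partitions produced by the split.
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


# Toy data: "outlook" separates the classes perfectly, "windy" not at all.
rows = [
    {"outlook": "sunny", "windy": True},
    {"outlook": "sunny", "windy": False},
    {"outlook": "rainy", "windy": True},
    {"outlook": "rainy", "windy": False},
]
labels = ["no", "no", "yes", "yes"]

best = max(["outlook", "windy"], key=lambda a: information_gain(rows, labels, a))
```

C4.5 replaces this criterion with the gain ratio, and CART uses the Gini index, but the split-selection loop has the same shape.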

Keywords

Decision Tree, ID3, C4.5 and CART.

Data Mining Techniques on Social Media Drug Related Posts-A Comparative Study and Analysis

Abstract Views :222 | PDF Views:1

Authors

D. Krithika Renuka ¹, B. Rosiline Jeetha ²

Affiliations
1 Department of Computer Science (PG), PSGR Krishnammal College for Women, Coimbatore, IN
2 Department of Computer Science, Dr. N.G.P College of Arts and Science, Coimbatore, IN

Source

Data Mining and Knowledge Engineering, Vol 9, No 1 (2017), Pagination: 5-10

Abstract

Intelligently extracting knowledge from social media has recently attracted great interest from the Biomedical and Health Informatics community as a way to simultaneously improve healthcare outcomes and reduce costs using consumer-generated opinion. Social media offers opportunities for patients and doctors to share their opinions and experiences freely in online communities, which may contribute information beyond the knowledge of domain professionals. However, it is hard for traditional public health surveillance systems to detect and monitor health-related concerns and changes in public attitudes to health-related issues. To address this problem, multiple studies have illustrated the use of information in social media to discover biomedical and health-related knowledge. Several disease-specific information exchanges now exist on Facebook and other online social networking sites. These new sources of knowledge, support, and engagement have become important for patients living with disease, yet the quality and content of the information provided in these digital areas are poorly understood. The existing research methodologies are discussed with their merits and demerits, so that further research can be better focused. Experimental tests were conducted on all of the research works in the MATLAB simulation environment, and the approaches are compared against each other to find the better one under performance measures such as accuracy, precision, and recall.
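The three performance measures named above can be computed from a confusion matrix, as in the following sketch. It uses Python rather than MATLAB, and the function name, the binary setting, and the sample labels are assumptions for illustration, not results from the paper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    tn = len(pairs) - tp - fp - fn                               # true negatives
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up labels: 1 = post expresses the target class, 0 = it does not.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
```

Precision and recall are reported alongside accuracy because accuracy alone can look high when one class dominates the posts.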

Keywords

Social Media, Health Related Issues, Sentiment Classifications and SOM.
