Nedunchezhian, R.
- A Fast Boosting based Incremental Genetic Algorithm for Mining Classification Rules in Large Datasets
Authors
Affiliations
1 Department of CSE, Park College of Engineering and Technology, Coimbatore, IN
2 Department of CSE, Kalaignar Karunanidhi Institute of Technology, Coimbatore, IN
Source
Software Engineering, Vol 6, No 5 (2014), Pagination: 137-141
Abstract
A genetic algorithm (GA) is a search technique based on the process of natural evolution. It is widely used by the data mining community for classification rule discovery in complex domains. During learning, it makes several passes over the data set to determine the accuracy of the candidate rules, which makes it an extremely I/O-intensive and slow process. Applying a GA is particularly difficult when the training data set is too large and not fully available. This paper proposes an incremental genetic algorithm based on boosting, which constructs a weak ensemble of classifiers in a fast, incremental manner and thus tries to reduce the learning cost considerably.
Keywords
Classification, Incremental Learning, Genetic Algorithm (GA), Scalability, Boosting.
- An Alternative Extension of the K-Means Algorithm for Clustering Medical Data
Authors
Affiliations
1 Department of Computer Science and Engineering, Kalaignar Karunanidhi Institute of Technology, Coimbatore, IN
2 Department of Master of Computer Applications, PSG College of Arts and Science, Coimbatore, IN
Source
Data Mining and Knowledge Engineering, Vol 1, No 8 (2009), Pagination: 375-382
Abstract
Data clustering is a very powerful technique in many application areas. Not only may the clusters have meaning themselves, but clustering also enables efficient data management, because data grouped in the same cluster will usually be accessed together. Access to data within a cluster may predict that other data in that cluster will be accessed soon; this can lead to optimized storage strategies that perform much better than random placement of the data. Most of the earlier work on clustering has focused on numerical data, whose inherent geometric properties can be exploited to naturally define distance functions between data points. Recently, the problem of clustering categorical data has started drawing interest. However, the computational cost makes most of the previous algorithms unacceptable for clustering very large databases. The k-means algorithm is well known for its efficiency in this respect. At the same time, because it works only on numerical data, it cannot be used directly for clustering categorical data. The main contribution of this paper is to show how to apply the notion of “cluster centers” to a dataset of categorical objects and how to use this notion to formulate the clustering of categorical objects as a partitioning problem. Finally, a k-means-like algorithm for clustering categorical data is introduced. The clustering performance of the algorithm is demonstrated with well-known medical data sets.
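As an illustration of what a k-means-like procedure for categorical data can look like, the sketch below uses the classic k-modes idea: the "center" of a cluster is the most frequent value of each attribute, and distance is the number of mismatching attributes. This is a stand-in chosen for illustration, not necessarily the center definition used in the paper; the function names and the toy records are hypothetical.

```python
from collections import Counter


def mismatch(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent value of each attribute, used here as the cluster center."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(records, initial_centers, max_iter=20):
    """Alternate assignment and center update until assignments stop changing."""
    centers = list(initial_centers)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record goes to the nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: mismatch(r, centers[j]))
            for r in records
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: recompute each center as the per-attribute mode.
        for j in range(len(centers)):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = column_modes(members)
    return labels, centers


# Hypothetical toy records (symptom 1, symptom 2, smoker); not from the paper.
records = [
    ("fever", "cough", "yes"),
    ("fever", "cough", "no"),
    ("fever", "sneeze", "yes"),
    ("rash", "itch", "no"),
    ("rash", "itch", "yes"),
    ("rash", "swelling", "no"),
]
labels, centers = k_modes(records, [records[0], records[3]])
print(labels)
print(centers)
```

Like k-means, this alternates between an assignment step and a center-update step, so each pass is linear in the number of records, which is the efficiency property the abstract refers to.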
Keywords
Clustering, K-Mean Clustering, Proximity.
- A Improved Incremental and Interactive Frequent Pattern Mining Techniques for Market Basket Analysis and Fraud Detection in Distributed and Parallel Systems
Authors
Affiliations
1 Department of Information Technology, Toc H Institute of Science and Technology, Ernakulam - 682313, Kerala, IN
2 Department of Computer Science and Engineering, Sri Ranganathar Institute of Engineering and Technology, Coimbatore - 641110, Tamilnadu, IN
Source
Indian Journal of Science and Technology, Vol 8, No 18 (2015), Pagination:
Abstract
Objectives: To develop a memory-efficient, incremental and interactive distributed frequent pattern mining (FPM) technique with low communication and synchronization overhead and good load balancing, in order to analyze dynamic transactional data in a distributed database. Methods/Analysis: The technique adopts a prefix-based equivalence class partitioning scheme to generate frequent itemsets with low memory consumption and without generating local frequent sets. It uses a range of support values to update the frequent patterns with lower time complexity. The paper proposes distributed FPM techniques using both count-distributed and compressed-data-distributed parallel approaches. The performance of the algorithms is tested and compared with popular distributed FPM algorithms using standard datasets. Findings: To deal with massive dynamic data stored in distributed databases, this approach develops three distributed frequent set generation algorithms, which update frequent patterns by reusing previously stored pattern information, with no complex calculations or data structures. The proposed approaches also let the user interactively adjust the minimum support value at their convenience by keeping the nearly frequent itemsets with the help of two minimum support thresholds (low, high). Measures have been taken to reduce the additional itemset storage and computation and to achieve good load balancing with low communication and synchronization overhead. Because the proposed algorithms apply prefix-based equivalence class partitioning at each n-itemset level and perform four levels of itemset filtering to remove infrequent items from each class before calculating the individual item counts, little inter-node communication is required. To eliminate the drawbacks of both the count distribution and data distribution approaches, one of the proposed algorithms adopts a hybrid approach that distributes the compressed data only once, so its communication overhead is lower than that of other data distribution algorithms. Conclusion/Application: The proposed distributed techniques reduce memory utilization and itemset comparisons compared to the existing approaches. Their performance is tested and evaluated for market analysis and online credit card fraud detection applications.
Keywords
Credit Card Fraud Detection System, Incremental Distributed Frequent Pattern Mining, Interactive Parallel Mining Techniques, Market Basket Analysis, Prefix Based Equivalence Class Partitioning Approach
- A Novel Fuzzy Logic Model to Identify Closeness for Alias Detection
Authors
Affiliations
1 Department of Computer Applications, PSG College of Technology, Bharathiar University, Coimbatore – 641004, Tamil Nadu, India, IN
2 KIT-Kalaignarkarunanidhi Institute of Technology, Coimbatore – 641402, Tamil Nadu, IN
Source
Indian Journal of Science and Technology, Vol 8, No 28 (2015), Pagination:
Abstract
The detection of a person's alias names is important for improving information quality. A fuzzy-based decision system is proposed for alias detection: a rule-based system that uses fuzzy logic to decide how close the names in a given pair are. The fuzzy logic is formulated with a set of linguistic variables based on each feature's score value. An entity pair's association scores are calculated using string and link-based features such as Hamming Distance, Levenshtein Distance, Normalized String Edit Distance, Common Friends, Normalized Dot Product and Co-occurrence Relevance, with closeness as the output variable. These features are transformed into fuzzy input variables and designed with proper membership functions. The proposed fuzzy inference system gives the alias closeness decision as a crisp value ranging from 0 to 1. In this work, the model achieves up to 90% accuracy compared to the estimated accuracy.
Keywords
Alias Detection, Closeness Identification, Fuzzy Logic, String and Link Based Feature
- An Improved Alias Classification using Logistic Regression with Particle Swarm Optimization
Authors
Affiliations
1 Department of Computer Applications, PSG College of Technology, Bharathiar University, Coimbatore – 641004, Tamil Nadu, IN
2 KIT-Kalaignar Karunanidhi Institute of Technology, Coimbatore – 641402, Tamil Nadu, IN
Source
Indian Journal of Science and Technology, Vol 8, No 28 (2015), Pagination:
Abstract
Improving the detection of an entity's alias names is important in many settings, such as terrorist and criminal networks. In this paper, social network properties are used to construct a feature set for classification. Particle swarm optimization is used to optimize the regularization parameter of the logistic regression, improving the accuracy of entity alias classification by 4.98% compared to plain logistic regression. The experimental results demonstrate its performance and are compared against logistic regression on the Alias Detection Dataset from Auton Lab.
Keywords
Alias Classification, Logistic Regression, Particle Swarm Optimization, Regularization
- An Optimized Approach on Link Stability with Load Balancing in MANET using Balanced Reliable Shortest Route AOMDV (BRSR_AOMDV)
Authors
Affiliations
1 Bharathiar University, Coimbatore - 641046, Tamil Nadu, IN
2 Kalaignar Karunanidhi Institute of Technology, Coimbatore - 641402, Tamil Nadu, IN