Comparative Study of Algorithms on Class Imbalanced Datasets

R. Buli Babu; Mohammed Ali Hussain; R. B. Babu

doi:10.17485/ijst/2016/v9i18/132953

R. Buli Babu ¹, Mohammed Ali Hussain ², R. B. Babu ²

Affiliations
1 Department of Computer Science, Bharathiar University, Coimbatore, India
2 Department of Electronics and Computer Engineering, India

Abstract

Objective: The main objective of this work is to identify financial defaulters in class-imbalanced datasets; we also estimate the extent of loan default using the power method. Method: The techniques used to find defaulters under class imbalance are K-Nearest Neighbor (KNN), Logistic Regression (LR), Gradient Boosting (GB) and neural network methods. Our analysis is carried out on class-imbalanced financial datasets to identify the worst defaulters using classification methods. Each dataset contains a majority class and a minority class. The datasets are passed to the various classification methods to predict defaulters, and we observe the resulting variance in loan default prediction. Findings: We took six real-world datasets from various banks and loan lenders. These datasets were randomly undersampled to find the minority class of loan defaulters; the extent of loan default can also be identified through power prediction, which can be advisable. Performance was measured using the area under the ROC curve (AUC), and statistical and post hoc tests were used to assess the significance of the AUC results. Applications: The study shows that gradient boosting performs well and copes with class imbalance in the comparative results. We also show that when large balanced class datasets are used, KNN, decision trees and quadratic discriminant analysis lead to poor performance. The results show that LR and Linear Discriminant Analysis (LDA) are the most appropriate choices for predicting good and bad customers.
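The two building blocks named in the abstract, random undersampling and AUC, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the `ratio` and `seed` parameters, and the toy data are assumptions made for illustration, and the abstract does not state which sampling ratio or AUC implementation was used.

```python
import random


def random_undersample(rows, labels, minority_label=1, ratio=1.0, seed=0):
    """Randomly drop majority-class rows.

    Keeps every minority row and at most ratio * (minority count)
    majority rows. `ratio` and `seed` are illustrative parameters,
    not values taken from the study.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    keep_n = min(len(majority), int(round(ratio * len(minority))))
    kept = sorted(minority + rng.sample(majority, keep_n))
    return [rows[i] for i in kept], [labels[i] for i in kept]


def auc(labels, scores, positive_label=1):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == positive_label]
    neg = [s for y, s in zip(labels, scores) if y != positive_label]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 8 non-defaulters (0) and 2 defaulters (1).
labels = [0] * 8 + [1] * 2
rows = [[float(i)] for i in range(10)]

X_bal, y_bal = random_undersample(rows, labels)
balanced_counts = (y_bal.count(0), y_bal.count(1))

# Scores from some hypothetical classifier on four held-out loans.
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In a comparison such as the one described, each classifier would be trained on the undersampled data and its scores on held-out loans passed to an AUC function of this kind; the pairwise definition used here is quadratic in the number of examples and is meant for clarity rather than speed.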