Ensemble Approaches for Class Imbalance Problem: A Review

Anjana Gosain; Arushi Gupta

Affiliations
1 Department of Information Technology, USICT, GGSIP University, Dwarka, Delhi, India

Abstract

In data mining, classifying data with a skewed class distribution is a challenging problem. Traditional Classification Techniques (TCT) classify data with a balanced class distribution efficiently, as their internal design favors balanced datasets. The Class Imbalance Problem (CIP) arises when the instances of one class outnumber the instances of the other classes. Factors that aggravate this imbalance include noisy data, borderline samples, the degree of class overlap, small disjuncts, etc. In machine learning, ensembles are built to improve on the performance and accuracy of a single classifier by training multiple classifiers and combining their outputs into a single class label. In this paper, our aim is to review ensemble learning methods for the two-class problem. We propose a categorization of ensemble learning methods at three levels: the data level, the algorithm level, and the base classifier.

Keywords

Bagging, Boosting, Classification, Class Imbalance Problem, Oversampling, Skewed Data Distribution, Undersampling.
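
As an illustration of the data-level ensembles named in the abstract and keywords, the following is a minimal sketch of bagging combined with random undersampling for a two-class problem. It is not the authors' method or any library's API: every function name is hypothetical, the nearest-centroid base classifier is a stand-in chosen only to keep the example free of dependencies, and the toy data are invented for the demonstration.

```python
import random
from collections import Counter


def centroid(rows):
    """Mean vector of a non-empty list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def fit_nearest_centroid(X, y):
    """Toy base classifier (an assumption): store one centroid per class."""
    by_label = {}
    for xi, yi in zip(X, y):
        by_label.setdefault(yi, []).append(xi)
    return {label: centroid(rows) for label, rows in by_label.items()}


def predict_nearest_centroid(model, x):
    """Return the label whose centroid is closest to x (squared Euclidean)."""
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(x, model[label]))
    return min(model, key=sq_dist)


def fit_under_bagging(X, y, n_estimators=11, seed=0):
    """Train each base classifier on all minority rows plus an equally
    sized random sample (without replacement) of the majority rows."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    models = []
    for _ in range(n_estimators):
        idx = min_idx + rng.sample(maj_idx, len(min_idx))
        models.append(
            fit_nearest_centroid([X[i] for i in idx], [y[i] for i in idx])
        )
    return models


def predict_under_bagging(models, x):
    """Combine the base classifiers by simple majority vote."""
    votes = Counter(predict_nearest_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Invented toy data: 20 majority rows (class 0) near the origin,
# 2 minority rows (class 1) near (5, 5).
X = [[(i % 5) * 0.1, (i // 5) * 0.1] for i in range(20)] + [[5.0, 5.0], [5.2, 4.9]]
y = [0] * 20 + [1] * 2

models = fit_under_bagging(X, y)
print(predict_under_bagging(models, [5.0, 5.0]))  # 1
print(predict_under_bagging(models, [0.1, 0.1]))  # 0
```

Each base classifier sees a balanced subset, so no single model is dominated by the majority class, while the vote across models reuses majority rows that any one subset discards. An odd number of estimators avoids tied votes in the two-class case.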

International Journal of Research in Signal Processing, Computing & Communication System Design
