Sampling is a necessary step in mining large databases and is especially useful where the performance of mining operations is critical. This study examines the effect of sample size on the classification of software bugs. To analyze this effect, experiments are performed using a number of classification algorithms with a variety of sample sizes on the bug repositories of three large open source software projects, namely Android, Mozilla and MySQL. The study explores the relationship between sample size and two primary classification performance parameters, accuracy and F-measure. The experiments indicate that F-measure is affected more by the sample size than accuracy is.
Keywords
Sampling, Sample Size, Classification, Software Bug, Performance, Classifier Evaluation