S. Bibi, G. Tsoumakas, I. Stamelos, I. Vlahavas, “Software Defect Prediction Using Regression via Classification”, Proc. 4th ACS/IEEE International Conference on Computer Systems and Applications, AICCSA '06, (accepted for presentation), pp. 330-336, 2006.
In this paper we apply a machine learning approach called Regression via Classification (RvC) to the problem of estimating the number of software defects. RvC first automatically discretizes the number of defects into a set of fault classes, then learns a model that predicts the fault class of a software system. Finally, RvC transforms the class output of the model back into a numeric prediction. This approach captures uncertainty in the models because, in addition to a specific number of faults, it also outputs an associated interval of values within which this estimate lies with a certain confidence. To evaluate the approach, we perform a comparative experimental study of the effectiveness of several machine learning algorithms on a software dataset. The data was collected by Pekka Forselious and involves applications maintained by a bank in Finland.
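
The three RvC steps described in the abstract (discretize, classify, map back to a number with an interval) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the equal-frequency discretization, the 1-nearest-neighbour classifier standing in for the learned model, the use of the class median as the numeric prediction, and all function names and toy data. The sketch reports the interval of the predicted class but does not estimate the confidence attached to it.

```python
from statistics import median


def equal_frequency_bins(values, n_bins):
    """Split the target values into n_bins intervals with roughly equal counts.

    Returns one (low, high, median) tuple per fault class.
    Assumes len(values) >= n_bins so that no bin is empty.
    """
    ordered = sorted(values)
    size = len(ordered) / n_bins
    bins = []
    for i in range(n_bins):
        chunk = ordered[round(i * size):round((i + 1) * size)]
        bins.append((chunk[0], chunk[-1], median(chunk)))
    return bins


def to_class(value, bins):
    """Map a numeric defect count to the index of its fault class."""
    for idx, (_, high, _) in enumerate(bins):
        if value <= high:
            return idx
    return len(bins) - 1


def nearest_neighbor_class(x, train_X, train_classes):
    """Hypothetical stand-in classifier: 1-nearest neighbour, squared Euclidean distance."""
    best = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(x, train_X[i])),
    )
    return train_classes[best]


def rvc_predict(x, train_X, train_y, n_bins=3):
    """Return (point estimate, (low, high)) for one feature vector x."""
    bins = equal_frequency_bins(train_y, n_bins)              # step 1: discretize
    classes = [to_class(y, bins) for y in train_y]
    predicted = nearest_neighbor_class(x, train_X, classes)   # step 2: classify
    low, high, point = bins[predicted]                        # step 3: back to numeric
    return point, (low, high)


# Toy data (invented): one feature per system, target = number of defects.
train_X = [[1], [2], [3], [10], [11], [12], [30], [31], [32]]
train_y = [0, 1, 2, 5, 6, 7, 20, 22, 25]

estimate, interval = rvc_predict([10.4], train_X, train_y)
print(estimate, interval)
```

In this sketch the interval is simply the range of defect counts observed in the predicted fault class; any classifier that outputs a class label could replace the nearest-neighbour stand-in.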