Evaluating the Performance of Machine Learning Classifier Algorithms for Software Estimation in Software Development Projects

Authors

M. A. Mannan, R. Qamar, I. U. Khan, A. Hussain, S. Ahmed, and J. Khan

DOI:

https://doi.org/10.21015/vtse.v12i1.1770

Abstract

The main aim of this research is to rank the best-performing features for classifying a software estimation dataset with SVM, Naïve Bayes, random forest, decision tree, and KNN classifiers, and to evaluate the accuracy of these classifiers. The classification process involves two steps: first, the dataset is analyzed with all of its attributes; second, the attributes are ranked by information gain, and only the highest-ranked ones are used to build the classification model. Using cross-validation with several folds, we assess and rank the accuracy of the SVM, Naïve Bayes, decision tree, random forest, and KNN classifiers.
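
The abstract does not describe the tooling used. The following is a minimal pure-Python sketch of the two-step process under stated assumptions. It ranks discrete attributes by information gain, IG(Y, A) = H(Y) − Σ_v (|S_v| / |S|) · H(S_v), keeps only the attributes with positive gain, and then estimates accuracy with k-fold cross-validation. A simple k-nearest-neighbour classifier (Hamming distance on categorical attributes) stands in for the five classifiers compared in the paper. The function names, the toy dataset, and its attributes (`size`, `team`) are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, attr):
    """Reduction in class entropy obtained by splitting on one discrete attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(rows, labels):
    """Return (attribute, gain) pairs sorted by information gain, highest first."""
    scores = {a: information_gain(rows, labels, a) for a in rows[0].keys()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def k_fold_indices(n, folds):
    """Split range(n) into contiguous folds (no shuffling, for determinism)."""
    sizes = [n // folds + (1 if i < n % folds else 0) for i in range(folds)]
    result, start = [], 0
    for size in sizes:
        result.append(list(range(start, start + size)))
        start += size
    return result


def knn_predict(train_rows, train_labels, row, attrs, k=3):
    """Majority vote among the k nearest training rows (Hamming distance on attrs)."""
    dists = [
        (sum(r[a] != row[a] for a in attrs), lbl)
        for r, lbl in zip(train_rows, train_labels)
    ]
    dists.sort(key=lambda d: d[0])
    votes = Counter(lbl for _, lbl in dists[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(rows, labels, attrs, folds=5, k=3):
    """Mean k-NN accuracy over k-fold cross-validation, using only `attrs`."""
    accuracies = []
    for test_idx in k_fold_indices(len(rows), folds):
        held_out = set(test_idx)
        train = [i for i in range(len(rows)) if i not in held_out]
        tr_rows = [rows[i] for i in train]
        tr_labels = [labels[i] for i in train]
        correct = sum(
            knn_predict(tr_rows, tr_labels, rows[i], attrs, k) == labels[i]
            for i in test_idx
        )
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


# Hypothetical toy data: two categorical attributes and a binary effort class.
base = [
    ({"size": "large", "team": "a"}, "high"),
    ({"size": "large", "team": "b"}, "high"),
    ({"size": "small", "team": "a"}, "low"),
    ({"size": "small", "team": "b"}, "low"),
]
data = base * 2
rows = [r for r, _ in data]
labels = [y for _, y in data]

# Step 2: rank attributes and keep only those that carry information.
ranking = rank_attributes(rows, labels)
print(ranking)  # [('size', 1.0), ('team', 0.0)]
selected = [a for a, gain in ranking if gain > 0]

print(cross_val_accuracy(rows, labels, selected, folds=4, k=3))  # 1.0
```

On this toy data, `size` separates the classes perfectly and receives 1 bit of information gain, while `team` receives none and is dropped. The folds are contiguous and unshuffled only to keep the example deterministic; a real evaluation would normally shuffle or stratify them, and would substitute the actual classifiers being compared for the k-NN stand-in.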

Published

2024-03-31

How to Cite

Mannan, M. A., Qamar, R., Khan, I. U., Hussain, A., Ahmed, S., & Khan, J. (2024). Evaluating the Performance of Machine Learning Classifier Algorithms for Software Estimation in Software Development Projects. VFAST Transactions on Software Engineering, 12(1), 70–78. https://doi.org/10.21015/vtse.v12i1.1770