Phân tích thực nghiệm về độ phức tạp của mã và tác động của nó đến mật độ lỗi phần mềm

Doãn Thị Thúy Hiền

An Empirical Analysis of Code Complexity and Its Impact on Software Defect Density

Authors:

Hien Doan Thi Thuy


Abstract

This study investigates the relationship between software complexity metrics and defect density using five NASA-PROMISE datasets (JM1, CM1, KC1, KC2, and PC1). Five widely validated complexity metrics were analyzed through both statistical and machine learning approaches: Cyclomatic Complexity, Lines of Code (LOC), Halstead Volume, Halstead Difficulty, and Essential Complexity. Pearson and Spearman correlation analyses reveal that while linear correlations are weak and negative, monotonic relationships are consistently positive and statistically significant (p < 0.01), with LOC demonstrating the strongest association. Four machine learning models (Logistic Regression, Support Vector Machine, Random Forest, and Gradient Boosting) were evaluated under two resampling techniques (SMOTE and Borderline-SMOTE), resulting in 40 experimental configurations (4 models × 2 resampling techniques × 5 datasets). Random Forest achieved the best overall performance, with the highest result obtained on KC1 using Borderline-SMOTE (AUC = 0.868, F1 = 0.500, MCC = 0.432). Feature importance analysis indicates that LOC, Halstead Volume, and Halstead Difficulty collectively contribute over 81% of the predictive power, whereas Essential Complexity shows minimal impact. Compared to prior work, this study improves performance on multiple datasets through class balancing techniques and provides a more comprehensive evaluation using AUC and MCC. The results confirm that a small set of well-established complexity metrics can effectively support software defect prediction when combined with appropriate preprocessing and learning strategies.
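To make two points in the abstract concrete, the following is a minimal sketch in plain Python, not the author's code. It shows (a) how a Pearson coefficient can be negative while the Spearman coefficient on the same data is positive, and (b) how F1 and MCC are derived from a confusion matrix, along with the count of model, resampling, and dataset combinations. The toy values, the confusion-matrix counts, and all function names are assumptions chosen for illustration; none of them come from the NASA-PROMISE data or from the paper's results.

```python
import math
from itertools import product


def pearson(xs, ys):
    """Pearson (linear) correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman (monotonic) correlation: Pearson applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))


def f1_and_mcc(tp, fp, fn, tn):
    """F1 score and Matthews correlation coefficient from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc


# Toy data (assumed): five points rise together, one extreme outlier does not.
metric = [1, 2, 3, 4, 5, 6]
outcome = [1, 2, 3, 4, 5, -100]
r_linear = pearson(metric, outcome)     # pulled negative by the outlier
r_monotonic = spearman(metric, outcome)  # ranks limit the outlier's influence

# Hypothetical confusion matrix for an imbalanced test set of 100 modules.
f1, mcc = f1_and_mcc(tp=5, fp=5, fn=5, tn=85)

# The experimental grid as described in the abstract.
models = ["Logistic Regression", "Support Vector Machine",
          "Random Forest", "Gradient Boosting"]
samplers = ["SMOTE", "Borderline-SMOTE"]
datasets = ["JM1", "CM1", "KC1", "KC2", "PC1"]
configs = list(product(models, samplers, datasets))

print(f"Pearson={r_linear:.3f}  Spearman={r_monotonic:.3f}")
print(f"F1={f1:.3f}  MCC={mcc:.3f}  configurations={len(configs)}")
```

Because Spearman works on ranks, a single extreme value moves it far less than it moves Pearson. That is one way the two coefficients can disagree in sign, though the sketch makes no claim about why they disagree in the study's data. MCC uses all four confusion-matrix cells, which is why it is often reported alongside F1 on imbalanced defect datasets.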


Keywords

Software defect prediction; Software complexity metrics; NASA-PROMISE; Random Forest; SMOTE; Feature importance.

Articles in the same issue

Morphological characteristics of the population of the donkey croaker Pennahia aneus (Bloch, 1793) in the estuary and Coastal areas of Thanh Hoa province

Thao Hoang Ngoc

Volume 55, Issue 2A, 06/2026

Proposing solutions to support the accounting of cutting tables used in garment manufacturing

Oanh Ta Vu Thuc

Volume 55, Issue 2A, 06/2026

Adaptive Reflection Control for Anti-Jamming Backscatter Communication under Dynamic Interference

Hiep Le Hoang, Tran Minh Duy

Volume 55, Issue 2A, 06/2026

Vinh University Journal of Science

Tạp chí khoa học Trường Đại học Vinh

ISSN: 1859-2228

Governing body: Vinh University

Address: 182 Le Duan - Vinh City - Nghe An province
Phone: (+84) 238.3855.452 - Fax: (+84) 238.3855.269
Email: vinhuni@vinhuni.edu.vn
Website: https://vinhuni.edu.vn

License: 163/GP-BTTTT issued by the Minister of Information and Communications on May 10, 2023

Open Access License: Creative Commons CC BY NC 4.0

CONTACT

Editor-in-Chief: Assoc. Prof., Dr. Tran Ba Tien
Email: tientb@vinhuni.edu.vn

Deputy editor-in-chief: Assoc. Prof., Dr. Phan Van Tien
Email: vantienkxd@vinhuni.edu.vn

Sub-Editor: Dr. Do Mai Trang
Email: domaitrang@vinhuni.edu.vn

Editorial assistants: MSc. Le Tuan Dung, MSc. Phan The Hoa, MSc. Pham Thi Quynh Nga, MSc. Tran Thi Thai

Address: 4th Floor, Executive Building, No. 182, Le Duan street, Vinh city, Nghe An province.
Phone: (+84) 238-385-6700 | Hotline: (+84) 97-385-6700
Email: editors@vujs.vn
Website: https://vujs.vn
