Abstract: Software quality underpins the reliable operation of a software system, and software defects degrade that quality. Defects can be identified effectively by mining source code and other related data, so software defect mining has drawn significant attention in software quality assurance. To identify potential defects in software modules, a large number of modules labeled as defective or non-defective must be collected for model construction. However, module labels are usually obtained through extensive testing or manual code inspection, which consumes a great deal of manpower and time. In practice, only a small number of labels can be collected, which seriously constrains the performance of defect identification. To address this problem, semi-supervised learning has been introduced into software defect mining, improving mining performance by exploiting the large number of unlabeled modules. Here, the advances and research status of semi-supervised software defect mining are reviewed and discussed extensively. First, existing studies on software defect mining are briefly reviewed, and then the four major paradigms of semi-supervised learning are introduced. Finally, the methods and techniques for semi-supervised defect mining are systematically summarized and reviewed.