Construction of High-Quality Dataset in Aero-engine Domain Based on Large Language Model
Affiliation:

1. College of Artificial Intelligence, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China; 2. MIIT Key Laboratory of Pattern Analysis and Machine Intelligence (Nanjing University of Aeronautics and Astronautics), Nanjing 211106, China; 3. COMAC Shanghai Aircraft Design & Research Institute, Shanghai 201210, China

CLC Number:

TP391

    Abstract:

    With the rapid advancement of artificial intelligence technology, large language models (LLMs) are increasingly being applied across various domains. However, the lack of high-quality, manually curated question-answering datasets in the aero-engine field has hindered the practical application of expert-level question-answering models. To address this issue, this paper proposes an automated, LLM-based method for constructing question-answering datasets, which generates high-quality open-domain question-answering data without human intervention. In the data generation phase, the method employs in-context learning and input-priority generation strategies to enhance the stability of the generated data. In the data filtering phase, a dual evaluation mechanism combines faithfulness assessment based on similarity to the source text with semantic quality evaluation by large language models, automatically filtering out hallucinated or anomalous data to ensure factual reliability. Experimental results demonstrate that the proposed method significantly improves the quality of the generated dataset, and models fine-tuned on this dataset exhibit notable performance improvements in aero-engine domain knowledge question-answering tasks. The findings of this study not only provide a solid foundation for the application of large language models in the aero-engine domain but also offer valuable insights for automated dataset construction in other complex engineering fields.
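
The dual evaluation mechanism described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the thresholds, or the LLM prompt, so everything below is an assumption made for illustration: `faithfulness` uses simple token overlap as a stand-in for the source-text similarity score, `stub_judge` is a hypothetical placeholder for an LLM-based quality rating, and the threshold values are arbitrary.

```python
import re
from typing import Callable, List, Set, Tuple

QAPair = Tuple[str, str]  # (question, answer)


def token_set(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens; a deliberately simple tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def faithfulness(answer: str, source: str) -> float:
    """Fraction of answer tokens that also occur in the source passage.

    Assumption: token overlap stands in for whatever similarity
    measure the paper actually uses.
    """
    answer_tokens = token_set(answer)
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & token_set(source)) / len(answer_tokens)


def stub_judge(question: str, answer: str) -> float:
    """Hypothetical placeholder for an LLM quality score in [0, 1].

    A real pipeline would prompt a language model here; this stub only
    checks that the question is well formed and the answer is non-empty.
    """
    return 1.0 if question.strip().endswith("?") and answer.strip() else 0.0


def dual_filter(
    pairs: List[QAPair],
    source: str,
    judge: Callable[[str, str], float],
    min_faithfulness: float = 0.6,  # arbitrary illustrative threshold
    min_quality: float = 0.5,       # arbitrary illustrative threshold
) -> List[QAPair]:
    """Keep only pairs that pass both the faithfulness and quality checks."""
    kept = []
    for question, answer in pairs:
        if faithfulness(answer, source) < min_faithfulness:
            continue  # answer is not grounded in the source: likely hallucinated
        if judge(question, answer) < min_quality:
            continue  # judged semantically poor or anomalous
        kept.append((question, answer))
    return kept


source = ("The high-pressure compressor raises the pressure of air "
          "before it enters the combustor.")
pairs = [
    ("What does the high-pressure compressor do?",
     "It raises the pressure of air before the combustor."),
    ("What fuel is used?", "Liquid hydrogen stored in cryogenic tanks."),
    ("", "It raises the pressure of air."),
]
filtered = dual_filter(pairs, source, stub_judge)
```

In this toy run, the second pair is rejected by the faithfulness check because none of its answer tokens appear in the source, and the third is rejected by the stub judge because its question is empty. Running the cheap similarity check first means the more expensive judge is only called on answers that are already grounded in the source.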

Get Citation

ZOU Guanyun, WANG Cunjun, KONG Yinhao, MA Xiaoqing, LI Piji. Construction of High-Quality Dataset in Aero-engine Domain Based on Large Language Model[J].,2025,40(3):603-615.

History
  • Received: October 13, 2024
  • Revised: January 15, 2025
  • Online: June 13, 2025