Research Progress in Evaluation Techniques for Large Language Models
Affiliation:

1. Beijing Computer Technology and Applied Research Institute, Beijing 100854, China; 2. School of Computer Science and Engineering, Southeast University, Nanjing 211189, China

Abstract:

    With the widespread application of large language models, their evaluation has become crucial. Beyond performance on downstream tasks, potential risks must also be evaluated, such as the possibility that large language models may violate human values or be induced by malicious input to trigger security issues. This paper analyzes the commonalities and differences among traditional software, deep learning systems, and large model systems; summarizes existing work along the dimensions of functional evaluation, performance evaluation, alignment evaluation, and security evaluation of large language models; and introduces evaluation criteria for large models. Finally, based on existing research and the opportunities and challenges it reveals, the directions and development prospects of large language model evaluation technology are discussed.

Citation

ZHAO Ruizhuo, QU Zichang, CHEN Guoying, WANG Kunlong, XU Zhewei, KE Wenjun, WANG Peng. Research Progress in Evaluation Techniques for Large Language Models[J].,2024,39(3):502-523.

History
  • Received: March 29, 2024
  • Revised: May 10, 2024
  • Online: May 25, 2024