Knowledge Distillation of Large Language Models Based on Chain of Thought
Author: Li Ronghan, Pu Rongcheng, Shen Jianan, Li Dongdong, Miao Qiguang

Affiliation: 1. School of Computer Science and Technology, Xidian University, Xi'an 710000, China; 2. Key Laboratory of Counter-Terrorism Command & Information Engineering of Ministry of Education (Approval), Engineering University of PAP, Xi'an 710086, China

CLC Number: TP391

Abstract:

Chain-of-thought (CoT) prompting enables large language models to work through complex tasks in explicit reasoning steps, giving them stronger performance in commonsense reasoning and mathematical-logical reasoning as well as better interpretability. The main drawback of the CoT approach, however, is its reliance on massive language models, which typically have billions of parameters and are difficult to deploy at scale. To address this issue, this paper proposes a CoT-based knowledge distillation method for large models that fully leverages the thinking and reasoning capabilities of large language models and, through knowledge distillation, guides smaller models in solving complex tasks. The study uses a large model as the teacher and a small model as the student, fine-tuning the student on reasoning data obtained from the teacher. Through a set of carefully designed techniques, including a revised data-generation procedure, clustering-based sampling of question-answer exemplars, heuristic correction of exemplars, and adaptive answer generation, the teacher's generation process becomes more efficient and yields larger amounts of higher-quality reasoning data. This in turn allows the student model to be fine-tuned more effectively, acquire strong reasoning capabilities, and achieve efficient knowledge distillation. The proposed framework aims to establish an effective knowledge-transfer mechanism in which the deep reasoning of large models guides smaller models, providing more intelligent and efficient solutions for complex tasks. In this way, we hope to overcome the deployment challenges of large models and promote the application and advancement of language models in the real world.
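As a rough illustration of the pipeline described in the abstract, the sketch below shows how teacher-generated chain-of-thought data might be collected, with exemplar questions selected by clustering, before fine-tuning the student model. All names here (teacher_generate, embed, select_exemplars) and the data shapes are hypothetical placeholders under assumed interfaces; they do not reflect the authors' actual implementation.

```python
# Minimal sketch of the distillation pipeline outlined above: the teacher LLM
# produces chain-of-thought reasoning data, exemplars are chosen by clustering,
# and the resulting (question, rationale, answer) triples fine-tune the student.
# teacher_generate, embed, and the data shapes are hypothetical placeholders.
from dataclasses import dataclass

import numpy as np
from sklearn.cluster import KMeans


@dataclass
class CoTExample:
    question: str
    rationale: str  # teacher-generated chain of thought
    answer: str


def embed(texts: list[str]) -> np.ndarray:
    """Placeholder sentence encoder; any embedding model could be substituted."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 16))


def select_exemplars(questions: list[str], k: int) -> list[str]:
    """Clustering-based sampling: pick one representative question per cluster."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embed(questions))
    return [questions[int(np.flatnonzero(labels == c)[0])] for c in range(k)]


def teacher_generate(question: str, exemplars: list[str]) -> CoTExample:
    """Placeholder teacher call: exemplars would serve as few-shot CoT prompts."""
    _prompt = "\n\n".join(exemplars + [question])  # simplified prompt assembly
    return CoTExample(question, rationale="<reasoning steps>", answer="<answer>")


def build_distillation_set(questions: list[str], k: int = 4) -> list[CoTExample]:
    exemplars = select_exemplars(questions, k)
    return [teacher_generate(q, exemplars) for q in questions]


if __name__ == "__main__":
    data = build_distillation_set([f"question {i}" for i in range(12)])
    print(f"{len(data)} reasoning examples ready for student fine-tuning")
```

In a real pipeline, the teacher call would invoke the large language model with few-shot CoT prompts, and the collected (question, rationale, answer) triples would then be used to fine-tune the smaller student model.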

Citation:

Li Ronghan, Pu Rongcheng, Shen Jianan, Li Dongdong, Miao Qiguang. Knowledge Distillation of Large Language Models Based on Chain of Thought[J]. 2024, 39(3): 547-558.

History
  • Received: April 02, 2024
  • Revised: April 26, 2024
  • Online: May 25, 2024