Abstract:
This paper comprehensively analyzes the technical origins and evolution of ChatGPT by reviewing the development of deep learning, language models, semantic representation and pre-training techniques. In terms of language models, early statistical N-gram methods gradually gave way to neural network language models. Research advances in machine translation also led to the emergence of the Transformer, which in turn catalyzed the development of neural network language models. Regarding semantic representation and pre-training techniques, there has been an evolution from early statistical methods such as TF-IDF, pLSA and LDA, to neural network-based word vector representations like Word2Vec, and then to pre-trained language models such as ELMo, BERT and GPT-2. Pre-training frameworks have become increasingly sophisticated, providing models with rich semantic knowledge. The emergence of GPT-3 revealed the potential of large language models, but problems such as uncontrollable generation, hallucinated knowledge and weak logical reasoning remained. To alleviate these problems, ChatGPT, built on GPT-3.5, was further aligned with humans through instruction learning, supervised fine-tuning, and reinforcement learning from human feedback, continuously improving its capabilities. The emergence of large language models like ChatGPT signifies that this field has entered a new stage of development, opening up new possibilities for human-computer interaction and general artificial intelligence.