Abstract: [Objective] Minority languages such as Tibetan, Uyghur, and Mongolian face sparse vocabulary representation, missing cultural knowledge, and difficult speech-text alignment in large language model applications. This paper explores effective methods for training on small-sample data in low-resource environments, including the handling of overlapping relations and nested entities. [Methods] This paper proposes MCT-3 (Minority Culture-aware Training with Triple-space fusion), a training method for minority-language large models based on the fusion of three spaces: knowledge, speech, and text. The model architecture comprises a Knowledge Injector (K-Adapter), a Speech-Text Alignment Encoder (SJ-Encoder), and a Culture-Sensitive Decoder (CS-Decoder). The method injects knowledge priors to supplement ethnic cultural semantic information and organize it into systematic knowledge, uses dual-granularity alignment learning to map precisely between speech and text, and applies a reinforcement reward mechanism to keep generated content culturally appropriate. Together, these components enable high-quality minority-language understanding and generation with very limited annotated data. [Results] In experiments on the CSTR-MinorASR dataset using only 3 hours of labeled speech data, MCT-3 achieves an average word error rate (WER) of 16.0% across Tibetan, Uyghur, and Mongolian, an improvement of 18.5 and 8.1 percentage points, respectively, over the traditional speech recognition models used for comparison.
The recall rate for cultural keywords reaches 92.7%, more than 20 percentage points higher than the baseline model. [Limitations] The method has so far been validated on only three minority languages, and the assessment of cultural sensitivity relies mainly on manual annotation, so the range of application scenarios remains to be expanded. [Conclusions] The proposed method can effectively address key technical challenges in training large models for minority languages. The three-space fusion architecture and culture-sensitive mechanisms can mitigate poor training performance in small-sample scenarios, improve the accuracy of minority-language understanding and generation, and provide a feasible technical path for intelligent applications of low-resource languages.