Zero Resource Korean ASR Based on Acoustic Model Sharing
(基于声学模型共享的零资源韩语语音识别)

Authors: 王皓宇, JEON Eunah, 张卫强, 李科, 黄宇凯

Affiliations:

1. Department of Electronic Engineering, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China; 2. Beijing Haitian Ruisheng Science Technology Ltd., Beijing 100083, China

Fund Project:

Key Project of the NSFC-General Technology Fundamental Research Joint Fund (U1836219)

Abstract:

An accurate speech recognition system is usually trained on a large amount of transcribed speech, but existing large-scale open-source corpora cover only a handful of widely used languages, which sets a barrier to the recognition of many low-resource languages. Acoustic model sharing, which exploits the similarity between a resource-rich and a low-resource language, offers a way around this problem: it allows an automatic speech recognition (ASR) system to be built without any speech data in the low-resource language. This paper extends the method to Korean speech recognition. Specifically, we train an acoustic model on Mandarin data and define a set of phoneme mapping rules between Mandarin and Korean. Without using any Korean speech data, the resulting system achieves a character error rate (CER) of 27.33% on the Zeroth Korean test set. We also compare source-to-target and target-to-source phoneme mapping and show that the latter, which maps target-language words onto source-language phonemes, is the more appropriate choice for acoustic model sharing.
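As a rough illustration of the target-to-source mapping idea described above, the sketch below builds a decoding lexicon in which Korean words are written as sequences of Mandarin acoustic-model phonemes, so that a Mandarin acoustic model can be reused for Korean decoding. The phoneme symbols, the mapping table, and the function names are illustrative assumptions only and are not the mapping rules defined in the paper.

```python
# Minimal sketch of target-to-source phoneme mapping for acoustic model sharing.
# All phoneme symbols and mapping entries are illustrative assumptions, not the
# actual Korean-to-Mandarin rules used in the paper.

# Hypothetical mapping from (romanized) Korean phonemes to Mandarin acoustic-model units.
KOREAN_TO_MANDARIN = {
    "k": "g", "kh": "k", "t": "d", "th": "t", "p": "b", "ph": "p",
    "s": "s", "h": "h", "n": "n", "m": "m", "r": "l", "ng": "ng",
    "a": "a", "eo": "e", "o": "o", "u": "u", "i": "i",
}


def map_pronunciation(korean_phones):
    """Map one Korean pronunciation (a list of phonemes) onto Mandarin model units."""
    # Unknown Korean phonemes fall back to a placeholder so gaps in the table are visible.
    return [KOREAN_TO_MANDARIN.get(p, "<unk>") for p in korean_phones]


def build_shared_lexicon(korean_lexicon):
    """Build a decoding lexicon whose entries are Korean words but whose pronunciation
    units are Mandarin phonemes, allowing the Mandarin acoustic model to be shared."""
    return {word: map_pronunciation(phones) for word, phones in korean_lexicon.items()}


if __name__ == "__main__":
    # Toy Korean lexicon with romanized pronunciations (illustrative only).
    lexicon = {
        "한국": ["h", "a", "n", "k", "u", "k"],
        "사람": ["s", "a", "r", "a", "m"],
    }
    print(build_shared_lexicon(lexicon))
```

The target-to-source direction keeps the source-language (Mandarin) acoustic model untouched: only the target-language (Korean) lexicon is rewritten in source-language units, which is the configuration the abstract reports as working better.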

Cite this article:

王皓宇, JEON Eunah, 张卫强, 李科, 黄宇凯. Zero Resource Korean ASR Based on Acoustic Model Sharing[J]. 数据采集与处理 (Journal of Data Acquisition and Processing), 2023, 38(1): 93-100.

History
  • Received: 2021-10-19
  • Revised: 2021-11-04
  • Accepted:
  • Published online: 2023-05-25