Vietnamese Speech Recognition Based on Pre-training and Phone-Based Byte-Pair Encoding

doi:10.16337/j.1004-9037.2023.01.008

Affiliation: Department of Electronic Engineering & Information Science, University of Science and Technology of China, Hefei 230027, China
CLC Number: TN912.34

Abstract:

Built on unsupervised pre-training, wav2vec 2.0 has become a research focus because of its state-of-the-art performance on many low-resource languages. This paper performs Vietnamese continuous speech recognition on top of the pre-trained model. Phonetic information is integrated into acoustic modeling based on the connectionist temporal classification (CTC) loss function, with phones and position-dependent phones selected as the basic modeling units. To balance the number of modeling units against the granularity of the model, a byte-pair encoding (BPE) algorithm is used to generate phone-based subwords, which integrates contextual information into the acoustic modeling process. Experiments on the low-resource Vietnamese development set of NIST's BABEL task show that the proposed algorithm significantly improves on the wav2vec 2.0 baseline system, reducing the word error rate from 37.3% to 29.4%.
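
The idea of generating phone-based subwords with BPE can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`learn_phone_bpe`, `merge_pair`), the `+` joiner, the tie-breaking rule, and the toy phone sequences are all assumptions made for illustration, and the symbols are not a real Vietnamese phone set. The sketch repeatedly merges the most frequent adjacent pair of units, so that frequent phone contexts become single modeling units.

```python
from collections import Counter


def merge_pair(seq, pair):
    """Replace every adjacent occurrence of `pair` in `seq` with one merged unit."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + "+" + seq[i + 1])  # "+" joiner is an illustrative choice
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return tuple(out)


def learn_phone_bpe(pronunciations, num_merges):
    """Learn up to `num_merges` BPE merges over phone sequences.

    `pronunciations` is a list of phone lists, one per word (hypothetical input
    format). Returns the ordered merge list and the re-segmented sequences.
    """
    seqs = [tuple(p) for p in pronunciations]
    merges = []
    for _ in range(num_merges):
        # Count all adjacent unit pairs across the corpus.
        pair_counts = Counter()
        for seq in seqs:
            for a, b in zip(seq, seq[1:]):
                pair_counts[(a, b)] += 1
        if not pair_counts:
            break
        # Most frequent pair; ties broken by the pair itself so the result is deterministic.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        seqs = [merge_pair(seq, best) for seq in seqs]
    return merges, seqs


# Toy pronunciations (hypothetical symbols, for illustration only).
toy = [["t", "o", "i"], ["t", "o", "i"], ["t", "o", "n"], ["m", "o", "i"]]
merges, segmented = learn_phone_bpe(toy, num_merges=2)
print(merges)
print(segmented)
```

The number of merges controls the trade-off the abstract describes: fewer merges keep the unit inventory close to plain phones, while more merges yield longer, more context-specific units at the cost of a larger output vocabulary.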

SHEN Zhijie, GUO Wu. Vietnamese Speech Recognition Based on Pre-training and Phone-Based Byte-Pair Encoding[J]. Journal of Data Acquisition and Processing, 2023, 38(1): 101-110.

History

Received: July 27, 2021
Revised: December 27, 2021
Online: January 25, 2023
