Speaker Adaptation Method Based on Eigenphone Speaker Subspace for Speech Recognition
doi: 10.11999/JEIT141264 cstr: 32379.14.JEIT141264
Funding:
Supported by the National Natural Science Foundation of China (61175017, 61302107, 61403415)
Abstract: The eigenphone speaker adaptation method performs well when sufficient adaptation data is available, but it suffers from severe over-fitting when adaptation data is scarce. A speaker adaptation method based on an eigenphone speaker subspace is proposed to overcome this problem. First, the principle of eigenphone speaker adaptation is reviewed for speech recognition systems based on the Hidden Markov Model-Gaussian Mixture Model (HMM-GMM). Second, a speaker subspace is introduced to model the correlations among different speakers' eigenphone matrices. Third, a new eigenphone speaker subspace adaptation algorithm is derived by estimating a speaker-dependent coordinate vector for each speaker. Finally, the new method is compared in detail with the traditional speaker subspace based adaptation method. Experiments on a Mandarin Chinese continuous speech recognition task built on the Microsoft speech corpus show that, compared with the original eigenphone method, the proposed algorithm improves performance significantly when adaptation data is extremely limited, effectively alleviating over-fitting. Compared with the eigenvoice method, it achieves much lower space complexity at the cost of only minor performance degradation, making it more practical.

Keywords:
- Speech signal processing
- Speaker adaptation
- Eigenphone
- Eigenphone speaker subspace
- Low-rank constraint
- Eigenvoice
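The core idea sketched in the abstract — constraining each speaker's eigenphone matrix to a low-dimensional speaker subspace so that only a short coordinate vector must be estimated per speaker — can be illustrated with a toy NumPy example. This is a hedged sketch, not the authors' implementation: the matrix sizes, the basis `U`, and the direct least-squares fit are illustrative assumptions (in the actual algorithm the coordinate vector is estimated from maximum-likelihood statistics of the adaptation data).

```python
import numpy as np

# Toy illustration (not the paper's code): each speaker's eigenphone matrix
# is modeled as a point in a low-dimensional subspace, so adaptation only
# needs to estimate a short speaker-dependent coordinate vector w.

rng = np.random.default_rng(0)

n_phones, feat_dim = 10, 4   # toy sizes: number of eigenphones x feature dim
subspace_dim = 3             # assumed speaker-subspace dimension

# Speaker-independent subspace basis: columns of U span the vectorized
# eigenphone matrices observed across training speakers; v_mean is the mean.
U = rng.standard_normal((n_phones * feat_dim, subspace_dim))
v_mean = rng.standard_normal(n_phones * feat_dim)

def eigenphone_matrix(w):
    """Reconstruct a speaker's eigenphone matrix from its coordinate vector."""
    return (v_mean + U @ w).reshape(n_phones, feat_dim)

# Simulate a target speaker: a noisy observation of that speaker's
# eigenphone matrix stands in for the adaptation statistics.
w_true = np.array([1.0, -0.5, 0.25])
observed = eigenphone_matrix(w_true) + 0.01 * rng.standard_normal((n_phones, feat_dim))

# Estimating only the 3-dimensional w (instead of the full 10x4 matrix)
# is what suppresses over-fitting when adaptation data is scarce.
w_hat, *_ = np.linalg.lstsq(U, observed.ravel() - v_mean, rcond=None)
print(np.round(w_hat, 2))
```

The storage trade-off mentioned in the abstract also shows up here: the speaker-independent basis `U` has `n_phones * feat_dim` rows, which is far smaller than an eigenvoice basis spanning all Gaussian mean supervectors of the full model.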