The Viseme Based Continuous Speech Recognition System for a Talking Head
Abstract: To drive a talking-head animation from audio/visual input, this paper presents a continuous speech recognition system based on the viseme, the basic speech unit in the visual domain. Viseme hidden Markov models (HMMs) are trained to segment speech into viseme (mouth-shape) image sequences with timing boundaries. Trisemes are formalized to model the context dependency of visemes, but they require a very large amount of training data. Based on the 3D talking-head images, the viseme similarity weight (VSW) is defined, and 166 visual questions are designed for building triseme decision trees, which tie the states of trisemes with similar contexts so that they can share the same HMM parameters. For comparison, the output of a phoneme-based recognizer (the phoneme being the basic speech unit in the auditory domain) is also mapped to viseme sequences. For system evaluation, besides the recognition rate, an image-related measure, the viseme-similarity-weighted accuracy, accounts for the mismatches between the recognized viseme sequence and its reference, and jerky points in the lip-rounding and VSW curves help evaluate the smoothness of the resulting viseme image sequences. Results show that the viseme-based speech recognition system gives smoother and more plausible mouth-shape sequences.
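The viseme-similarity-weighted accuracy described above can be sketched as follows. This is a minimal illustration assuming a frame-aligned comparison between recognized and reference sequences; the similarity weights and viseme labels used here are hypothetical placeholders, not the paper's actual VSW table.

```python
# Hedged sketch of a viseme-similarity-weighted accuracy: instead of
# counting a mismatch as a hard error, each recognized viseme is scored
# by how visually similar its mouth shape is to the reference viseme.

def vsw_accuracy(recognized, reference, similarity):
    """Average similarity between recognized and reference visemes.

    similarity maps (recognized, reference) pairs to a weight in [0, 1]
    (1.0 = identical mouth shape); exact matches score 1.0 directly.
    """
    assert len(recognized) == len(reference), "sequences must be aligned"
    total = sum(1.0 if r == g else similarity.get((r, g), 0.0)
                for r, g in zip(recognized, reference))
    return total / len(reference)

# Toy symmetric similarity weights between viseme classes (hypothetical).
sim = {
    ("p", "b"): 0.9, ("b", "p"): 0.9,    # bilabial closures look alike
    ("p", "aa"): 0.1, ("aa", "p"): 0.1,  # closed vs. wide-open mouth
}

print(vsw_accuracy(["p", "aa"], ["b", "aa"], sim))  # 0.95
```

Confusing "p" for "b" costs only 0.1 here because the two mouth shapes are nearly indistinguishable, which is exactly the mismatch a plain recognition rate would over-penalize.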