

Fast Decoding Algorithm for Automatic Speech Recognition Based on Recurrent Neural Networks

ZHANG Ge, ZHANG Pengyuan, PAN Jielin, YAN Yonghong

Citation: ZHANG Ge, ZHANG Pengyuan, PAN Jielin, YAN Yonghong. Fast Decoding Algorithm for Automatic Speech Recognition Based on Recurrent Neural Networks[J]. Journal of Electronics & Information Technology, 2017, 39(4): 930-937. doi: 10.11999/JEIT160543

doi: 10.11999/JEIT160543 cstr: 32379.14.JEIT160543

Funds: 

The National Natural Science Foundation of China (U1536117, 11590770-4), The National Key Research and Development Plan of China (2016YFB0801200, 2016YFB0801203), The Key Science and Technology Project of the Xinjiang Uygur Autonomous Region (2016A03007-1)

  • Abstract: Recurrent Neural Networks (RNNs) are now widely used for acoustic modeling in Automatic Speech Recognition (ASR). Although they offer clear advantages over traditional acoustic modeling approaches, their relatively high computational complexity limits their deployment, especially in real-time scenarios. Since the input features of an RNN typically span a long temporal context, the overlap between adjacent inputs can be exploited to reduce the time complexity of both acoustic posterior computation and token passing. This paper presents a new decoder structure that lowers the computational cost of decoding by regularly dropping frames whose information overlaps with that of their neighbors. Notably, the method applies directly to the original RNN model and requires only a small modification to the Hidden Markov Model (HMM) topology, which makes it highly flexible. The proposed method is validated with a Time-Delay Neural Network (TDNN) and achieves a 2 to 4 times speedup with relatively small accuracy loss.
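The frame-dropping idea sketched in the abstract can be illustrated with a minimal Viterbi-style token pass that evaluates the acoustic model only on every k-th frame and compensates the HMM transitions for the skipped frames. This is a hypothetical sketch, not the paper's implementation: the function name, the simple log-transition scaling used as the "small HMM modification", and the toy dimensions are all assumptions for illustration.

```python
import numpy as np

def decode_with_frame_skipping(posteriors, trans, skip=3):
    """Viterbi decoding over HMM states using only every `skip`-th frame.

    posteriors : (T, S) array of log-posteriors from the RNN acoustic model
    trans      : (S, S) array of log transition probabilities (from -> to)
    skip       : acoustic scores are computed once per `skip` frames

    Returns the best state sequence, one state per kept frame.
    """
    # Keep only every `skip`-th frame; with a long-context RNN the
    # dropped frames carry largely overlapping information.
    kept = posteriors[::skip]                     # (ceil(T/skip), S)

    # Crude compensation (an assumption, not the paper's exact change):
    # each kept frame now stands in for `skip` original frames, so we
    # scale the log transition scores as if taken `skip` times.
    trans_skip = trans * skip

    scores = kept[0].copy()
    backpointers = []
    for t in range(1, len(kept)):
        cand = scores[:, None] + trans_skip       # (S, S): from -> to
        backpointers.append(cand.argmax(axis=0))  # best predecessor per state
        scores = cand.max(axis=0) + kept[t]

    # Backtrace the best state sequence.
    state = int(scores.argmax())
    path = [state]
    for bp in reversed(backpointers):
        state = int(bp[state])
        path.append(state)
    return path[::-1]
```

Because the decoder only touches `len(kept)` frames instead of `T`, both the acoustic evaluation and the token-passing loop shrink by roughly the skip factor, which is where the reported 2 to 4 times speedup would come from.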
計(jì)量
  • 文章訪問(wèn)數(shù):  1923
  • HTML全文瀏覽量:  202
  • PDF下載量:  744
  • 被引次數(shù): 0
Publication history
  • Received: 2016-05-26
  • Revised: 2017-01-09
  • Published: 2017-04-19
