A Speech Endpoint Detection Algorithm Based on the Teager Energy Operator and Empirical Mode Decomposition
doi: 10.11999/JEIT171014 cstr: 32379.14.JEIT171014
(School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai 201418, China)
Funding:
Shanghai Science and Technology Commission Fund (15ZR1440700)
Teager Energy Operator and Empirical Mode Decomposition Based Voice Activity Detection Method
SHEN Xizhong ZHENG Xiaoxiu
Funds:
Foundation of the Science and Technology Commission of Shanghai Municipality (15ZR1440700)
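Abstract: The Teager energy operator (TEO) is a nonlinear method proposed in recent years that can track time-varying signals. This paper combines the operator with empirical mode decomposition (EMD) to propose a new voice activity detection algorithm for locating reasonable speech start and end points. The algorithm applies EMD and introduces validity screening conditions for the intrinsic mode functions (IMFs); screening the IMFs lets the algorithm handle noisy speech, and the single-mode character of each decomposed component matches the TEO's requirement of tracking the energy of a single component. Finally, the Hilbert transform is used to resolve the mode mixing that may occur. With these steps, the algorithm can mark the endpoints of unvoiced segments in speech, and it performs better than applying the TEO directly or using the double-threshold method. Extensive experiments verify the effectiveness of the algorithm.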
Keywords:
- Voice activity detection
- Teager energy operator
- Empirical mode decomposition
- Intrinsic mode function
- Hilbert transform
Abstract: The Teager Energy Operator (TEO), a nonlinear method proposed in recent years, is well suited to tracking time-varying signals. This paper combines the operator with Empirical Mode Decomposition (EMD) to obtain a new voice activity detection method that finds suitable speech start and end points. EMD is exploited further: validity conditions are constructed to select the valid intrinsic mode functions, so the method can handle noisy speech. In addition, the single-mode character of each EMD component meets the TEO's requirement for a single frequency component. Finally, a Hilbert transform is added to address the mode mixing inherent in EMD. As a result, the proposed method can identify unvoiced sounds in noise and outperforms both the direct TEO and the double-threshold method. Experiments show the validity of the proposed method.
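The abstract does not spell out the operator, so the following is only a minimal sketch of the TEO step, assuming the widely used discrete form Ψ[x(n)] = x(n)² − x(n−1)·x(n+1). For a single tone A·sin(ωn) this expression equals A²·sin²(ω), which illustrates why the operator is normally applied to single-component signals such as individual intrinsic mode functions. The toy signal, the frame length, and the half-of-maximum threshold are assumptions made for illustration; they are not the paper's screening conditions, and the EMD and Hilbert stages are not shown.

```python
import math


def teager_energy(x):
    """Discrete Teager energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns len(x) - 2 values; the first and last samples are skipped
    because they lack a neighbour on one side.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def frame_mean(values, frame_len):
    """Mean over consecutive non-overlapping frames (illustrative framing)."""
    return [
        sum(values[i:i + frame_len]) / frame_len
        for i in range(0, len(values) - frame_len + 1, frame_len)
    ]


# Toy input (assumption): silence, a single-tone burst, then silence again.
amp, omega, frame_len = 0.5, 0.3, 20
signal = (
    [0.0] * 100
    + [amp * math.sin(omega * n) for n in range(200)]
    + [0.0] * 100
)

psi = teager_energy(signal)
energy = frame_mean(psi, frame_len)

# Illustrative decision rule, NOT the paper's: half of the peak frame energy.
threshold = 0.5 * max(energy)
active = [i for i, e in enumerate(energy) if e > threshold]

# psi[k] belongs to signal[k + 1], hence the +1 offset.
start_sample = active[0] * frame_len + 1
end_sample = (active[-1] + 1) * frame_len + 1
print(start_sample, end_sample)
```

On this toy input the estimated endpoints fall within one frame of the true burst boundaries (samples 100 and 300). Real speech is multi-component and noisy, which is the situation the paper's IMF screening and Hilbert step are meant to handle.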