Voiced/Unvoiced Classification and Pitch Estimation Based on Amplitude Compression Filter
doi: 10.11999/JEIT150778 cstr: 32379.14.JEIT150778
① (School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China) ② (School of Engineering, Huzhou Teachers College, Huzhou 313000, China)
Funds: The National Natural Science Foundation of China (61271248), The Natural Science Foundation of Huzhou City (2015YZ04)
Abstract: Conventional algorithms are prone to voiced/unvoiced misclassification and pitch estimation errors in real environments (different noise types and signal-to-noise ratios). To address this, a method of voiced/unvoiced classification and pitch estimation based on the Pitch Estimation Filter with Amplitude Compression (PEFAC) is proposed. The method first uses PEFAC to attenuate strong low-frequency noise components and to extract the pitch harmonic from noisy speech in the log-frequency domain. Then, the harmonic number associated with the pitch harmonic is determined in the time domain by the Symmetric average magnitude sum function weighted Impulse-train Matching (SIM) scheme. Finally, dynamic programming tracks the pitch by selecting among the pitch candidates, and a voiced-speech probability is computed from the likelihood ratio of Gaussian Mixture Model (GMM) classifiers based on a 3-element feature vector. Simulation results show that, in real environments, the proposed method effectively reduces voiced/unvoiced classification and pitch estimation errors and outperforms several conventional and state-of-the-art methods.

Keywords:
- Speech signal processing
- Pitch
- Pitch Estimation Filter with Amplitude Compression (PEFAC)
- Symmetric average magnitude sum function
- Gaussian Mixture Model (GMM)
- Noisy speech
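The log-frequency step of PEFAC rests on a simple fact: if the pitch is f0, its harmonics lie at log f0 + log k (k = 1, 2, ...), so on a log-frequency axis the harmonic pattern has the same shape for every pitch and a single fixed comb-shaped kernel can detect f0 by correlation. The following is a minimal sketch of that idea only, under stated assumptions: the power spectrum is already sampled on a log-spaced grid; `compress` is a hypothetical local-mean normalization standing in for the paper's amplitude compression; and the kernel is a simplified zero-mean comb (+1 at log k, -1 at log(k + 0.5)), not the filter actually used in PEFAC. All function names and parameter values are illustrative. The SIM, dynamic-programming, and GMM stages are not covered.

```python
import math


def log_grid(f_min, f_max, n):
    """Return n log-spaced frequencies (Hz) and the log-frequency step between bins."""
    step = math.log(f_max / f_min) / (n - 1)
    return [f_min * math.exp(i * step) for i in range(n)], step


def compress(spec, half_width=8):
    """Hypothetical amplitude compression: divide each bin by the mean of its
    neighbourhood, so broad (smooth) noise flattens to about 1 while narrow
    harmonic peaks stay well above 1."""
    out = []
    for i in range(len(spec)):
        lo, hi = max(0, i - half_width), min(len(spec), i + half_width + 1)
        local_mean = sum(spec[lo:hi]) / (hi - lo)
        out.append(spec[i] / local_mean if local_mean > 0 else 0.0)
    return out


def estimate_f0(freqs, step, spec, f0_lo=60.0, f0_hi=400.0, n_harm=8):
    """Correlate the compressed log-frequency spectrum with a zero-mean comb.
    Positive teeth sit at log(k); negative teeth at log(k + 0.5) penalise
    candidates (e.g. 2*f0) whose 'gaps' fall on real harmonics."""
    pos = [round(math.log(k) / step) for k in range(1, n_harm + 1)]
    neg = [round(math.log(k + 0.5) / step) for k in range(1, n_harm + 1)]
    s = compress(spec)
    best_f, best_score = None, -math.inf
    for j, f in enumerate(freqs):
        if f > f0_hi or j + neg[-1] >= len(s):
            break  # grid is increasing, so no later candidate can qualify
        if f < f0_lo:
            continue
        score = sum(s[j + o] for o in pos) - sum(s[j + o] for o in neg)
        if score > best_score:
            best_f, best_score = f, score
    return best_f


def synthetic_spectrum(freqs, f0, n_partials=20, width=0.01):
    """Toy power spectrum: Gaussian harmonic peaks (width in log-frequency units),
    a strong broad noise bump centred at 60 Hz, and a small noise floor."""
    spec = []
    for f in freqs:
        lf = math.log(f)
        harm = sum(math.exp(-((lf - math.log(h * f0)) ** 2) / (2 * width ** 2))
                   for h in range(1, n_partials + 1))
        bump = 100.0 * math.exp(-((lf - math.log(60.0)) ** 2) / (2 * 0.3 ** 2))
        spec.append(harm + bump + 0.01)
    return spec


freqs, step = log_grid(50.0, 4000.0, 1000)
spec = synthetic_spectrum(freqs, f0=150.0)
print(estimate_f0(freqs, step, spec))  # close to 150 Hz on this toy spectrum
```

On this toy input the strong low-frequency bump is flattened to about 1 by the normalization and largely cancels in the zero-mean comb, while the negative teeth suppress the octave candidate at 2·f0. A real implementation would also need framing, an STFT, interpolation onto the log grid, and the paper's own filter design.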