

Detection of Sound Event under Low SNR Using Multi-band Power Distribution

Ying LI, Lingfei WU

Citation: Ying LI, Lingfei WU. Detection of Sound Event under Low SNR Using Multi-band Power Distribution[J]. Journal of Electronics & Information Technology, 2018, 40(12): 2905-2912. doi: 10.11999/JEIT180180

doi: 10.11999/JEIT180180    cstr: 32379.14.JEIT180180

Funds: The National Natural Science Foundation of China (61075022), The Natural Science Foundation of Fujian Province (2018J01793)

About the authors:

    Ying LI: male, born in 1964, professor; research interests include information security and multimedia data retrieval

    Lingfei WU: female, born in 1994, M.S. candidate; research interests include information security and pattern recognition

Corresponding author: Ying LI, fj_liying@fzu.edu.cn

CLC number: TP391.42
Abstract: To address sound event detection in low-SNR noise environments, this paper proposes a detection method based on the discrete cosine transform of the multi-band power distribution (MBPD) image. First, the audio signal is converted into a gammatone spectrogram, from which the multi-band power distribution is computed. Next, the MBPD image is partitioned into 8×8 blocks and each block is transformed with the discrete cosine transform (DCT). Then, the 8×8 DCT coefficients are scanned in zigzag order, and the leading DCT coefficients are extracted as the feature of the sound event. Finally, a random forest classifier is used to model and detect the features. Experimental results show that the proposed method achieves good detection performance under low SNR and in various noise environments.
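
The feature pipeline summarized above (gammatone spectrogram → MBPD image → 8×8 block DCT → zigzag scan → leading coefficients) can be illustrated with a minimal Python sketch. The 8×8 blocking, the 2-D DCT, and the zigzag truncation to the first Z coefficients follow the abstract; the histogram-based MBPD computation, its normalisation, the number of amplitude bins, the default Z value, and the helper names (compute_mbpd, zigzag_indices, mbpd_dctz) are illustrative assumptions rather than the authors' exact implementation, and the gammatone spectrogram itself is taken as a precomputed input.

```python
import numpy as np
from scipy.fftpack import dct


def compute_mbpd(gt_spectrogram, n_bins=64):
    """Multi-band power distribution: for each gammatone band, histogram the
    normalised frame powers into n_bins amplitude bins (assumed normalisation)."""
    spec = gt_spectrogram / (gt_spectrogram.max() + 1e-12)      # scale powers to [0, 1]
    return np.stack([np.histogram(band, bins=n_bins, range=(0.0, 1.0), density=True)[0]
                     for band in spec])                          # shape: (n_bands, n_bins)


def zigzag_indices(n=8):
    """(row, col) pairs of an n x n block in JPEG-style zigzag order."""
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]))


def mbpd_dctz(gt_spectrogram, z=20, block=8):
    """MBPD image -> 8x8 block 2-D DCT -> zigzag scan -> keep the first z
    coefficients per block, concatenated into one feature vector.
    The default z is illustrative; the paper studies different Z values (Fig. 5)."""
    mbpd = compute_mbpd(gt_spectrogram)
    h = (mbpd.shape[0] // block) * block                         # crop to whole blocks
    w = (mbpd.shape[1] // block) * block
    zz = zigzag_indices(block)[:z]
    feats = []
    for r in range(0, h, block):
        for c in range(0, w, block):
            blk = mbpd[r:r + block, c:c + block]
            coeff = dct(dct(blk, axis=0, norm='ortho'), axis=1, norm='ortho')  # 2-D DCT-II
            feats.extend(coeff[i, j] for i, j in zz)
    return np.asarray(feats)


# Example with a random stand-in for a 64-band gammatone power spectrogram.
if __name__ == "__main__":
    fake_spectrogram = np.abs(np.random.randn(64, 500))          # (bands, frames)
    print(mbpd_dctz(fake_spectrogram, z=20).shape)               # 8 * 8 blocks * 20 = (1280,)
```

The concatenated feature vectors would then be modelled with a classifier; the paper uses a random forest, for which scikit-learn's RandomForestClassifier is one readily available implementation.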
Fig. 1  Spectrogram image features for sound event classification under mismatched conditions

    Fig. 2  Low-SNR sound event detection based on the MBPD image

    Fig. 3  Gammatone spectrogram and MBPD of a kestrel call

    Fig. 4  Image blocking and DCT coefficients

    Fig. 5  Detection rates for different values of Z

    Fig. 6  Detection rates of the MBPD-DCTZ feature with different classifiers

    Fig. 7  Waveforms, gammatone spectrograms, and MBPDs of a kestrel call at –10 dB SNR in wind noise, a clean kestrel call, and wind noise

    Table 1  Cross-validation results of the MBPD-DCTZ feature (%)

    SNR (dB)   Stream     Pink noise   Wind       Sea waves   Road       Rain       Average
    –10        40.0±0.7   65.7±5.1     32.5±3.8   44.7±0.9    52.6±3.8   36.5±3.2   45.3±11.1
    –5         86.1±3.4   91.1±1.7     87.0±3.2   82.9±1.9    91.2±2.1   84.7±2.5   87.2±3.1
    0          91.7±1.9   91.8±1.9     92.3±1.9   91.6±1.4    92.01±2.2  91.5±1.9   91.8±0.3
    5          91.9±1.9   92.2±1.9     92.1±2.3   92.2±1.8    92.3±2.1   92.0±1.9   92.1±0.1

    Table 3  Detection rates of different features for office sound events (%)

    Feature      Office sound events   Pink noise SNR (dB)
                                       5           0           –5
    LBP          69.7±2.3              70.9±5.1    35.2±0.9    16.4±2.6
    GLCM-SDH     47.3±5.4              44.2±7.5    45.5±5.4    38.8±4.8
    HOG          70.3±5.2              40.6±4.8    33.9±3.1    32.1±2.3
    MFCC         43.7±0.7              27.2±4.7    22.1±4.5    17.6±3.4
    PNCC         47.2±1.9              34.3±2.0    28.1±2.3    22.1±1.8
    MBPD-DCTZ    75.2±0.6              75.2±1.7    75.8±4.3    54.6±5.4

    Table 2  Average detection rates of different features for animal sound events under six noise environments (%)

    Feature      SNR (dB)
                 5            0            –5          –10
    LBP          64.3±14.3    16.6±10.5    2.8±0.8     2.4±0.9
    GLCM-SDH     41.4±3.5     36.0±4.3     14.6±9.5    4.2±1.7
    HOG          68.9±5.4     28.8±10.5    7.4±5.2     4.1±1.8
    MFCC         17.5±4.8     9.5±2.5      4.7±0.7     3.0±0.8
    PNCC         28.0±0.9     20.0±0.9     9.1±2.0     2.5±0.8
    MBPD-DCTZ    92.1±0.1     91.8±0.3     87.2±3.1    45.3±11.1

    Table 4  Average detection rates of different methods for animal sound events under six noise environments (%)

    Method             SNR (dB)
                       5           0            –5           –10
    Proposed method    92.1±0.1    91.8±0.3     87.2±3.1     45.3±11.1
    MFCC-SVM[22]       25.2±6.0    13.8±4.8     5.7±3.1      3.7±2.0
    MP-SVM[10]         30.0±2.5    16.4±4.0     8.2±2.4      4.6±0.9
    SIF-SVM[13]        61.4±8.5    40.3±12.1    18.9±13.4    9.7±7.7
    SPD-KNN[12]        87.9±1.8    82.7±3.9     45.4±22.1    9.9±8.8

    Table 5  Detection rates of different methods for office sound events (%)

    Method             Office sound events   Pink noise SNR (dB)
                                             5           0           –5
    Proposed method    75.2±0.9              75.2±1.7    75.8±4.3    54.6±5.4
    MFCC-SVM[22]       16.4±1.8              15.8±1.7    17.6±0.9    16.4±3.0
    MP-SVM[10]         62.7±4.2              45.4±2.1    26.0±0.9    14.0±1.4
    SIF-SVM[13]        75.2±2.3              40.6±6.2    31.5±8.2    25.5±1.5
    SPD-KNN[12]        36.4±13.6             28.5±4.8    25.5±5.4    21.8±5.4
References

[1] MI Jianwei, FANG Xiaoli, and QIU Yuanying. Enhancement technology for the audio signal with nonstationary background noise[J]. Chinese Journal of Scientific Instrument, 2017, 38(1): 17–22. doi: 10.3969/j.issn.0254-3087.2017.01.003
[2] WANG Jiadong, ZOU Cairong, JIANG Bencong, et al. Noise reduction algorithm based on acoustic scene classification in digital hearing aids[J]. Journal of Data Acquisition and Processing, 2017, 32(4): 825–830. doi: 10.16337/j.1004-9037.2017.04.021
[3] FENG Zuren, ZHOU Qing, ZHANG Jun, et al. A target guided subband filter for acoustic event detection in noisy environments using wavelet packets[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015, 23(2): 361–372. doi: 10.1109/TASLP.2014.2381871
[4] GRZESZICK R, PLINGE A, and FINK G A. Bag-of-features methods for acoustic event detection and classification[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2017, 25(6): 1242–1252. doi: 10.1109/TASLP.2017.2690574
[5] REN Jianfeng, JIANG Xudong, YUAN Junsong, et al. Sound-event classification using robust texture features for robot hearing[J]. IEEE Transactions on Multimedia, 2017, 19(3): 447–458. doi: 10.1109/TMM.2016.2618218
[6] YE Jiaxing, KOBAYASHI T, and MURAKAWA M. Urban sound event classification based on local and global features aggregation[J]. Applied Acoustics, 2017, 117: 246–256. doi: 10.1016/j.apacoust.2016.08.002
[7] CAKIR E, PARASCANDOLO G, HEITTOLA T, et al. Convolutional recurrent neural networks for polyphonic sound event detection[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2017, 25(6): 1291–1303. doi: 10.1109/TASLP.2017.2690575
[8] SHARAN R V and MOIR T J. Robust acoustic event classification using deep neural networks[J]. Information Sciences, 2017, 396: 24–32. doi: 10.1016/j.ins.2017.02.013
[9] OZER I, OZER Z, and FINDIK O. Noise robust sound event classification with convolutional neural network[J]. Neurocomputing, 2018, 272: 505–512. doi: 10.1016/j.neucom.2017.07.021
[10] WANG Jiaching, LIN Changhong, and CHEN Bowei. Gabor-based nonuniform scale-frequency map for environmental sound classification in home automation[J]. IEEE Transactions on Automation Science and Engineering, 2014, 11(2): 607–613. doi: 10.1109/TASE.2013.2285131
[11] SHARMA A and KAUL S. Two-stage supervised learning-based method to detect screams and cries in urban environments[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2016, 24(2): 290–299. doi: 10.1109/TASLP.2015.2506264
[12] DENNIS J, TRAN H D, and CHNG E S. Image feature representation of the subband power distribution for robust sound event classification[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2013, 21(2): 367–377. doi: 10.1109/TASL.2012.2226160
[13] DENNIS J, TRAN H D, and LI Haizhou. Spectrogram image feature for sound event classification in mismatched conditions[J]. IEEE Signal Processing Letters, 2011, 18(2): 130–133. doi: 10.1109/LSP.2010.2100380
[14] SLANEY M. An efficient implementation of the Patterson-Holdsworth auditory filter bank[R]. Apple Computer Technical Report, 1993.
[15] PAPAKOSTAS G A, KOULOURIOTIS D E, and KARAKASIS E G. Efficient 2-D DCT Computation from An Image Representation Point of View[M]. London, UK: IntechOpen, 2009: 21–34.
[16] LAY J A and GUAN Ling. Image retrieval based on energy histograms of the low frequency DCT coefficients[C]. IEEE International Conference on Acoustics, Speech and Signal Processing, Arizona, USA, 1999: 3009–3012.
[17] BREIMAN L. Random forests[J]. Machine Learning, 2001, 45(1): 5–32. doi: 10.1023/A:1010933404324
[18] Universitat Pompeu Fabra. Repository of sound under the Creative Commons license, Freesound.org[OL]. http://www.freesound.org, 2012.5.14.
[19] IEEE Signal Processing Society, Tampere University of Technology, Queen Mary University of London, et al. IEEE DCASE 2016 Challenge[OL]. http://www.cs.tut.fi/sgn/arg/dcase2016/, 2016.
[20] CHANG Chihchung and LIN Chihjen. LIBSVM: A library for support vector machines[J]. ACM Transactions on Intelligent Systems and Technology, 2011, 2(3): 1–27. doi: 10.1145/1961189.1961199
[21] COVER T and HART P. Nearest neighbor pattern classification[J]. IEEE Transactions on Information Theory, 1967, 13(1): 21–27. doi: 10.1109/TIT.1967.1053964
[22] ZHENG Fang, ZHANG Guoliang, and SONG Zhanjiang. Comparison of different implementations of MFCC[J]. Journal of Computer Science and Technology, 2001, 16(6): 582–589. doi: 10.1007/BF02943243
[23] KIM C and STERN R M. Feature extraction for robust speech recognition based on maximizing the sharpness of the power distribution and on power flooring[C]. IEEE International Conference on Acoustics, Speech and Signal Processing, Dallas, USA, 2010: 4574–4577.
[24] WEI Jingming and LI Ying. Rapid bird sound recognition using anti-noise texture features[J]. Acta Electronica Sinica, 2015, 43(1): 185–190. doi: 10.3969/j.issn.0372-2112.2015.01.029
[25] KOBAYASHI T and YE J. Acoustic feature extraction by statistics based local binary pattern for environmental sound classification[C]. IEEE International Conference on Acoustics, Speech and Signal Processing, Florence, Italy, 2014: 3052–3056.
[26] RAKOTOMAMONJY A and GASSO G. Histogram of gradients of time-frequency representations for audio scene classification[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015, 23(1): 142–153. doi: 10.1109/TASLP.2014.2375575
Publication history
  • Received: 2018-02-09
  • Revised: 2018-07-09
  • Available online: 2018-07-26
  • Issue published: 2018-12-01
