Application of Residual Network to Infant Crying Recognition

Xiang XIE, Liqiang ZHANG, Jing WANG

Citation: Xiang XIE, Liqiang ZHANG, Jing WANG. Application of Residual Network to Infant Crying Recognition[J]. Journal of Electronics & Information Technology, 2019, 41(1): 233-239. doi: 10.11999/JEIT180276

doi: 10.11999/JEIT180276 cstr: 32379.14.JEIT180276
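Funds: The National Natural Science Foundation of China (61473041, 11590772, 61571044)
Details
    About the authors:

    Xiang XIE: male, born in 1976, associate professor. Research interest: speech recognition.

    Liqiang ZHANG: male, born in 1995, master's student. Research interest: personality perception from speech.

    Jing WANG: female, born in 1980, associate professor. Research interest: audio signal processing.

    Corresponding author:

    Xiang XIE xiexiang@bit.edu.cn

  • CLC number: TP391.42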

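  • Abstract:

    This paper recognizes infant crying with a deep learning model that combines spectrograms with a residual network. The experiments use a corpus in which infant-crying and non-crying samples are balanced, with five-fold cross-validation. The spectrogram-based residual network is compared with three other models: a Support Vector Machine (SVM), a Convolutional Neural Network (CNN), and a residual network fed with Gammatone-filter-based auditory spectra (GT-Resnet). The spectrogram-based residual network gives the best result, with an F1-score of 0.9965, and meets the real-time requirement. This shows that the spectrogram reflects the relevant acoustic features directly in the infant crying recognition task, and that a spectrogram-based residual network is a strong method for this task.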
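
The input feature named in the abstract is a spectrogram. As a rough illustration only, the sketch below computes a log-magnitude spectrogram with a short-time Fourier transform in plain Python. The frame length, hop size, Hann window and the synthetic test tone are all assumptions made for this example; the page does not state the analysis parameters used in the paper.

```python
import cmath
import math

def log_spectrogram(signal, frame_len=64, hop=32):
    """Return a list of frames, each holding log-magnitudes for bins 0..frame_len//2.

    frame_len and hop are illustrative values, not the paper's settings.
    """
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT of one bin; an FFT library would be used in practice.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(math.log(abs(acc) + 1e-10))
        frames.append(row)
    return frames

# Synthetic 1 kHz tone sampled at 8 kHz (hypothetical input, not corpus data).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]
spec = log_spectrogram(tone)
# Bin spacing is 8000 / 64 = 125 Hz, so the tone falls in bin 1000 / 125.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

Each row is one time frame. Stacking the rows gives the time-frequency image that a CNN or residual network would take as input after resizing.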

  • Fig. 1  Spectrograms of infant crying, adult speech and a ringing sound

    Fig. 2  Residual block

    Fig. 3  Structure of the CNN-5 model

    Fig. 4  Test-set F1-scores of the 3 models

    Fig. 5  Test-set F1-scores of residual networks with 3 different depths

    Fig. 6  Residual network model
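
Fig. 2 refers to a residual block, but the image is not reproduced on this page. The sketch below therefore only illustrates the general idea of a residual connection, output = activation(F(x) + x), using two small fully connected layers in plain Python. The layer type, sizes and weights are assumptions for illustration and do not describe the block used in the paper, which operates on spectrogram images.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]

def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]

def residual_block(x, w1, w2):
    """y = relu(W2 * relu(W1 * x) + x); the '+ x' term is the shortcut path."""
    hidden = relu(matvec(w1, x))
    branch = matvec(w2, hidden)
    return relu([b + xi for b, xi in zip(branch, x)])

x = [1.0, 2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
eye = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

# With an all-zero residual branch the block passes its input through unchanged.
identity_out = residual_block(x, zeros, zeros)
# With identity weights the branch reproduces x, so the output is x + x.
doubled_out = residual_block(x, eye, eye)
```

The first call shows why shortcuts help deeper networks train: a block can fall back to the identity mapping by driving its branch weights toward zero.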

    Table 1  Average dataset size under five-fold cross-validation (number of samples)

    | Set | Infant crying | Non-crying | Total |
    |---|---|---|---|
    | Training set | 1243 | 1148 | 2391 |
    | Test set | 310 | 286 | 596 |
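
Table 1 reports average fold sizes for five-fold cross-validation. The sketch below shows one way a class-balanced (stratified) five-fold split could be produced. The class totals of 1553 crying and 1434 non-crying samples are inferred by adding the average training and test sizes in Table 1, and the splitting procedure itself is an assumption, since the page does not describe how the folds were built.

```python
import random

def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs, keeping the class ratio in each fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled samples of this class round-robin into the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test

# 1 = infant crying, 0 = non-crying (totals inferred from Table 1).
labels = [1] * 1553 + [0] * 1434
splits = list(stratified_k_fold(labels))
```

Under these assumptions each test fold holds 310 or 311 crying samples and 286 or 287 non-crying samples, which is in line with the averages in Table 1.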

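    Table 2  Feature extraction for the SVM experiment

    | Feature type | Statistics | Dimensions |
    |---|---|---|
    | MFCC with first- and second-order differences | Mean, variance | 72 |
    | Short-time energy | Mean, variance | 2 |
    | Pitch frequency | Mean, variance, maximum, minimum, range | 5 |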
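
Table 2 describes utterance-level statistics computed over frame-level features, adding up to 72 + 2 + 5 = 79 dimensions. The sketch below shows one way such a vector could be assembled. It assumes 36 values per frame for MFCC plus first- and second-order differences, which is consistent with 72 = 36 × 2 but is not stated on this page, and it uses population variance as an arbitrary choice. The inputs are dummy numbers, not real audio features.

```python
from statistics import fmean, pvariance

def utterance_features(mfcc_frames, energy, pitch):
    """Collapse frame-level features into one fixed-length vector."""
    vec = []
    # Mean and variance of every MFCC/difference column: 36 columns -> 72 values.
    for column in zip(*mfcc_frames):
        vec += [fmean(column), pvariance(column)]
    # Short-time energy: mean and variance -> 2 values.
    vec += [fmean(energy), pvariance(energy)]
    # Pitch: mean, variance, maximum, minimum, range -> 5 values.
    vec += [fmean(pitch), pvariance(pitch), max(pitch), min(pitch),
            max(pitch) - min(pitch)]
    return vec

# Dummy input: 10 frames of 36 values each, plus per-frame energy and pitch.
mfcc_frames = [[float(f + c) for c in range(36)] for f in range(10)]
energy = [0.1 * f for f in range(10)]
pitch = [300.0 + 5.0 * f for f in range(10)]
features = utterance_features(mfcc_frames, energy, pitch)
```

The result has a fixed length regardless of how many frames the clip contains, which is what a classifier such as an SVM needs.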

    Table 3  Performance of the SVM with different kernel functions

    | Kernel function | F1-score | Parameters |
    |---|---|---|
    | Linear | 0.8717 | c=0.68 |
    | Polynomial | 0.9316 | c=0.30, g=0.35, r=-0.20, d=3.00 |
    | Gaussian | 0.9458 | c=0.98, g=1.71 |
    | Sigmoid | 0.8874 | c=5.00, g=0.04, r=1.80 |
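
The parameter names in Table 3 (c, g, r, d) match the option names LIBSVM uses for cost, gamma, coef0 and degree. Assuming that reading, the four kernels can be written as below. The cost c is the soft-margin penalty used during training and does not appear in the kernel itself. This is a sketch of the standard kernel formulas with the table's values plugged in as defaults, not the authors' code.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def linear(u, v):
    """K(u, v) = u . v"""
    return dot(u, v)

def polynomial(u, v, g=0.35, r=-0.20, d=3):
    """K(u, v) = (g * u . v + r) ** d"""
    return (g * dot(u, v) + r) ** d

def gaussian(u, v, g=1.71):
    """K(u, v) = exp(-g * ||u - v||^2)"""
    return math.exp(-g * sum((a - b) ** 2 for a, b in zip(u, v)))

def sigmoid(u, v, g=0.04, r=1.80):
    """K(u, v) = tanh(g * u . v + r)"""
    return math.tanh(g * dot(u, v) + r)

# Two orthogonal unit vectors as a toy input.
u, v = [1.0, 0.0], [0.0, 1.0]
values = {
    "linear": linear(u, v),
    "polynomial": polynomial(u, v),
    "gaussian": gaussian(u, v),
    "sigmoid": sigmoid(u, v),
}
```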

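    Table 4  Performance of CNNs with different numbers of layers

    | CNN model | Input feature | F1-score |
    |---|---|---|
    | CNN-4-MEL | 40×128 Mel spectrogram | 0.9184 |
    | CNN-4-227 | 227×227 spectrogram | 0.9233 |
    | CNN-4 | 128×128 spectrogram | 0.9229 |
    | CNN-5-227 | 227×227 spectrogram | 0.9482 |
    | CNN-5 | 128×128 spectrogram | 0.9489 |
    | CNN-6 | 128×128 spectrogram | 0.9365 |
    | CNN-7 | 128×128 spectrogram | 0.9398 |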
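
The tables report F1-score, the harmonic mean of precision and recall. A minimal sketch of the computation follows. The counts in the example are made up for illustration and are not taken from the paper.

```python
def f1_score(tp, fp, fn):
    """F1 = 2PR / (P + R), with P = tp / (tp + fp) and R = tp / (tp + fn)."""
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 90 crying clips detected, 10 false alarms, 5 misses.
example = f1_score(tp=90, fp=10, fn=5)
```

Because it penalizes both false alarms and misses, F1 is a reasonable single number for a two-class detector like the ones compared here.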

    Table 5  Model performance comparison

    | Model | Network structure | Input feature | Generated model size (MB) | Average test time (s) | F1-score |
    |---|---|---|---|---|---|
    | SVM | Single-layer network | Statistical features | 0.7 | 0.0910+0.0001 | 0.9458 |
    | CNN-5 | 4conv+1fc | Spectrogram | 10 | 0.1251+0.0093 | 0.9489 |
    | Resnet15 | 3resblock+1fc | Spectrogram | 48 | 0.1251+0.0281 | 0.9836 |
    | Resnet19 | 4resblock+1fc | Spectrogram | 87 | 0.1251+0.0315 | 0.9965 |
    | Resnet27 | 6resblock+1fc | Spectrogram | 171 | 0.1251+0.0355 | 0.9965 |
    | GT-Resnet15 | 3resblock+1fc | Auditory spectrum | 48 | 0.1933+0.0218 | 0.9803 |
    | GT-Resnet19 | 4resblock+1fc | Auditory spectrum | 87 | 0.1933+0.0237 | 0.9782 |
    | GT-Resnet27 | 6resblock+1fc | Auditory spectrum | 171 | 0.1933+0.0285 | 0.9719 |

    Note: average test time = feature extraction time + model prediction time
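
The note under Table 5 defines average test time as feature extraction time plus model prediction time. The sketch below adds the two terms for each row and picks the smallest model among those with the highest F1-score. The numbers are copied from Table 5. The selection rule (highest F1, then smallest model) is an assumption made for this example, not a criterion stated by the authors.

```python
# (model, size_mb, feature_time_s, predict_time_s, f1)
rows = [
    ("SVM",         0.7,   0.0910, 0.0001, 0.9458),
    ("CNN-5",       10.0,  0.1251, 0.0093, 0.9489),
    ("Resnet15",    48.0,  0.1251, 0.0281, 0.9836),
    ("Resnet19",    87.0,  0.1251, 0.0315, 0.9965),
    ("Resnet27",    171.0, 0.1251, 0.0355, 0.9965),
    ("GT-Resnet15", 48.0,  0.1933, 0.0218, 0.9803),
    ("GT-Resnet19", 87.0,  0.1933, 0.0237, 0.9782),
    ("GT-Resnet27", 171.0, 0.1933, 0.0285, 0.9719),
]

# Average test time per clip = feature extraction time + model prediction time.
total_time = {name: round(feat + pred, 4) for name, _, feat, pred, _ in rows}

# Among the models tied for the best F1-score, prefer the smallest one.
best_f1 = max(r[4] for r in rows)
best = min((r for r in rows if r[4] == best_f1), key=lambda r: r[1])[0]
```

With these figures, feature extraction dominates the total time for every network, and the deeper spectrogram-based networks add only a few hundredths of a second of prediction time.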
Publication history
  • Received:  2018-03-23
  • Revised:  2018-09-04
  • Available online:  2018-09-11
  • Published:  2019-01-01
