
Skeleton-based Action Recognition with Selective Multi-scale Graph Convolutional Network

CAO Yi, LI Jie, YE Peitao, WANG Yanwen, LÜ Xianhai

Citation: CAO Yi, LI Jie, YE Peitao, WANG Yanwen, LÜ Xianhai. Skeleton-based Action Recognition with Selective Multi-scale Graph Convolutional Network[J]. Journal of Electronics & Information Technology, 2025, 47(3): 839–849. doi: 10.11999/JEIT240702


doi: 10.11999/JEIT240702 cstr: 32379.14.JEIT240702
Author information:

    CAO Yi: Male, Professor, Ph.D. His research interests include robot mechanism theory and deep learning.

    LI Jie: Male, M.S. candidate. His research interests include deep learning and action recognition.

    YE Peitao: Male, M.S. candidate. His research interests include robot control systems and path planning.

    WANG Yanwen: Male, M.S. candidate. His research interests include deep learning and voiceprint recognition.

    LÜ Xianhai: Male, M.S. candidate. His research interests include robot mechanism theory and action recognition.

    Corresponding author: CAO Yi, caoyi@jiangnan.edu.cn

  • CLC number: TN911.73; TP391.41


Funds: The National Natural Science Foundation of China (51375209), The Six Talent Peaks Project in Jiangsu Province (ZBZZ-012), The Programme of Introducing Talents of Discipline to Universities (B18027)
  • Abstract: To address the problems that existing skeleton-based action recognition methods neglect the multi-scale dependencies among skeleton joints and fail to exploit convolution kernels effectively for temporal modeling, this paper proposes an action recognition model based on a Selective Multi-Scale Graph Convolutional Network (SMS-GCN). First, the construction of the human skeleton graph and the structure of the channel-wise topology refinement graph convolutional network are introduced. Second, pairwise-joint and multi-joint adjacency matrices are constructed to generate multi-scale channel-wise topology-refined adjacency matrices, which are combined with graph convolution to form a Multi-Scale Graph Convolution (MS-GC) module that models the multi-scale dependencies among skeleton joints. Then, building on multi-scale temporal convolution and the selective large-kernel network, a Selective Multi-Scale Temporal Convolution (SMS-TC) module is proposed to fully extract useful temporal context features; combining the MS-GC and SMS-TC modules yields the proposed SMS-GCN model, which is trained with multi-stream data inputs. Finally, extensive experiments on the NTU-RGB+D and NTU-RGB+D 120 datasets show that the model captures more joint features, learns useful temporal information, and achieves excellent accuracy and generalization ability.
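To make the spatial half of the abstract concrete, below is a minimal, hypothetical PyTorch sketch of a multi-scale graph convolution over skeleton joints. It is not the paper's implementation: the multi-joint relations are approximated here by normalized k-hop reachability matrices derived from the pairwise-joint adjacency, the channel-wise topology refinement of CTR-GCN [10] is omitted for brevity, and the names `multi_scale_adjacency` and `MSGraphConv` are introduced purely for illustration.

```python
import torch
import torch.nn as nn


def multi_scale_adjacency(A: torch.Tensor, num_scales: int) -> torch.Tensor:
    """Stack normalized k-hop adjacency matrices for k = 1..num_scales.

    A is the (V, V) binary pairwise-joint adjacency of the skeleton graph.
    Raising (I + A) to higher powers links joints several bones apart, one
    common proxy (an assumption here) for the multi-joint relations that
    the abstract describes.
    """
    V = A.size(0)
    hop = torch.eye(V) + A                    # one-hop links plus self-loops
    Ak = torch.eye(V)
    scales = []
    for _ in range(num_scales):
        Ak = (Ak @ hop).clamp(max=1.0)        # binary reachability within k hops
        deg = Ak.sum(-1, keepdim=True).clamp(min=1.0)
        scales.append(Ak / deg)               # row-normalize each scale
    return torch.stack(scales)                # (num_scales, V, V)


class MSGraphConv(nn.Module):
    """Apply one 1x1 graph convolution per scale and sum the results."""

    def __init__(self, in_ch: int, out_ch: int, A: torch.Tensor, num_scales: int = 3):
        super().__init__()
        self.register_buffer("A_ms", multi_scale_adjacency(A, num_scales))
        self.convs = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size=1) for _ in range(num_scales)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, T, V) -- batch, channels, frames, joints
        out = 0
        for conv, A_k in zip(self.convs, self.A_ms):
            out = out + torch.einsum("nctv,vw->nctw", conv(x), A_k)
        return out
```

In a full model, a spatial block of this kind would alternate with temporal convolutions, roughly as the SMS-GCN block diagram in Figure 3 suggests; on NTU-RGB+D the adjacency A would come from the dataset's 25-joint skeleton.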
  • Figure 1. Schematic of the multi-scale graph convolution module

    Figure 2. Schematic of the selective multi-scale temporal convolution module

    Figure 3. Schematic of SMS-GCN

    Table 1. Model accuracy comparison for different convolution kernel sizes (%)

    Model            Top-1    Top-5
    SMS-GCN (k=3)    95.09    99.41
    SMS-GCN (k=5)    95.00    99.43
    SMS-GCN (k=7)    94.99    99.40
    SMS-GCN (k=9)    94.93    99.36

    Table 2. Accuracy comparison for different convolution kernel sizes

    No.   (k1, k2)   (d1, d2)   Top-1 (%)   Params (M)   Time (s)
    1     (1, 3)     (1, 1)     94.58       1.77         277
    2     (1, 5)     (1, 1)     94.79       1.80         275
    3     (1, 7)     (1, 1)     94.92       1.83         277
    4     (1, 9)     (1, 1)     94.78       1.87         282
    5     (1, 11)    (1, 1)     95.04       1.90         276
    6     (3, 5)     (1, 1)     94.92       1.83         287
    7     (3, 7)     (1, 1)     94.74       1.87         281
    8     (3, 9)     (1, 1)     94.91       1.90         271
    9     (3, 11)    (1, 1)     94.87       1.93         277
    10    (5, 7)     (1, 1)     94.69       1.90         278
    11    (5, 9)     (1, 1)     94.82       1.93         277
    12    (5, 11)    (1, 1)     94.89       1.97         277
    13    (7, 9)     (1, 1)     94.84       1.97         270
    14    (7, 11)    (1, 1)     94.83       2.00         285
    15    (9, 11)    (1, 1)     95.03       2.03         272

    Table 3. Accuracy comparison for different convolution kernel sizes and dilation rates

    No.   (k1, k2)   (d1, d2)   Top-1 (%)   Params (M)   Time (s)
    1     (1, 1)     (1, 2)     92.95       1.74         747
    2     (3, 3)     (1, 2)     94.69       1.80         886
    3     (5, 5)     (1, 2)     94.89       1.87         865
    4     (7, 7)     (1, 2)     94.66       1.93         854
    5     (9, 9)     (1, 2)     95.09       2.00         735
    6     (11, 11)   (1, 2)     94.97       2.06         767
    7     (9, 9)     (1, 3)     95.07       2.00         761
    8     (9, 9)     (1, 4)     95.09       2.00         1000
    9     (9, 9)     (2, 3)     94.98       2.00         1466
    10    (9, 9)     (2, 4)     94.96       2.00         1490
    11    (9, 9)     (3, 4)     94.85       2.00         1479
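As a reading aid for Table 3 (standard dilated-convolution arithmetic, not a claim taken from the paper): a temporal kernel of size k with dilation d covers an effective span of d(k − 1) + 1 frames, so the best-performing settings (k1, k2) = (9, 9) with (d1, d2) = (1, 2) or (1, 4) pair a 9-frame branch with branches spanning 2 × 8 + 1 = 17 and 4 × 8 + 1 = 33 frames, respectively.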

    Table 4. Accuracy comparison of models with different structures

    Model                  Params (M)   Top-1 (%)
    SMS-GCN                2.00         95.09
    SMS-GCN (w/o SMS-TC)   3.76         94.46
    SMS-GCN (w/o S)        1.96         94.97
    SMS-GCN (w/o GMP)      2.00         94.90
    SMS-GCN (w/o GAP)      2.00         94.93
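The "w/o S", "w/o GMP", and "w/o GAP" rows in Table 4 point to a selection step that weights parallel temporal branches using both Global Average Pooling (GAP) and Global Max Pooling (GMP) statistics. Below is a hedged sketch of one such mechanism in the spirit of the selective large-kernel network of [18]; the paper's actual SMS-TC design may differ, and `SelectiveFusion` is a name introduced here for illustration only.

```python
import torch
import torch.nn as nn


class SelectiveFusion(nn.Module):
    """Fuse parallel temporal-convolution branches with softmax attention
    weights computed from pooled (GAP and GMP) channel descriptors."""

    def __init__(self, channels: int, num_branches: int = 2, reduction: int = 4):
        super().__init__()
        hidden = max(channels // reduction, 8)
        self.fc = nn.Sequential(
            nn.Linear(2 * channels, hidden),             # GAP ++ GMP descriptor
            nn.ReLU(inplace=True),
            nn.Linear(hidden, num_branches * channels),  # per-branch, per-channel logits
        )
        self.num_branches = num_branches
        self.channels = channels

    def forward(self, branches: list) -> torch.Tensor:
        # each branch: (N, C, T, V), e.g. outputs of differently dilated temporal convs
        stacked = torch.stack(branches, dim=1)           # (N, B, C, T, V)
        u = stacked.sum(dim=1)                           # fused summary (N, C, T, V)
        gap = u.mean(dim=(2, 3))                         # global average pooling -> (N, C)
        gmp = u.amax(dim=(2, 3))                         # global max pooling -> (N, C)
        logits = self.fc(torch.cat([gap, gmp], dim=1))   # (N, B*C)
        w = logits.view(-1, self.num_branches, self.channels).softmax(dim=1)
        return (stacked * w.unsqueeze(-1).unsqueeze(-1)).sum(dim=1)
```

Under this reading, "w/o S" collapses the weighted sum to a plain sum of branches, while "w/o GMP" and "w/o GAP" each drop one half of the pooled descriptor; this is one plausible interpretation of the small accuracy drops those ablation rows report.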

    Table 5. Accuracy comparison of models with different added modules (%)

    Model               Joint stream   Bone stream   Two-stream
    CTR-GCN             94.74          94.70         96.07
    CTR-GCN + MS-GC     94.91          94.90         96.30
    CTR-GCN + SMS-TC    94.86          94.90         96.29
    SMS-GCN             95.09          94.96         96.52

    Table 6. Accuracy comparison of models on the NTU-RGB+D dataset (%)

    Model                     CS     CV
    CNC-LSTM[5]               83.3   91.8
    LAGA-Net[7]               87.1   93.2
    ST-GCN[15]                81.5   88.3
    2s-AGCN[9]                88.5   95.1
    CTR-GCN[10]               92.4   96.8
    VN-GAN[22]                92.0   96.7
    3D-GCN[23]                89.4   93.3
    ML-STGNet[12]             91.9   96.2
    MADT-GCN[19]              90.4   96.5
    SMS-GCN (single-stream)   89.7   95.1
    SMS-GCN (two-stream)      91.9   96.5
    SMS-GCN (multi-stream)    92.6   96.9

    Table 7. Accuracy comparison of models on the NTU-RGB+D 120 dataset (%)

    Model                     CSub   CSet
    GCA-LSTM[6]               58.3   59.2
    LAGA-Net[7]               81.0   82.2
    ST-GCN[15]                70.7   73.2
    2s-AGCN[9]                82.9   84.9
    CTR-GCN[10]               88.9   90.6
    STFE-GCN[11]              84.1   86.3
    ML-STGNet[12]             88.6   90.0
    MADT-GCN[19]              86.5   88.2
    VN-GAN[22]                87.6   89.4
    SMS-GCN (single-stream)   85.3   86.6
    SMS-GCN (two-stream)      88.8   90.0
    SMS-GCN (multi-stream)    89.3   90.7
  • [1] IODICE F, DE MOMI E, and AJOUDANI A. HRI30: An action recognition dataset for industrial human-robot interaction[C]. Proceedings of the 26th International Conference on Pattern Recognition, Montreal, Canada, 2022: 4941–4947. doi: 10.1109/ICPR56361.2022.9956300.
    [2] SARDARI S, SHARIFZADEH S, DANESHKHAH A, et al. Artificial intelligence for skeleton-based physical rehabilitation action evaluation: A systematic review[J]. Computers in Biology and Medicine, 2023, 158: 106835. doi: 10.1016/j.compbiomed.2023.106835.
    [3] SUN Zehua, KE Qiuhong, RAHMANI H, et al. Human action recognition from various data modalities: A review[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(3): 3200–3225. doi: 10.1109/TPAMI.2022.3183112.
    [4] CAO Yi, WU Weiguan, ZHANG Xiaoyong, et al. Action recognition model based on the spatiotemporal sampling graph convolutional network and self-calibration mechanism[J]. Chinese Journal of Engineering, 2024, 46(3): 480–490. doi: 10.13374/j.issn2095-9389.2022.12.25.002. (in Chinese)
    [5] SHEN Xiangpei and DING Yanrui. Human skeleton representation for 3D action recognition based on complex network coding and LSTM[J]. Journal of Visual Communication and Image Representation, 2022, 82: 103386. doi: 10.1016/j.jvcir.2021.103386.
    [6] LIU Jun, WANG Gang, HU Ping, et al. Global context-aware attention LSTM networks for 3D action recognition[C]. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA, 2017: 3671–3680. doi: 10.1109/CVPR.2017.391.
    [7] XIA Rongjie, LI Yanshan, and LUO Wenhan. LAGA-Net: Local-and-global attention network for skeleton based action recognition[J]. IEEE Transactions on Multimedia, 2022, 24: 2648–2661. doi: 10.1109/TMM.2021.3086758.
    [8] ZHANG Pengfei, LAN Cuiling, ZENG Wenjun, et al. Semantics-guided neural networks for efficient skeleton-based human action recognition[C]. Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 2020: 1109–1118. doi: 10.1109/CVPR42600.2020.00119.
    [9] SHI Lei, ZHANG Yifan, CHENG Jian, et al. Two-stream adaptive graph convolutional networks for skeleton-based action recognition[C]. Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, USA, 2019: 12018–12027. doi: 10.1109/CVPR.2019.01230.
    [10] CHEN Yuxin, ZHANG Ziqi, YUAN Chunfeng, et al. Channel-wise topology refinement graph convolution for skeleton-based action recognition[C]. Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, Canada, 2021: 13339–13348. doi: 10.1109/ICCV48922.2021.01311.
    [11] CAO Yi, WU Weiguan, LI Ping, et al. Skeleton action recognition based on spatio-temporal feature enhanced graph convolutional network[J]. Journal of Electronics & Information Technology, 2023, 45(8): 3022–3031. doi: 10.11999/JEIT220749. (in Chinese)
    [12] ZHU Yisheng, SHUAI Hui, LIU Guangcan, et al. Multilevel spatial-temporal excited graph network for skeleton-based action recognition[J]. IEEE Transactions on Image Processing, 2023, 32: 496–508. doi: 10.1109/TIP.2022.3230249.
    [13] ZHOU Huanyu, LIU Qingjie, and WANG Yunhong. Learning discriminative representations for skeleton based action recognition[C]. Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada, 2023: 10608–10617. doi: 10.1109/CVPR52729.2023.01022.
    [14] WANG Kaixuan, DENG Hongmin, and ZHU Qilin. Lightweight channel-topology based adaptive graph convolutional network for skeleton-based action recognition[J]. Neurocomputing, 2023, 560: 126830. doi: 10.1016/j.neucom.2023.126830.
    [15] YAN Sijie, XIONG Yuanjun, and LIN Dahua. Spatial temporal graph convolutional networks for skeleton-based action recognition[C]. Proceedings of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, USA, 2018: 7444–7452. doi: 10.1609/aaai.v32i1.12328.
    [16] GEDAMU K, JI Yanli, GAO Lingling, et al. Relation-mining self-attention network for skeleton-based human action recognition[J]. Pattern Recognition, 2023, 139: 109455. doi: 10.1016/j.patcog.2023.109455.
    [17] LIU Ziyu, ZHANG Hongwen, CHEN Zhenghao, et al. Disentangling and unifying graph convolutions for skeleton-based action recognition[C]. Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 2020: 140–149. doi: 10.1109/CVPR42600.2020.00022.
    [18] LI Yuxuan, HOU Qibin, ZHENG Zhaohui, et al. Large selective kernel network for remote sensing object detection[C]. Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2023: 16748–16759. doi: 10.1109/ICCV51070.2023.01540.
    [19] XIA Yu, GAO Qingyuan, WU Weiguan, et al. Skeleton-based action recognition based on multidimensional adaptive dynamic temporal graph convolutional network[J]. Engineering Applications of Artificial Intelligence, 2024, 127: 107210. doi: 10.1016/j.engappai.2023.107210.
    [20] SHAHROUDY A, LIU Jun, NG T T, et al. NTU RGB+D: A large scale dataset for 3D human activity analysis[C]. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 2016: 1010–1019. doi: 10.1109/CVPR.2016.115.
    [21] LIU Jun, SHAHROUDY A, PEREZ M, et al. NTU RGB+D 120: A large-scale benchmark for 3D human activity understanding[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(10): 2684–2701. doi: 10.1109/TPAMI.2019.2916873.
    [22] PAN Qingzhe, ZHAO Zhifu, XIE Xuemei, et al. View-normalized and subject-independent skeleton generation for action recognition[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(12): 7398–7412. doi: 10.1109/TCSVT.2022.3219864.
    [23] CAO Yi, LIU Chen, SHENG Yongjian, et al. Action recognition model based on 3D graph convolution and attention enhanced[J]. Journal of Electronics & Information Technology, 2021, 43(7): 2071–2078. doi: 10.11999/JEIT200448. (in Chinese)
Publication history
  • Received: 2024-08-12
  • Revised: 2025-02-17
  • Available online: 2025-02-24
  • Issue published: 2025-03-01
