

LGDNet: Table Detection Network Combining Local and Global Features

LU Di  YUAN Xuan

Citation: LU Di, YUAN Xuan. LGDNet: Table Detection Network Combining Local and Global Features[J]. Journal of Electronics & Information Technology, 2024, 46(12): 4553-4562. doi: 10.11999/JEIT240428


doi: 10.11999/JEIT240428 cstr: 32379.14.JEIT240428
Details
    Author biographies:

    LU Di: Female, Professor, Ph.D. Her research interests include data fusion and image processing

    YUAN Xuan: Female, Master's student. Her research interests include image processing and table detection

    Corresponding author:

    LU Di  ludizeng@hrbust.edu.cn

  • CLC number: TN911.73

LGDNet: Table Detection Network Combining Local and Global Features

  • Abstract: In the era of big data, tables appear widely in document images of all kinds, and detecting them is of great significance for reusing tabular information. Existing convolutional-neural-network-based table detection algorithms suffer from limited receptive fields, dependence on preset candidate regions, and inaccurate localization of table boundaries; to address these problems, this paper proposes a table detection network based on the DINO model. First, an image preprocessing method is designed to enhance the corner and line features of tables, so that tables are better distinguished from text and other document elements. Second, a backbone network named SwTNet-50 is designed, which introduces Swin Transformer Blocks (STB) into ResNet to extract local and global feature information effectively, improving the model's feature extraction capability and the accuracy of table boundary detection. Finally, to compensate for the insufficient encoder feature learning caused by the DINO model's one-to-one matching, a collaborative hybrid matching training strategy is adopted to strengthen the encoder's feature learning and raise detection precision. Compared with a range of deep-learning-based table detection methods, the proposed model outperforms them on the TNCR table detection dataset, reaching F1-Scores of 98.2%, 97.4%, and 93.3% at IoU thresholds of 0.5, 0.75, and 0.9, respectively. On the IIIT-AR-13K dataset, the F1-Score is 98.6% at an IoU threshold of 0.5.
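The abstract does not spell out how the preprocessing enhances table corner and line features. One common way to isolate ruling lines in a binarized page is morphological opening with a long thin kernel: only runs at least k pixels long survive, and the crossings of the horizontal and vertical responses mark candidate table corners. A minimal NumPy sketch of that idea, assuming a binary image with foreground = 1 (the function names are illustrative, not taken from the paper):

```python
import numpy as np

def _open_horizontal(binary: np.ndarray, k: int) -> np.ndarray:
    """Morphological opening (erosion then dilation) with a 1 x k kernel."""
    pad = k // 2
    padded = np.pad(binary, ((0, 0), (pad, pad)))
    windows = np.stack([padded[:, i:i + binary.shape[1]] for i in range(k)])
    eroded = windows.min(axis=0)               # only runs >= k wide survive erosion
    padded = np.pad(eroded, ((0, 0), (pad, pad)))
    windows = np.stack([padded[:, i:i + binary.shape[1]] for i in range(k)])
    return windows.max(axis=0)                 # dilation restores the run's extent

def line_and_corner_maps(binary: np.ndarray, k: int = 9):
    """Horizontal/vertical ruling-line candidates and their crossings (corners)."""
    horizontal = _open_horizontal(binary, k)
    vertical = _open_horizontal(binary.T, k).T  # same opening on the transpose
    return horizontal, vertical, horizontal * vertical
```

Isolated text strokes shorter than k are suppressed by the erosion, so the line and corner maps respond mainly to table rulings; such maps could then be overlaid on the input to strengthen table boundaries before detection.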
  • Figure 1  Network structure of the DINO model

    Figure 2  Structure of LGDNet

    Figure 3  Document image preprocessing pipeline

    Figure 4  SwTNet-50 backbone network

    Figure 5  One-to-many matching auxiliary branch

    Figure 6  The five types of table images in the TNCR dataset

    Figure 7  Detection results on Full lined tables

    Figure 8  Detection results on Merged cells tables

    Figure 9  Detection results on Partial lined tables

    Figure 10  Detection results on No lines tables

    Figure 11  Detection results on Partial lined and Merged cells tables

    Table 1  Auxiliary head information

    Auxiliary head i | {pos}, {neg} generation rule (matching scheme Ai)                         | Pi generation rule                                                 | $B_i^{\left\{ {{\text{pos}}} \right\}}$ generation rule
    Faster R-CNN     | {pos}: IoU(proposal, gt) > 0.5; {neg}: IoU(proposal, gt) < 0.5            | {pos}: gt labels, offset(proposal, gt); {neg}: gt labels           | positive proposals $\left( {{x_1}, {y_1}, {x_2}, {y_2}} \right)$
    ATSS             | {pos}: IoU(anchor, gt) > (mean+std); {neg}: IoU(anchor, gt) < (mean+std)  | {pos}: gt labels, offset(anchor, gt), centerness; {neg}: gt labels | positive anchors $\left( {{x_1}, {y_1}, {x_2}, {y_2}} \right)$

    Table 2  Comparison results on the TNCR and IIIT-AR-13K datasets (%)

    Dataset     | Model                  | F1-Score
                |                        | IoU@0.5 | IoU@0.75 | IoU@0.9
    TNCR        | Cascade Mask R-CNN[12] | 93.1    | 92.1     | 86.6
                | DiffusionDet[20]       | 95.5    | 93.9     | 88.5
                | Deformable DETR[17]    | 94.5    | 93.7     | 89.3
                | DINO[21]               | 94.6    | 91.4     | 90.1
                | Sparse R-CNN[19]       | 95.2    | 94.8     | 90.9
                | Ours                   | 98.2    | 97.4     | 93.3
    IIIT-AR-13K | Faster R-CNN[8]        | 93.7    |          |
                | Mask R-CNN[25]         | 97.1    |          |
                | DINO[21]               | 97.4    |          |
                | Ours                   | 98.6    |          |

    Table 3  Backbone network comparison results (%)

    Model    | Backbone         | F1-Score
             |                  | IoU@0.5 | IoU@0.75 | IoU@0.9
    DINO[21] | ResNet50         | 93.5    | 90.6     | 89.7
    DINO[21] | Swin Transformer | 94.6    | 91.4     | 90.1
    Ours     | SwTNet-50        | 95.8    | 93.6     | 91.1

    Table 4  Ablation results (%)

    No. | Model                                                                  | F1-Score
        |                                                                        | IoU@0.5 | IoU@0.75 | IoU@0.9
    1   | DINO[21]                                                               | 94.6    | 91.4     | 90.1
    2   | DINO + document image preprocessing (DINO_DIP)                         | 95.2    | 92.0     | 90.5
    3   | DINO_DIP + SwTNet-50                                                   | 96.8    | 94.2     | 91.7
    4   | DINO_DIP + one-to-many matching auxiliary branch                       | 97.5    | 96.7     | 92.8
    5   | LGDNet (DINO_DIP + SwTNet-50 + one-to-many matching auxiliary branch)  | 98.2    | 97.4     | 93.3
  • [1] GAO Liangcai, LI Yibo, DU Lin, et al. A survey on table recognition technology[J]. Journal of Image and Graphics, 2022, 27(6): 1898–1917. doi: 10.11834/jig.220152.
    [2] WATANABE T, LUO Qin, and SUGIE N. Structure recognition methods for various types of documents[J]. Machine Vision and Applications, 1993, 6(2/3): 163–176. doi: 10.1007/BF01211939.
    [3] GATOS B, DANATSAS D, PRATIKAKIS I, et al. Automatic table detection in document images[C]. The Third International Conference on Advances in Pattern Recognition, Bath, UK, 2005: 609–618. doi: 10.1007/11551188_67.
    [4] KASAR T, BARLAS P, ADAM S, et al. Learning to detect tables in scanned document images using line information[C]. 2013 12th International Conference on Document Analysis and Recognition, Washington, USA, 2013: 1185–1189. doi: 10.1109/ICDAR.2013.240.
    [5] ANH T, IN-SEOP N, and SOO-HYUNG K. A hybrid method for table detection from document image[C]. 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), Kuala Lumpur, Malaysia, 2015: 131–135. doi: 10.1109/ACPR.2015.7486480.
    [6] LEE K H, CHOY Y C, and CHO S B. Geometric structure analysis of document images: A knowledge-based approach[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(11): 1224–1240. doi: 10.1109/34.888708.
    [7] SCHREIBER S, AGNE S, WOLF I, et al. DeepDeSRT: Deep learning for detection and structure recognition of tables in document images[C]. 2017 14th IAPR International Conference on Document Analysis and Recognition, Kyoto, Japan, 2017: 1162–1167. doi: 10.1109/ICDAR.2017.192.
    [8] REN Shaoqing, HE Kaiming, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137–1149. doi: 10.1109/TPAMI.2016.2577031.
    [9] ARIF S and SHAFAIT F. Table detection in document images using foreground and background features[C]. 2018 Digital Image Computing: Techniques and Applications (DICTA), Canberra, Australia, 2018: 1–8. doi: 10.1109/DICTA.2018.8615795.
    [10] SIDDIQUI S A, MALIK M I, AGNE S, et al. DeCNT: Deep deformable CNN for table detection[J]. IEEE Access, 2018, 6: 74151–74161. doi: 10.1109/ACCESS.2018.2880211.
    [11] SUN Ningning, ZHU Yuanping, and HU Xiaoming. Faster R-CNN based table detection combining corner locating[C]. 2019 International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia, 2019: 1314–1319. doi: 10.1109/ICDAR.2019.00212.
    [12] CAI Zhaowei and VASCONCELOS N. Cascade R-CNN: Delving into high quality object detection[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA, 2018: 6154–6162. doi: 10.1109/CVPR.2018.00644.
    [13] PRASAD D, GADPAL A, KAPADNI K, et al. CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, USA, 2020: 2439–2447. doi: 10.1109/CVPRW50498.2020.00294.
    [14] AGARWAL M, MONDAL A, and JAWAHAR C V. CDeC-Net: Composite deformable cascade network for table detection in document images[C]. 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 2021: 9491–9498. doi: 10.1109/ICPR48806.2021.9411922.
    [15] HUANG Yilun, YAN Qinqin, LI Yibo, et al. A YOLO-based table detection method[C]. 2019 International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia, 2019: 813–818. doi: 10.1109/ICDAR.2019.00135.
    [16] SHEHZADI T, HASHMI K A, STRICKER D, et al. Towards end-to-end semi-supervised table detection with deformable transformer[C]. The 17th International Conference on Document Analysis and Recognition-ICDAR 2023, San José, USA, 2023: 51–76. doi: 10.1007/978-3-031-41679-8_4.
    [17] ZHU Xizhou, SU Weijie, LU Lewei, et al. Deformable DETR: Deformable transformers for end-to-end object detection[C]. The 9th International Conference on Learning Representations, Vienna, Austria, 2021.
    [18] XIAO Bin, SIMSEK M, KANTARCI B, et al. Table detection for visually rich document images[J]. Knowledge-Based Systems, 2023, 282: 111080. doi: 10.1016/j.knosys.2023.111080.
    [19] SUN Peize, ZHANG Rufeng, JIANG Yi, et al. Sparse R-CNN: End-to-end object detection with learnable proposals[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, USA, 2021: 14449–14458. doi: 10.1109/CVPR46437.2021.01422.
    [20] CHEN Shoufa, SUN Peize, SONG Yibing, et al. DiffusionDet: Diffusion model for object detection[C]. 2023 IEEE/CVF International Conference on Computer Vision, Paris, France, 2023: 19773–19786. doi: 10.1109/ICCV51070.2023.01816.
    [21] ZHANG Hao, LI Feng, LIU Shilong, et al. DINO: DETR with improved DeNoising anchor boxes for end-to-end object detection[EB/OL]. https://arxiv.org/abs/2203.03605, 2022.
    [22] ZONG Zhuofan, SONG Guanglu, and LIU Yu. DETRs with collaborative hybrid assignments training[C]. 2023 IEEE/CVF International Conference on Computer Vision, Paris, France, 2023: 6748–6758. doi: 10.1109/ICCV51070.2023.00621.
    [23] ABDALLAH A, BERENDEYEV A, NURADIN I, et al. TNCR: Table net detection and classification dataset[J]. Neurocomputing, 2022, 473: 79–97. doi: 10.1016/j.neucom.2021.11.101.
    [24] MONDAL A, LIPPS P, and JAWAHAR C V. IIIT-AR-13K: A new dataset for graphical object detection in documents[C]. The 14th IAPR International Workshop, DAS 2020, Wuhan, China, 2020: 216–230. doi: 10.1007/978-3-030-57058-3_16.
    [25] HE Kaiming, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]. Proceedings of 2017 IEEE International Conference on Computer Vision, Venice, Italy, 2017: 2980–2988. doi: 10.1109/ICCV.2017.322.
Figures (11) / Tables (4)
Publication history
  • Received: 2024-05-30
  • Revised: 2024-11-08
  • Available online: 2024-11-18
  • Issue published: 2024-12-01
