LGDNet: Table Detection Network Combining Local and Global Features
doi: 10.11999/JEIT240428 cstr: 32379.14.JEIT240428
1. School of Measurement and Control Technology and Communication Engineering, Harbin University of Science and Technology, Harbin 150080, China
2. Heilongjiang Province Key Laboratory of Pattern Recognition and Information Perception, Harbin University of Science and Technology, Harbin 150080, China
Abstract: In the era of big data, tables are ubiquitous in document images, and table detection is of great significance for the reuse of table information. To address the limited receptive field, reliance on predefined proposals, and inaccurate table boundary localization of existing table detection algorithms based on convolutional neural networks, this paper proposes a table detection network built on the DINO model. First, an image preprocessing method is designed to enhance the corner and line features of tables, enabling more precise table boundary localization and better differentiation between tables and other document elements such as text. Second, a backbone network, SwTNet-50, is designed: Swin Transformer Blocks (STB) are introduced into ResNet to effectively combine local and global features, improving the feature extraction capability of the model and the detection accuracy of table boundaries. Finally, to compensate for the inadequate encoder feature learning caused by one-to-one matching and the insufficient positive-sample supervision in the DINO model, a collaborative hybrid assignments training strategy is adopted to strengthen encoder feature learning and improve detection precision. Compared with various deep-learning-based table detection methods, the proposed model outperforms the compared algorithms on the TNCR table detection dataset, reaching F1-Scores of 98.2%, 97.4%, and 93.3% at IoU thresholds of 0.5, 0.75, and 0.9, respectively. On the IIIT-AR-13K dataset, it achieves an F1-Score of 98.6% at an IoU threshold of 0.5.
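The abstract only names the preprocessing step; as a rough illustration, the following OpenCV sketch enhances line and corner cues in a document image using probabilistic Hough lines and Harris corners. These operators are stand-ins chosen for illustration, not the paper's actual method, and the function name `enhance_table_cues` is hypothetical.

```python
# Hypothetical sketch of the corner/line enhancement idea described in the
# abstract; Harris corners and probabilistic Hough lines are used here as
# stand-ins for the paper's (unspecified) operators.
import cv2
import numpy as np

def enhance_table_cues(image_bgr: np.ndarray) -> np.ndarray:
    """Overlay detected line and corner responses on the input image."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)

    # Line cues: the probabilistic Hough transform picks up table rulings.
    enhanced = image_bgr.copy()
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                            minLineLength=40, maxLineGap=5)
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            cv2.line(enhanced, (x1, y1), (x2, y2), (0, 255, 0), 2)

    # Corner cues: the Harris response highlights ruling intersections.
    harris = cv2.cornerHarris(np.float32(gray), blockSize=2, ksize=3, k=0.04)
    enhanced[harris > 0.01 * harris.max()] = (0, 0, 255)
    return enhanced
```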
Key words:
- Table detection
- Convolutional Neural Network (CNN)
- Transformer
- Feature extraction
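To make the local-global idea behind SwTNet-50 concrete, here is a minimal PyTorch sketch that runs a ResNet-50 stem for local features and then mixes global context with a self-attention layer. `nn.TransformerEncoderLayer` is a simplified stand-in for the windowed Swin Transformer Block (STB), and the class name `LocalGlobalBackbone` is hypothetical; the paper's exact STB placement is not specified in this abstract.

```python
# Minimal sketch of a ResNet stage followed by a global-attention block, in
# the spirit of SwTNet-50; nn.TransformerEncoderLayer stands in for the
# windowed Swin Transformer Block (STB), so this is illustrative only.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class LocalGlobalBackbone(nn.Module):
    def __init__(self, embed_dim: int = 1024, num_heads: int = 8):
        super().__init__()
        resnet = resnet50(weights=None)
        # Local features: ResNet-50 up to and including stage C4 (stride 16).
        self.cnn = nn.Sequential(
            resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool,
            resnet.layer1, resnet.layer2, resnet.layer3,
        )
        # Global features: self-attention over the flattened C4 feature map.
        self.attn = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.cnn(x)                        # (B, 1024, H/16, W/16)
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)  # (B, H*W, C)
        tokens = self.attn(tokens)                # global context mixing
        return tokens.transpose(1, 2).reshape(b, c, h, w)

# Usage: feature map for a batch of document images.
# out = LocalGlobalBackbone()(torch.randn(1, 3, 512, 512))  # (1, 1024, 32, 32)
```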
Table 1 Auxiliary head information

| Auxiliary head $i$ | Matching scheme $A_i$: {pos}, {neg} generation rule | $P_i$ generation rule | $B_i^{\{\text{pos}\}}$ generation rule |
|---|---|---|---|
| Faster R-CNN | {pos}: IoU(proposal, gt) > 0.5; {neg}: IoU(proposal, gt) < 0.5 | {pos}: gt labels, offset(proposal, gt); {neg}: gt labels | positive proposals $(x_1, y_1, x_2, y_2)$ |
| ATSS | {pos}: IoU(anchor, gt) > (mean + std); {neg}: IoU(anchor, gt) < (mean + std) | {pos}: gt labels, offset(anchor, gt), centerness; {neg}: gt labels | positive anchors $(x_1, y_1, x_2, y_2)$ |
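Table 1 compresses the two label-assignment rules used by the auxiliary heads. A small sketch, assuming boxes in $(x_1, y_1, x_2, y_2)$ form and using torchvision's `box_iou`, spells them out; note that the full ATSS assigner also pre-selects candidates by center distance, which this sketch omits to match the table.

```python
# Sketch of the two positive/negative assignment rules in Table 1, assuming
# (x1, y1, x2, y2) boxes; the helper names are illustrative, not the paper's.
import torch
from torchvision.ops import box_iou

def faster_rcnn_assign(proposals: torch.Tensor, gt: torch.Tensor):
    """{pos}: IoU(proposal, gt) > 0.5, {neg}: otherwise."""
    best_iou, _ = box_iou(proposals, gt).max(dim=1)  # best gt per proposal
    pos = best_iou > 0.5
    return pos, ~pos

def atss_assign(anchors: torch.Tensor, gt: torch.Tensor):
    """{pos}: IoU(anchor, gt) > mean + std of that gt's candidate IoUs."""
    iou = box_iou(anchors, gt)                 # (num_anchors, num_gt)
    thresh = iou.mean(dim=0) + iou.std(dim=0)  # adaptive threshold per gt box
    pos = (iou > thresh).any(dim=1)
    return pos, ~pos
```

In the collaborative hybrid assignments scheme [22], such one-to-many auxiliary assignments supervise the encoder alongside DINO's one-to-one Hungarian matching, which is the motivation for adopting them here.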
Table 2 Comparison results on the TNCR and IIIT-AR-13K datasets (%)

| Dataset | Model | F1-Score IoU@0.5 | F1-Score IoU@0.75 | F1-Score IoU@0.9 |
|---|---|---|---|---|
| TNCR | Cascade Mask R-CNN [12] | 93.1 | 92.1 | 86.6 |
| TNCR | DiffusionDet [20] | 95.5 | 93.9 | 88.5 |
| TNCR | Deformable DETR [17] | 94.5 | 93.7 | 89.3 |
| TNCR | DINO [21] | 94.6 | 91.4 | 90.1 |
| TNCR | Sparse R-CNN [19] | 95.2 | 94.8 | 90.9 |
| TNCR | Ours | 98.2 | 97.4 | 93.3 |
| IIIT-AR-13K | Faster R-CNN [8] | 93.7 | – | – |
| IIIT-AR-13K | Mask R-CNN [25] | 97.1 | – | – |
| IIIT-AR-13K | DINO [21] | 97.4 | – | – |
| IIIT-AR-13K | Ours | 98.6 | – | – |
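The F1-Scores in Table 2 depend on the IoU threshold used for matching. Below is a minimal sketch of one common way to compute F1 at a given threshold, via greedy one-to-one matching of predictions to ground truth; the paper's exact evaluation protocol may differ.

```python
# Sketch of F1-Score at an IoU threshold via greedy one-to-one matching of
# predicted boxes to ground truth; a common evaluation scheme, shown for
# illustration rather than as the paper's exact protocol.
import torch
from torchvision.ops import box_iou

def f1_at_iou(pred: torch.Tensor, gt: torch.Tensor, thresh: float = 0.5) -> float:
    """F1-Score of predicted boxes against ground-truth boxes at one threshold."""
    if pred.numel() == 0 or gt.numel() == 0:
        return 0.0
    iou = box_iou(pred, gt)                 # (num_pred, num_gt)
    matched, tp = set(), 0
    for i in range(pred.shape[0]):          # greedy one-to-one matching
        j = int(iou[i].argmax())
        if iou[i, j] >= thresh and j not in matched:
            matched.add(j)
            tp += 1
    precision = tp / pred.shape[0]
    recall = tp / gt.shape[0]
    return 2 * precision * recall / (precision + recall) if tp else 0.0
```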
[1] GAO Liangcai, LI Yibo, DU Lin, et al. A survey on table recognition technology[J]. Journal of Image and Graphics, 2022, 27(6): 1898–1917. doi: 10.11834/jig.220152.
[2] WATANABE T, LUO Qin, and SUGIE N. Structure recognition methods for various types of documents[J]. Machine Vision and Applications, 1993, 6(2/3): 163–176. doi: 10.1007/BF01211939.
[3] GATOS B, DANATSAS D, PRATIKAKIS I, et al. Automatic table detection in document images[C]. The Third International Conference on Advances in Pattern Recognition, Bath, UK, 2005: 609–618. doi: 10.1007/11551188_67.
[4] KASAR T, BARLAS P, ADAM S, et al. Learning to detect tables in scanned document images using line information[C]. 2013 12th International Conference on Document Analysis and Recognition, Washington, USA, 2013: 1185–1189. doi: 10.1109/ICDAR.2013.240.
[5] ANH T, IN-SEOP N, and SOO-HYUNG K. A hybrid method for table detection from document image[C]. 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), Kuala Lumpur, Malaysia, 2015: 131–135. doi: 10.1109/ACPR.2015.7486480.
[6] LEE K H, CHOY Y C, and CHO S B. Geometric structure analysis of document images: A knowledge-based approach[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(11): 1224–1240. doi: 10.1109/34.888708.
[7] SCHREIBER S, AGNE S, WOLF I, et al. DeepDeSRT: Deep learning for detection and structure recognition of tables in document images[C]. 2017 14th IAPR International Conference on Document Analysis and Recognition, Kyoto, Japan, 2017: 1162–1167. doi: 10.1109/ICDAR.2017.192.
[8] REN Shaoqing, HE Kaiming, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137–1149. doi: 10.1109/TPAMI.2016.2577031.
[9] ARIF S and SHAFAIT F. Table detection in document images using foreground and background features[C]. 2018 Digital Image Computing: Techniques and Applications (DICTA), Canberra, Australia, 2018: 1–8. doi: 10.1109/DICTA.2018.8615795.
[10] SIDDIQUI S A, MALIK M I, AGNE S, et al. DeCNT: Deep deformable CNN for table detection[J]. IEEE Access, 2018, 6: 74151–74161. doi: 10.1109/ACCESS.2018.2880211.
[11] SUN Ningning, ZHU Yuanping, and HU Xiaoming. Faster R-CNN based table detection combining corner locating[C]. 2019 International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia, 2019: 1314–1319. doi: 10.1109/ICDAR.2019.00212.
[12] CAI Zhaowei and VASCONCELOS N. Cascade R-CNN: Delving into high quality object detection[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA, 2018: 6154–6162. doi: 10.1109/CVPR.2018.00644.
[13] PRASAD D, GADPAL A, KAPADNI K, et al. CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, USA, 2020: 2439–2447. doi: 10.1109/CVPRW50498.2020.00294.
[14] AGARWAL M, MONDAL A, and JAWAHAR C V. CDeC-Net: Composite deformable cascade network for table detection in document images[C]. 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 2021: 9491–9498. doi: 10.1109/ICPR48806.2021.9411922.
[15] HUANG Yilun, YAN Qinqin, LI Yibo, et al. A YOLO-based table detection method[C]. 2019 International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia, 2019: 813–818. doi: 10.1109/ICDAR.2019.00135.
[16] SHEHZADI T, HASHMI K A, STRICKER D, et al. Towards end-to-end semi-supervised table detection with deformable transformer[C]. The 17th International Conference on Document Analysis and Recognition (ICDAR 2023), San José, USA, 2023: 51–76. doi: 10.1007/978-3-031-41679-8_4.
[17] ZHU Xizhou, SU Weijie, LU Lewei, et al. Deformable DETR: Deformable transformers for end-to-end object detection[C]. The 9th International Conference on Learning Representations, Vienna, Austria, 2021.
[18] XIAO Bin, SIMSEK M, KANTARCI B, et al. Table detection for visually rich document images[J]. Knowledge-Based Systems, 2023, 282: 111080. doi: 10.1016/j.knosys.2023.111080.
[19] SUN Peize, ZHANG Rufeng, JIANG Yi, et al. Sparse R-CNN: End-to-end object detection with learnable proposals[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, USA, 2021: 14449–14458. doi: 10.1109/CVPR46437.2021.01422.
[20] CHEN Shoufa, SUN Peize, SONG Yibing, et al. DiffusionDet: Diffusion model for object detection[C]. 2023 IEEE/CVF International Conference on Computer Vision, Paris, France, 2023: 19773–19786. doi: 10.1109/ICCV51070.2023.01816.
[21] ZHANG Hao, LI Feng, LIU Shilong, et al. DINO: DETR with improved DeNoising anchor boxes for end-to-end object detection[EB/OL]. https://arxiv.org/abs/2203.03605, 2022.
[22] ZONG Zhuofan, SONG Guanglu, and LIU Yu. DETRs with collaborative hybrid assignments training[C]. 2023 IEEE/CVF International Conference on Computer Vision, Paris, France, 2023: 6748–6758. doi: 10.1109/ICCV51070.2023.00621.
[23] ABDALLAH A, BERENDEYEV A, NURADIN I, et al. TNCR: Table net detection and classification dataset[J]. Neurocomputing, 2022, 473: 79–97. doi: 10.1016/j.neucom.2021.11.101.
[24] MONDAL A, LIPPS P, and JAWAHAR C V. IIIT-AR-13K: A new dataset for graphical object detection in documents[C]. The 14th IAPR International Workshop, DAS 2020, Wuhan, China, 2020: 216–230. doi: 10.1007/978-3-030-57058-3_16.
[25] HE Kaiming, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]. Proceedings of 2017 IEEE International Conference on Computer Vision, Venice, Italy, 2017: 2980–2988. doi: 10.1109/ICCV.2017.322.