A frame line detection and removal algorithm for form document recognition
Abstract: A frame line detection algorithm based on a structural image element, the Directional Single-Connected Chain (DSCC), is proposed. By exploiting the global statistical properties of DSCC edges and the local positional relations between DSCCs, the algorithm accurately extracts frame lines from scanned form images. It is robust to line skew, line breaks, and character strokes that touch or overlap the lines inside form cells. Building on this detector, a frame line removal approach is presented that separates overlapping characters and lines, so that frame lines can be removed without damaging the character strokes that touch them. The approach has been successfully applied in a practical form recognition system.
Keywords:
- form recognition; image analysis; line detection; character recognition
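The abstract names the DSCC but does not define it here. As a rough illustration only, the sketch below assumes a horizontal DSCC is a chain of vertical black runs in adjacent columns, where each run overlaps exactly one run in the neighbouring column. It then uses the median run height of a chain as a simple stand-in for the "global statistical property" of the chain edges, flagging much taller runs as likely touching strokes. The function names, the overlap rule, and the `factor` threshold are all assumptions made for this example, not the paper's definitions.

```python
from statistics import median_low
from typing import Dict, List, Tuple

Run = Tuple[int, int, int]  # (column, top row, bottom row), rows inclusive


def vertical_runs(img: List[List[int]]) -> List[List[Run]]:
    """Return, for each column, its vertical runs of black (1) pixels."""
    height, width = len(img), len(img[0])
    columns: List[List[Run]] = []
    for x in range(width):
        runs: List[Run] = []
        y = 0
        while y < height:
            if img[y][x]:
                top = y
                while y < height and img[y][x]:
                    y += 1
                runs.append((x, top, y - 1))
            else:
                y += 1
        columns.append(runs)
    return columns


def overlaps(a: Run, b: Run) -> bool:
    """True if two runs in adjacent columns share at least one row."""
    return a[1] <= b[2] and b[1] <= a[2]


def horizontal_chains(img: List[List[int]]) -> List[List[Run]]:
    """Group vertical runs into singly connected left-to-right chains."""
    columns = vertical_runs(img)
    chains: List[List[Run]] = []
    open_chains: Dict[Run, List[Run]] = {}  # previous-column run -> its chain
    for x, runs in enumerate(columns):
        prev_runs = columns[x - 1] if x > 0 else []
        next_open: Dict[Run, List[Run]] = {}
        for run in runs:
            left = [p for p in prev_runs if overlaps(p, run)]
            chain = None
            # Extend a chain only if the link is one-to-one (assumed rule).
            if len(left) == 1:
                p = left[0]
                if sum(overlaps(p, r) for r in runs) == 1:
                    chain = open_chains[p]
            if chain is None:  # start a new chain at a branch or gap
                chain = []
                chains.append(chain)
            chain.append(run)
            next_open[run] = chain
        open_chains = next_open
    return chains


def crossing_runs(chain: List[Run], factor: int = 2) -> List[Run]:
    """Runs much taller than the chain's median height (assumed heuristic)."""
    thickness = median_low([bottom - top + 1 for _, top, bottom in chain])
    return [r for r in chain if r[2] - r[1] + 1 > factor * thickness]


# A one-pixel horizontal line crossed by a vertical stroke in column 3.
cross_img = [
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
]
cross_chains = horizontal_chains(cross_img)
```

In this toy image the whole line forms a single chain, and the tall run where the stroke crosses stands out against the median thickness. A real detector would also need to handle skew, merge chains across breaks, and use a more careful edge model.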