Design of Rotation Invariant Model Based on Image Offset Angle and Multibranch Convolutional Neural Networks
doi: 10.11999/JEIT240417 cstr: 32379.14.JEIT240417
1. School of Integrated Circuits, Southeast University, Nanjing 211189, China
2. School of Physical Sciences and Technology, Lanzhou University, Lanzhou 730000, China
Funds: The Key-Area Research and Development Program of Guangdong Province (2021B1101270006)
Abstract: Convolutional Neural Networks (CNNs) exhibit translation invariance but lack rotation invariance. In recent years, encoding rotation into CNNs has become the mainstream remedy for this shortcoming, but it demands large numbers of parameters and substantial computational resources. Given that images are the primary focus of computer vision, this paper proposes a model called Offset Angle and Multibranch CNN (OAMC) to achieve rotation invariance. First, the model detects the offset angle of the input image and rotates the image back by that angle. The corrected image is then fed into a multibranch CNN that uses no rotation encoding, and an optimized response module outputs the best branch as the model's final prediction. With a minimal parameter count of 8 k, OAMC achieves a best classification accuracy of 96.98% on a rotated handwritten digit dataset. Compared with previous work on remote sensing datasets, the model improves accuracy by up to 8% while using only one third of the parameters of existing models.
Keywords:
- Deep learning
- Rotated image classification
- Offset angle
- Multibranch convolutional neural network
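The pipeline described in the abstract (detect the offset angle, counter-rotate the image, run rotation-encoding-free branches, let a response module pick the winner) can be sketched roughly as follows. This is a minimal PyTorch illustration under stated assumptions, not the paper's implementation: the `AngleDetector` regressor, all layer sizes, and the confidence-based branch selection are hypothetical placeholders.

```python
# Minimal sketch of the OAMC pipeline from the abstract.
# Module names, layer sizes, and the response rule are assumptions
# for illustration; the paper's actual architecture may differ.
import torch
import torch.nn as nn
import torchvision.transforms.functional as TF


class AngleDetector(nn.Module):
    """Hypothetical regressor predicting an image's offset angle (degrees).
    In training it would presumably be supervised with known rotation angles."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(8 * 4 * 4, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)  # (B,) predicted angles


class OAMC(nn.Module):
    """Offset Angle and Multibranch CNN: counter-rotate the input by the
    detected angle, run several plain CNN branches (no rotation encoding),
    and let a response module pick the best branch."""
    def __init__(self, num_classes=10, num_branches=3):
        super().__init__()
        self.angle_detector = AngleDetector()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(16, num_classes),
            )
            for _ in range(num_branches)
        ])

    def forward(self, x):
        # Step 1: detect each image's offset angle and rotate it back.
        angles = self.angle_detector(x)
        x = torch.stack([TF.rotate(img, -float(a)) for img, a in zip(x, angles)])
        # Step 2: run every branch on the corrected image.
        logits = torch.stack([b(x) for b in self.branches], dim=1)  # (B, K, C)
        # Step 3: response module -- here, keep the branch whose top class
        # score is highest per sample (a stand-in for the paper's module).
        conf, _ = logits.max(dim=2)   # (B, K)
        best = conf.argmax(dim=1)     # (B,)
        return logits[torch.arange(x.size(0)), best]


model = OAMC()
out = model(torch.randn(2, 1, 28, 28))  # e.g. rotated MNIST-style digits
print(out.shape)  # torch.Size([2, 10])
```

In this sketch the response module simply keeps the most confident branch; the paper's optimized response module may select among branches differently, but the overall flow (angle detection, reverse rotation, multibranch inference, branch selection) follows the abstract.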