A Review of YOLO Object Detection Based on Deep Learning

SHAO Yanhua; ZHANG Duo; CHU Hongyu; ZHANG Xiaoqiang; RAO Yunbo

doi:10.11999/JEIT210790

cstr: 32379.14.JEIT210790

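1. School of Information Engineering, Southwest University of Science and Technology, Mianyang 621010, China
2. University of Electronic Science & Technology of China, Chengdu 610054, China

Funds: The National Natural Science Foundation of China (61601382), Sichuan Provincial Science and Technology Project (2019YJ0325, 2020YFG0148, 2021YFG0314)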

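Details

About the authors:
SHAO Yanhua: male, lecturer. Research interest: computer vision.

ZHANG Duo: male, master's student. Research interest: computer vision.

CHU Hongyu: male, associate researcher. Research interest: robotics.

ZHANG Xiaoqiang: male, lecturer. Research interests: synthetic aperture imaging and computer vision.

RAO Yunbo: male, associate professor. Research interests: virtual reality, the Internet, and computer vision.

Corresponding author:
SHAO Yanhua, syh@cqu.edu.cn

CLC number: TN911.73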
Publication history
- Received: 2021-08-06
- Revised: 2022-01-22
- Accepted: 2022-02-16
- Published online: 2022-02-19
- Issue date: 2022-10-19

A Review of YOLO Object Detection Based on Deep Learning

1.
School of Information Engineering, Southwest University of Science and Technology, Mianyang 621010, China
2.
University of Electronic Science & Technology of China, Chengdu 610054, China

Funds: The National Natural Science Foundation of China (61601382), Sichuan Provincial Science and Technology Project (2019YJ0325, 2020YFG0148, 2021YFG0314)

Abstract: Object detection is a fundamental task and a research hotspot in computer vision. YOLO (You Only Look Once) frames object detection as a regression problem, which enables end-to-end training and detection. Owing to its good speed-accuracy balance, YOLO has been a leading object detector in recent years and has been widely studied, improved, and applied in many different fields. This paper surveys the YOLO series and its important improvements and applications in detail. First, the YOLO family and its important improvements are systematically reviewed, including YOLOv1-v4, YOLOv5, Scaled-YOLOv4, YOLOR, and the latest YOLOX. Then, the important backbone networks and loss functions used in YOLO are analyzed and summarized in detail. Next, YOLO algorithms are systematically classified and summarized according to different improvement ideas or application scenarios, such as attention mechanisms, 3D scenes, aerial scenes, and edge computing. Finally, the characteristics of the YOLO series are summarized, and possible improvement ideas and research trends are analyzed in light of the latest literature.
- Object detection /
- YOLO /
- Deep learning /
- Convolutional Neural Network (CNN)
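
The "detection as regression" framing in the abstract can be illustrated with a minimal sketch. It assumes the YOLOv1 settings reported in [8] for PASCAL VOC (a 7×7 grid, 2 boxes per cell, 20 classes, so each image maps to a 7×7×30 output). The function name `decode_yolov1`, the per-cell layout, the threshold value, and the hand-built `pred` tensor are illustrative assumptions, not the authors' code.

```python
# Assumed YOLOv1 settings on PASCAL VOC: grid size, boxes per cell, classes.
S, B, C = 7, 2, 20


def decode_yolov1(pred, conf_thresh=0.2):
    """Turn an S x S x (B*5 + C) regression output into detections.

    Assumed per-cell layout: B boxes of (x, y, w, h, confidence),
    followed by C class probabilities shared by the cell.
    x, y are offsets inside the cell; w, h are relative to the image.
    Returns tuples (cx, cy, w, h, score, class_id), coordinates in [0, 1].
    """
    detections = []
    for row in range(S):
        for col in range(S):
            cell = pred[row][col]
            class_probs = cell[B * 5:]
            # Most likely class for this cell.
            class_id = max(range(C), key=lambda k: class_probs[k])
            for b in range(B):
                x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
                # Class-specific score = box confidence * class probability.
                score = conf * class_probs[class_id]
                if score < conf_thresh:
                    continue
                cx = (col + x) / S  # cell offset -> image-relative centre
                cy = (row + y) / S
                detections.append((cx, cy, w, h, score, class_id))
    return detections


# Hypothetical network output with one confident box in cell (row 3, col 4).
pred = [[[0.0] * (B * 5 + C) for _ in range(S)] for _ in range(S)]
pred[3][4][:5] = [0.5, 0.5, 0.2, 0.3, 0.9]
pred[3][4][B * 5 + 7] = 1.0
dets = decode_yolov1(pred)
print(dets)
```

A real detector would follow this step with non-maximum suppression; that step is omitted here to keep the sketch focused on the output layout.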

Figure 1 Development history of YOLO detection models

Figure 2 Network structure of YOLOv1

Figure 3 Bounding boxes with dimension priors and location prediction

Figure 4 Darknet-53 and CSPDarknet-53

Figure 5 Examples from the VisDrone2019 dataset^[37]

Figure 6 Examples from the Kaggle wheat detection dataset and the PRCV competition dataset
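
The caption of Figure 3 mentions dimension priors and location prediction. Assuming it follows the parameterisation introduced in YOLO9000 [11], the network predicts offsets (t_x, t_y, t_w, t_h) that are combined with the grid-cell corner (c_x, c_y) and an anchor ("prior") size (p_w, p_h). The sketch below shows that decoding; the function names and the sample numbers are illustrative assumptions.

```python
import math


def sigmoid(t):
    """Logistic function; keeps the predicted centre inside its grid cell."""
    return 1.0 / (1.0 + math.exp(-t))


def decode_box(t, cell_xy, prior_wh):
    """Decode one box in grid units, following the YOLO9000 formulation.

    b_x = sigmoid(t_x) + c_x,   b_y = sigmoid(t_y) + c_y
    b_w = p_w * exp(t_w),       b_h = p_h * exp(t_h)
    """
    tx, ty, tw, th = t
    cx, cy = cell_xy
    pw, ph = prior_wh
    bx = sigmoid(tx) + cx
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)
    bh = ph * math.exp(th)
    return bx, by, bw, bh


# Zero offsets place the centre in the middle of cell (4, 3) and keep the
# prior's size unchanged (hypothetical prior of 2.5 x 1.5 grid cells).
box = decode_box((0.0, 0.0, 0.0, 0.0), (4, 3), (2.5, 1.5))
print(box)
```

Bounding the centre with a sigmoid restricts each prediction to its own cell, which is the constraint this formulation relies on.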

Table 1 Detection results of the YOLO series on VOC2012

Detection framework	mAP (%)	fps	GPU
YOLO^[8]	57.9	–	TitanX
YOLOv3 416^[12]	79.3	39	1080Ti
SPP-YOLO 416^[39]	77.5	65.2	1080Ti
DC-SPP-YOLO 416^[39]	78.4	56.3	1080Ti
GC-YOLOv3 544^[31]	83.7	31	1080Ti

Table 2 Performance of various YOLO algorithms on COCO test2017

Detection framework	Backbone	Input size	fps	AP	AP₅₀	AP₇₅	AP_S	AP_M	AP_L	GPU
YOLOv3^[12], arXiv2018	Darknet-53	416	35	31.0	55.3	32.3	15.2	33.2	42.8	Maxwell GPU
YOLOv3-tiny^[12], arXiv2018	Darknet Ref	416	330	–	33.1	–	–	–	–	GTX 1080Ti
GC-YOLOv3^[31], MDPI2020	Darknet-53	416	28	–	55.5	–	–	–	–	GTX 1080Ti
YOLOv4-CSP^[13], arXiv2020	CSPDarknet-53	640	70	47.5	66.2	51.7	28.2	51.2	59.8	Volta GPU
YOLOv5-S^[14]	Modified CSP v5	640	156.3	36.7	55.4	–	–	–	–	Volta GPU
YOLOv5-X^[14]	Modified CSP v5	640	82.6	50.4	68.8	–	–	–	–	Volta GPU
PP-YOLOv2^[40], arXiv2021	ResNet50-vd-dcn^[28]	640	68.9	49.5	68.2	54.4	30.7	52.9	61.2	Volta GPU
YOLOR-P6^[9], arXiv2021	–	1280	49	52.6	70.6	57.6	34.7	56.6	64.2	Volta GPU
YOLOX-X^[10], arXiv2021	Modified CSP v5	640	57.8	51.2	69.6	55.7	31.2	56.1	66.1	Volta GPU
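
The AP₅₀ and AP₇₅ columns of Table 2 count a prediction as correct when its intersection over union (IoU) with a ground-truth box reaches 0.5 or 0.75 respectively, while the COCO AP column averages over IoU thresholds from 0.50 to 0.95. The following minimal sketch shows the IoU test behind those thresholds. The `(x1, y1, x2, y2)` box format, the function names, and the sample boxes are assumptions for illustration; this is not the COCO evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, iou_thresh):
    """A prediction matches a ground-truth box if IoU >= the threshold."""
    return iou(pred_box, gt_box) >= iou_thresh


gt_box = (0, 0, 10, 10)
pred_box = (0, 0, 10, 6)  # covers 60% of the ground truth, IoU = 0.6
match_at_50 = is_true_positive(pred_box, gt_box, 0.50)
match_at_75 = is_true_positive(pred_box, gt_box, 0.75)
print(iou(pred_box, gt_box), match_at_50, match_at_75)
```

The same box therefore contributes to AP₅₀ but not to AP₇₅, which is why AP₇₅ is the stricter localisation measure in the table.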

References (64)

[1]	LIU Li, OUYANG Wanli, WANG Xiaogang, et al. Deep learning for generic object detection: A survey[J]. International Journal of Computer Vision, 2020, 128(2): 261–318. doi: 10.1007/s11263-019-01247-4
[2]	ZOU Zhengxia, SHI Zhenwei, GUO Yuhong, et al. Object detection in 20 years: A survey[J]. arXiv preprint arXiv: 1905.05055, 2019.
[3]	DALAL N and TRIGGS B. Histograms of oriented gradients for human detection[C]. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, USA, 2005: 886–893.
[4]	KRIZHEVSKY A, SUTSKEVER I, and HINTON G E. ImageNet classification with deep convolutional neural networks[C]. The 25th International Conference on Neural Information Processing Systems, Lake Tahoe, USA, 2012: 1097–1105.
[5]	LECUN Y, BENGIO Y, and HINTON G. Deep learning[J]. Nature, 2015, 521(7553): 436–444. doi: 10.1038/nature14539
[6]	JIAO Licheng, ZHANG Fan, LIU Fang, et al. A survey of deep learning-based object detection[J]. IEEE Access, 2019, 7: 128837–128868. doi: 10.1109/access.2019.2939201
[7]	WU Xiongwei, SAHOO D, and HOI S C H. Recent advances in deep learning for object detection[J]. Neurocomputing, 2020, 396: 39–64. doi: 10.1016/j.neucom.2020.01.085
[8]	REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: Unified, real-time object detection[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 779–788.
[9]	WANG C Y, YEH I H, and LIAO H Y M. You only learn one representation: Unified network for multiple tasks[J]. arXiv preprint arXiv: 2105.04206, 2021.
[10]	GE Zheng, LIU Songtao, WANG Feng, et al. YOLOX: Exceeding YOLO series in 2021[J]. arXiv preprint arXiv: 2107.08430, 2021.
[11]	REDMON J and FARHADI A. YOLO9000: Better, faster, stronger[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 6517–6525.
[12]	REDMON J and FARHADI A. YOLOv3: An incremental improvement[J]. arXiv preprint arXiv: 1804.02767, 2018.
[13]	BOCHKOVSKIY A, WANG C Y, and LIAO H Y M. YOLOv4: Optimal speed and accuracy of object detection[J]. arXiv preprint arXiv: 2004.10934, 2020.
[14]	JOCHER G, STOKEN A, BOROVEC J, et al. Ultralytics/YOLOv5: V3.1 - bug fixes and performance improvements[EB/OL]. https://doi.org/10.5281/zenodo.4154370, 2020.
[15]	WANG C Y, BOCHKOVSKIY A, and LIAO H Y M. Scaled-YOLOv4: Scaling cross stage partial network[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 13024–13033.
[16]	LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: Common objects in context[C]. 13th European Conference on Computer Vision, Zurich, Switzerland, 2014: 740–755.
[17]	LUO Huilan and CHEN Hongkun. Survey of object detection based on deep learning[J]. Acta Electronica Sinica, 2020, 48(6): 1230–1239. doi: 10.3969/j.issn.0372-2112.2020.06.026
[18]	SZEGEDY C, LIU Wei, JIA Yangqing, et al. Going deeper with convolutions[C]. 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, 2015: 1–9.
[19]	EVERINGHAM M, ESLAMI S M A, VAN GOOL L, et al. The PASCAL visual object classes challenge: A retrospective[J]. International Journal of Computer Vision, 2015, 111(1): 98–136. doi: 10.1007/s11263-014-0733-5
[20]	HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 770–778.
[21]	WANG C Y, LIAO H Y M, WU Y H, et al. CSPNet: A new backbone that can enhance learning capability of CNN[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, USA, 2020: 1571–1580.
[22]	MISRA D. Mish: A self regularized non-monotonic activation function[J]. arXiv preprint arXiv: 1908.08681, 2019.
[23]	LIU Shu, QI Lu, QIN Haifang, et al. Path aggregation network for instance segmentation[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 8759–8768.
[24]	LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]. The IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 936–944.
[25]	GHIASI G, LIN T Y, and LE Q V. NAS-FPN: Learning scalable feature pyramid architecture for object detection[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 7029–7038.
[26]	ELFWING S, UCHIBE E, and DOYA K. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning[J]. Neural Networks, 2018, 107: 3–11. doi: 10.1016/j.neunet.2017.12.012
[27]	HOWARD A, SANDLER M, CHEN Bo, et al. Searching for MobileNetV3[C]. 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Korea (South), 2019: 1314–1324.
[28]	MA Ningning, ZHANG Xiangyu, ZHENG Haitao, et al. ShuffleNet V2: Practical guidelines for efficient CNN architecture design[C]. 2018 15th European Conference on Computer Vision, Munich, Germany, 2018: 122–138.
[29]	LI Chengyue, YAO Jianmin, LIN Zhixian, et al. Object detection method based on improved YOLO lightweight network[J]. Laser & Optoelectronics Progress, 2020, 57(14): 141003. doi: 10.3788/LOP57.141003
[30]	HU Jie, SHEN Li, and SUN Gang. Squeeze-and-excitation networks[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 7132–7141.
[31]	YANG Yang and DENG Hongmin. GC-YOLOv3: You only look once with global context block[J]. Electronics, 2020, 9(8): 1235. doi: 10.3390/electronics9081235
[32]	WOO S, PARK J, LEE J Y, et al. CBAM: Convolutional block attention module[C]. 2018 15th European Conference on Computer Vision, Munich, Germany, 2018: 3–19.
[33]	ZHENG Zhaohui, WANG Ping, LIU Wei, et al. Distance-IoU loss: Faster and better learning for bounding box regression[C]. The 34th AAAI Conference on Artificial Intelligence, New York, USA, 2020: 12993–13000.
[34]	REZATOFIGHI H, TSOI N, GWAK J Y, et al. Generalized intersection over union: A metric and a loss for bounding box regression[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 658–666.
[35]	BODLA N, SINGH B, CHELLAPPA R, et al. Soft-NMS--improving object detection with one line of code[C]. 2017 IEEE International Conference on Computer Vision, Venice, Italy, 2017: 5562–5570.
[36]	CHEN Zhiming, CHEN Kean, LIN Weiyao, et al. PIoU loss: Towards accurate oriented object detection in complex environments[C]. 16th European Conference on Computer Vision, Glasgow, UK, 2020: 195–211.
[37]	DU Dawei, ZHU Pengfei, WEN Longyin, et al. VisDrone-DET2019: The vision meets drone object detection in image challenge results[C]. 2019 IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Korea (South), 2019: 213–226.
[38]	University of Saskatchewan. Kaggle competition: Global wheat detection[EB/OL]. https://www.kaggle.com/c/global-wheat-detection, 2020.
[39]	HUANG Zhanchao, WANG Jianlin, FU Xuesong, et al. DC-SPP-YOLO: Dense connection and spatial pyramid pooling based YOLO for object detection[J]. Information Sciences, 2020, 522: 241–258. doi: 10.1016/j.ins.2020.02.067
[40]	HUANG Xin, WANG Xinxin, LV Wenyu, et al. PP-YOLOv2: A practical object detector[J]. arXiv preprint arXiv: 2104.10419, 2021.
[41]	DING Jian, XUE Nan, XIA Guisong, et al. Object detection in aerial images: A large-scale benchmark and challenges[J]. arXiv preprint arXiv: 2102.12219, 2021.
[42]	TEKIN B, SINHA S N, and FUA P. Real-time seamless single shot 6D object pose prediction[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 292–301.
[43]	SIMON M, AMENDE K, KRAUS A, et al. Complexer-YOLO: Real-time 3D object detection and tracking on semantic point clouds[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, USA, 2019: 1190–1199.
[44]	TAKAHASHI M, JI Y, UMEDA K, et al. Expandable YOLO: 3D object detection from RGB-D images[C]. 2020 21st International Conference on Research and Education in Mechatronics (REM), Cracow, Poland, 2020: 1–5.
[45]	DING Caiwen, WANG Shuo, LIU Ning, et al. REQ-YOLO: A resource-aware, efficient quantization framework for object detection on FPGAs[C]. 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Seaside, USA, 2019: 33–42.
[46]	LEE Y, LEE C, LEE H J, et al. Fast detection of objects using a YOLOv3 network for a vending machine[C]. 2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), Hsinchu, China, 2019: 132–136.
[47]	AZIMI S M. ShuffleDet: Real-time vehicle detection network in on-board embedded UAV imagery[C]. 2018 European Conference on Computer Vision Workshops, Munich, Germany, 2019: 88–99.
[48]	TIJTGAT N, VAN RANST W, VOLCKAERT B, et al. Embedded real-time object detection for a UAV warning system[C]. 2017 IEEE International Conference on Computer Vision Workshops, Venice, Italy, 2017: 2110–2118.
[49]	ZHANG Pengyi, ZHONG Yunxin, and LI Xiaoqiong. SlimYOLOv3: Narrower, faster and better for real-time UAV applications[C]. 2019 IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Korea (South), 2019: 37–45.
[50]	HENDRY and CHEN R C. Automatic license plate recognition via sliding-window darknet-YOLO deep learning[J]. Image and Vision Computing, 2019, 87: 47–56. doi: 10.1016/j.imavis.2019.04.007
[51]	TU Renwei, ZHU Zhongjie, BAI Yongqiang, et al. Improved YOLO v3 network-based object detection for blind zones of heavy trucks[J]. Journal of Electronic Imaging, 2020, 29(5): 053002. doi: 10.1117/1.JEI.29.5.053002
[52]	YANG Shuo, ZHANG Junxing, BO Chunjuan, et al. Fast vehicle logo detection in complex scenes[J]. Optics & Laser Technology, 2019, 110: 196–201. doi: 10.1016/j.optlastec.2018.08.007
[53]	YANG Fan, YANG Deming, HE Zhiming, et al. Automobile fine-grained detection algorithm based on multi-improved YOLOv3 in smart streetlights[J]. Algorithms, 2020, 13(5): 114. doi: 10.3390/a13050114
[54]	LI Min, ZHANG Zhijie, LEI Liping, et al. Agricultural greenhouses detection in high-resolution satellite images based on convolutional neural networks: Comparison of faster R-CNN, YOLO v3 and SSD[J]. Sensors, 2020, 20(17): 4938. doi: 10.3390/s20174938
[55]	WU Dihua, LV Shuaichao, JIANG Mei, et al. Using channel pruning-based YOLO v4 deep learning algorithm for the real-time and accurate detection of apple flowers in natural environments[J]. Computers and Electronics in Agriculture, 2020, 178: 105742. doi: 10.1016/j.compag.2020.105742
[56]	XU Zhifeng, JIA Ruisheng, SUN Hongmei, et al. Light-YOLOv3: Fast method for detecting green mangoes in complex scenes using picking robots[J]. Applied Intelligence, 2020, 50(12): 4670–4687. doi: 10.1007/s10489-020-01818-w
[57]	SHARIF M, AMIN J, SIDDIQA A, et al. Recognition of different types of leukocytes using YOLOv2 and optimized bag-of-features[J]. IEEE Access, 2020, 8: 167448–167459. doi: 10.1109/access.2020.3021660
[58]	ZHUANG Zhemin, LIU Guobao, DING Wanli, et al. Cardiac VFM visualization and analysis based on YOLO deep learning model and modified 2D continuity equation[J]. Computerized Medical Imaging and Graphics, 2020, 82: 101732. doi: 10.1016/j.compmedimag.2020.101732
[59]	KYRKOU C. YOLOpeds: Efficient real-time single-shot pedestrian detection for smart camera applications[J]. IET Computer Vision, 2020, 14(7): 417–425. doi: 10.1049/iet-cvi.2019.0897
[60]	ZHAO Bin, WANG Chunping, and FU Qiang. Multi-scale pedestrian detection in infrared images with salient background-awareness[J]. Journal of Electronics & Information Technology, 2020, 42(10): 2524–2532. doi: 10.11999/JEIT190761
[61]	KRIŠTO M, IVASIC-KOS M, and POBAR M. Thermal object detection in difficult weather conditions using YOLO[J]. IEEE Access, 2020, 8: 125459–125476. doi: 10.1109/access.2020.3007481
[62]	LIU Peng, SONG Changlin, LI Junmin, et al. Detection of transmission line against external force damage based on improved YOLOv3[J]. International Journal of Robotics and Automation, 2020, 35(6): 460–468.
[63]	XIE Yiqun, CAI Jiannan, BHOJWANI R, et al. A locally-constrained YOLO framework for detecting small and densely-distributed building footprints[J]. International Journal of Geographical Information Science, 2020, 34(4): 777–801. doi: 10.1080/13658816.2019.1624761
[64]	LUO Yanyang, SHAO Yanhua, CHU Hongyu, et al. CNN-based blade tip vortex region detection in flow field[C]. SPIE 11373, Eleventh International Conference on Graphics and Image Processing (ICGIP 2019), Hangzhou, China, 2020: 113730P.