Robust Visual Tracking via Perceptive Deep Neural Network
DOI: 10.11999/JEIT151449  CSTR: 32379.14.JEIT151449
Funds:
The National Natural Science Foundation of China (61175029, 61473309); The Natural Science Foundation of Shaanxi Province (2015JM6269, 2016JM6050)
Abstract: In a visual tracking system, efficient feature representation is the key to tracking robustness, and multi-cue fusion is an effective way to handle complex tracking conditions. This paper first proposes a perceptive deep neural network composed of multiple parallel networks that are triggered adaptively. A fragment-based target model with multi-cue fusion is then built on top of deep learning, in which the target is adaptively divided into fragments. Fragmentation reduces the network input dimension severalfold, greatly lowering the computational cost of network training. During tracking, the model dynamically adjusts each fragment's weight according to its confidence, improving adaptability to complex situations such as target posture change, illumination change, and occlusion. Qualitative and quantitative analysis of tracking results on a large set of challenging benchmark sequences shows that the proposed algorithm is highly robust and tracks targets stably.
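The confidence-weighted fragment fusion described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the per-fragment position estimates, the confidence scores, and the normalized-weight fusion rule are all assumptions made for the example.

```python
# Illustrative sketch of confidence-weighted fragment fusion for tracking.
# The fragments, confidence values, and fusion rule are assumptions,
# not the paper's actual formulation.

def fuse_fragments(estimates, confidences):
    """Fuse per-fragment target-position estimates (x, y) using
    normalized confidence scores as weights."""
    total = sum(confidences)
    if total == 0:
        raise ValueError("at least one fragment must have nonzero confidence")
    # Normalize confidences so the weights sum to 1.
    weights = [c / total for c in confidences]
    x = sum(w * e[0] for w, e in zip(weights, estimates))
    y = sum(w * e[1] for w, e in zip(weights, estimates))
    return (x, y), weights

# Example: three fragments; the third (e.g. occluded, low confidence)
# barely influences the fused estimate.
estimates = [(100.0, 50.0), (102.0, 52.0), (160.0, 90.0)]
confidences = [0.9, 0.8, 0.05]
center, weights = fuse_fragments(estimates, confidences)
```

Down-weighting low-confidence fragments in this way is what lets a part-based model tolerate partial occlusion: the occluded fragment's outlier estimate is suppressed rather than corrupting the fused target position.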
Key words:
- Visual tracking /
- Feature description /
- Deep learning /
- Perceptive deep neural network
侯志強, 韓崇昭. 視覺跟蹤技術(shù)綜述[J]. 自動化學報, 2006, 32(4): 603-617. HOU Zhiqiang and HAN Chongzhao. A Survey of visual tracking[J]. Acta Automatica Sinica, 2006, 32(4): 603-617. WANG Naiyan, SHI Jianping, YEUNG Dityan, et al. Understanding and diagnosing visual tracking systems[C]. International Conference on Computer Vision, Santiago, Chile, 2015: 11-18. BABENKO B, YANG M, and BELONGIE S. Visual tracking with online multiple instance learning[C]. International Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 2009: 983-990. doi: 10.1109/CVPR.2009. 5206737. KALAL Z, MIKOLAJCZYK K, and MATAS J. Tracking learning detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(7): 1409-1422. doi: 10.1109/TPAMI.2011.239. HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification[C]. International Conference on Computer Vision, Santiago, Chile, 2015: 1026-1034. COURBARIAUX M, BENGIO Y, and DAVID J P. Binary Connect: training deep neural networks with binary weights during propagations[C]. Advances in Neural Information Processing Systems, Montral, Quebec, Canada, 2015: 3105-3113. SAINATH T N, VINYALS O, SENIOR A, et al. Convolutional, long short term memory, fully connected deep neural networks[C]. IEEE International Conference on Acoustics, Speech and Signal Processing, Brisbane, Australia, 2015: 4580-4584. doi: 10.1109/ICASSP.2015.7178838. PARKHI O M, VEDALDI A, and ZISSERMAN A. Deep face recognition[J]. Proceedings of the British Machine Vision, 2015, 1(3): 6. WANG Naiyan and YEUNG Dityan. Learning a deep compact image representation for visual tracking[C]. Advances in Neural Information Processing Systems, South Lake Tahoe, Nevada, USA, 2013: 809-817. 李寰宇, 畢篤彥, 楊源, 等. 基于深度特征表達與學習的視覺跟蹤算法研究[J]. 電子與信息學報, 2015, 37(9): 2033-2039. LI Huanyu, BI Duyan, YANG Yuan, et al. Research on visual tracking algorithm based on deep feature expression and learning[J]. 
Journal of Electronics Information Technology, 2015, 37(9): 2033-2039. doi: 10.11999/JEIT150031. RUSSAKOVSKY O, DENG J, SU H, et al. Imagenet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211-252. doi: 10.1007/ s11263-015-0816-y. VINCENT P, LAROCHELLE H, LAJOIE I, et al. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion[J]. Journal of Machine Learning Research, 2010, 11(11): 3371-3408. HINTON G E and SALAKHUTDINOV R. Reducing the dimensionality of data with neural networks[J]. Science, 2006, 313(5786): 504-507. doi: 10.1126/science.1127647. ADAM A, RIVLIN E, and SHIMSHONI I. Robust fragments-based tracking using the integral histogram[C]. International Conference on Computer Vision and Pattern Recognition, New York, NY, USA, 2006: 798-805. doi: 10.1109/CVPR.2006.256. JULIER S J and UHLM J U. Unscented filtering and nonlinear estimation[J]. Proceedings of IEEE, 2004, 192(3): 401-422. doi: 10.1109/JPROC.2003.823141. YILMAZ A, JAVED O, and SHAH M. Object tracking: a survey[J]. ACM Computer Survey, 2006, 38(4): 1-45. NICKEL K and STIEFELHAGEN R. Dynamic integration of generalized cues for person tracking[C]. European Conference on Computer Vision, Marseille, France, 2008: 514-526. doi: 10.1007/978-3-540-88693-8_38. SPENGLER M and SCHIELE B. Towards robust multi-cue integration for visual tracking[J]. Machine Vision and Applications, 2003, 14(1): 50-58. doi: 10.1007/s00138-002- 0095-9. WU Yi, LIM Jongwoo, and YANG Minghsuan. Online object tracking: a benchmark[C]. International Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 2013: 2411-2418. ZHANG Kaihua, ZHANG Lei, and YANG Minghsuan. Real-time compressive tracking[C]. European Conference on Computer Vision, Florence, Italy, 2012: 866-879. doi: 10.1007/978-3-642-33712-3_62. SEVILLA-LARA L and LEARNED-MILLER E. Distribution fields for tracking[C]. 
International Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 2012: 1910-1917. doi: 10.1109/CVPR.2012.6247891. LI Hanxi, LI Yi, and PORIKLI Fatih. Deeptrack: learning discriminative feature representations by convolutional neural networks for visual tracking[C]. Proceedings of the British Machine Vision Conference, Nottingham, UK, 2014: 110-119. doi: 10.1109/TIP.2015.2510583. -