
Review of Sign Language Recognition Based on Deep Learning

Shujun ZHANG, Qun ZHANG, Hui LI

Citation: Shujun ZHANG, Qun ZHANG, Hui LI. Review of Sign Language Recognition Based on Deep Learning[J]. Journal of Electronics & Information Technology, 2020, 42(4): 1021-1032. doi: 10.11999/JEIT190416

Review of Sign Language Recognition Based on Deep Learning

doi: 10.11999/JEIT190416 cstr: 32379.14.JEIT190416
基金項(xiàng)目: 國家自然科學(xué)基金(61702295, 61672305),山東省重點(diǎn)研發(fā)計(jì)劃項(xiàng)目(2017GGX10127)
詳細(xì)信息
    作者簡介:

    張淑軍:女,1980年生,副教授,研究方向?yàn)橛?jì)算機(jī)視覺

    張群:女,1994年生,碩士生,研究方向?yàn)橛?jì)算機(jī)視覺

    李輝:男,1984年生,副教授,研究方向?yàn)橛?jì)算機(jī)視覺

    通訊作者:

    張淑軍 lindazsj@163.com

  • 中圖分類號: TP391

Review of Sign Language Recognition Based on Deep Learning

Funds: The National Natural Science Foundation of China (61702295, 61672305), The Key Research & Development Plan Project of Shandong Province (2017GGX10127)
  • Abstract:

    Sign language recognition involves computer vision, pattern recognition, human-computer interaction, and related fields, and has significant research value and application prospects. The rapid development of deep learning has brought new opportunities for more accurate, real-time sign language recognition. This paper surveys recent deep-learning-based sign language recognition techniques, presenting a detailed description and analysis of the algorithms along two branches: isolated words and continuous sentences. Isolated-word recognition techniques are divided into methods based on three architectures: convolutional neural networks (CNN), 3D convolutional neural networks (3D-CNN), and recurrent neural networks (RNN). The models used for continuous-sentence recognition are more complex and usually require an auxiliary long-term temporal modeling algorithm; by backbone structure they fall into bidirectional long short-term memory (BLSTM) models, 3D convolutional models, and hybrid models. Commonly used sign language datasets in China and abroad are summarized, and the research challenges and development trends of sign language recognition are discussed; robustness and practical deployment under the premise of high accuracy remain to be advanced.

  • Figure 1  Overall taxonomy

    Figure 2  Sample data from the RWTH German sign language corpus

    Figure 3  Sample data from the CSL Chinese sign language dataset

    Figure 4  Visual modalities of each frame

    Table 1  Deep-learning-based isolated-word sign language recognition techniques and representative work

    Author/Institution | Year | Technique | Accuracy (%) | Dataset | Sample size
    Tang Ao, Li Houqiang, Huang Jie, Li Xiaoxu, Huang Shiliang / University of Science and Technology of China | 2013 | CNN (RGB-D based, with hand segmentation and tracking) [4] | 98.12 | American Sign Language (ASL) | 50700 frames
    | 2015 | 3D-CNN (multimodal input) [17] | 94.20 | Chinese Sign Language (CSL) | 25 classes
    | 2016 | RNN (with trajectory data) [27] | 85.60 | | 500 classes
    | 2017 | LSTM (with hand-shape descriptors) [28] | 86.20 | | 100 classes
    | 2018 | RNN (keyframe video sequence selection) [29] | 91.18 | | 310 classes
    | 2018 | 3D-CNN (attention-based) [18] | 88.70 | | 500 classes
    Pigou L / Ghent University | 2014 | CNN [5] | 91.70 | Chalearn | 20 classes
    | 2016 | 3D-CNN (feature fusion of multimodal data) [16] | 81.00 | Chalearn 2014 |
    Molchanov P, Garcia B, Hardie Cate / Stanford University | 2015 | 3D-CNN (multi-scale data) [15] | 77.50 | VIVA dataset |
    | 2015 | RNN [25] | 90.80 | University of South Wales dataset | 95 classes
    | 2016 | CNN [9] | 91.63 | ASL fingerspelling |
    Kang B / University of California | 2015 | CNN [6] | 99.99 | ASL fingerspelling | 31 classes
    Miao Qiguang / Xidian University | 2016 | 3D-CNN (RGB-D based) [19] | 56.90 | Chalearn |
    | 2017 | 3D-CNN (saliency features and RGB-D) [20] | 59.43 | |
    | 2017 | 3D-CNN (multimodal data and hand-feature enhancement) [21] | 67.71 | |
    Koller O / RWTH Aachen University | 2016 | CNN (focus on hand-shape changes) [8] | | Danish Sign Language | resolution 4730×22
    Chai Xiujuan / Institute of Computing Technology, CAS | 2017 | improved RNN (hand segmentation and localization) [26] | 99.00 | Chinese Sign Language (CSL) | 40 classes
    Yang Su / Beijing University of Technology | 2017 | combined RNN and CNN [30] | 98.43 | CSL | 40 classes
    | 2017 | RNN (data preprocessing) [31] | 99.00 | CSL | 40 classes
    Hossen M A / Teswala Engineering College | 2017 | CNN [7] | 100.00 | recorded with Kinect | 10 classes
    ElBadawy M / Ain Shams University, Egypt | 2017 | 3D-CNN [22] | 98.00 | Arabic dataset | 25 classes
    Kim S / Seoul National University, Korea | 2017 | CNN (inter-frame sampling) [10] | 86.00 | camera-captured | 20 classes
    | 2018 | CNN (hand segmentation) [11] | 98.00 | | 12 classes
    Kopuklu O / University of Munich, Germany | 2018 | CNN (spatio-temporal feature fusion) [12] | 96.28 / 57.40 | Jester / Chalearn |
    Konstantinidis D / Greece | 2018 | CNN (RGB and skeleton data) [13] | 98.09 | Argentinian dataset LSA64 |
    | 2018 | RNN (multimodal data fusion) [36] | 89.50 | Indian sign language dataset (IIT) |
    Devineau G / PSL Research University, Paris | 2018 | CNN (skeleton data, with hand-joint position sequences) [14] | 84.35 | DHG dataset | 28 classes
    Ye Yuancheng / City University of New York | 2018 | 3D-CNN (feature fusion) [23] | 69.20 | American Sign Language | 27 classes
    Liang Zhijie / Central China Normal University | 2018 | 3D-CNN (skeleton, contour, and depth data) [24] | 83.60 | Chalearn |
    Lin Chi / Institute of Automation, CAS | 2018 | masked ResC3D network combined with RNN [32] | 68.42 | Chalearn |
    Halim K / University of Indonesia | 2018 | RNN (feature set based on SIBI inflectional gestures) [33] | 96.15 | Indonesian sign language dataset |
    Masood S / New Delhi | 2018 | combined RNN and CNN [34] | 95.20 | Argentinian dataset LSA64 | 46 classes
    Bantupalli K / Kennesaw State University, USA | 2018 | combined RNN and CNN [35] | 93.00 | American Sign Language (ASL) | 100 classes
    Hernandez V / Tokyo University of Agriculture | 2019 | combined CNN and LSTM [37] | 89.30 | American Sign Language (ASL) | 19 classes
    Liao Yanqiu / Nanchang University | 2019 | combined RNN and 3D-CNN [38] | 86.90 | Chinese Sign Language (CSL) | 500 classes

    Table 2  Deep-learning-based continuous-sentence sign language recognition techniques and representative work

    Author/Institution | Year | Technique | Evaluation metric (%) | Dataset | Sample size
    Camgoz N C, Koller O / RWTH Aachen University | 2016 | 3D-CNN (temporal features extracted from RGB data) [45] | Jaccard index: 26.9 | Chalearn |
    | 2016 | hybrid model of CNN and HMM [49] | WER: 39.7 | RWTH-PHOENIX-Weather |
    | 2017 | CNN, HMM, and CTC [50] | WER: 38.8 | |
    | 2017 | bidirectional LSTM (BLSTM, CTC-based) [39] | WER: 43.1 | | resolution: 5000×90
    | 2018 | hybrid model of CNN, HMM, and RNN [51] | | |
    Pigou L / Ghent University | 2017 | hybrid model of 3D network and LSTM (RGB-D) [52] | Jaccard index: 31.6 | Chalearn |
    Cui Runpeng / Tsinghua University | 2017 | CNN and BLSTM (CTC-based) [53] | WER: 38.7 | RWTH-PHOENIX-Weather | resolution: 16000×20
    | 2018 | bidirectional LSTM (BLSTM, multimodal data) [40] | WER: 46.9 | |
    Shi B / University of Chicago, USA | 2018 | attention-based LSTM [41] | WER: 41.9 | American Sign Language (ASL) |
    Ko S K / Korea Electronics Research Institute | 2018 | RNN (with skeleton joint data) [42] | Acc: 89.5 | KETI Korean sign language dataset | 100 classes
    Zhang Qian / Shanghai Jiao Tong University | 2018 | bidirectional LSTM (BLSTM) [43] | Acc: 93.1 | American Sign Language (ASL) | 100 classes
    Li Houqiang, Huang Jie / University of Science and Technology of China | 2018 | 3D-CNN (temporal-classification alignment algorithm) [46] | WER: 37.3 | RWTH-PHOENIX-Weather |
    | 2018 | two-stream 3D-CNN (with LSTM) [47] | Acc: 82.7 | Chinese Sign Language | 100 classes
    Guo Dan / Hefei University of Technology; University of Science and Technology of China | 2018 | 3D-CNN (temporal convolution, CTC, late-fusion strategy) [48] | WER: 37.8 | RWTH-PHOENIX-Weather |
    | 2018 | 3D-CNN combined with RNN (adaptive variable-length online mining of key clips) [55] | Acc: 92.9 | Chinese Sign Language (CSL) | 100 classes
    Ariesta M C / University of Jakarta | 2018 | 3D-CNN combined with RNN (CTC-based) [54] | | SIBI | 30 classes
    Mittal A / Indian Institute of Technology | 2019 | modified LSTM [44] | Acc: 72.3 | Indian sign language dataset (ISL) | 942 classes

    Table 3  Classification of sign language datasets

    Name | Country | Classes | Signers | Samples | Data characteristics | Data type | Availability
    RWTH-PHOENIX-Weather [56] | Germany | 1200 | 9 | 45760 | RGB | sentences | public
    Chalearn [57] | USA | 249 | 7 | 50000 | RGB/depth | words | partially public
    DGS Kinect 40 [58] | Germany | 40 | 15 | 3000 | multi-view | isolated words |
    CSL [47] | China | 500/100 | | 125000 | depth/skeleton/RGB | isolated words/sentences | public
    SIGNUM [59] | Germany | 450 | 25 | 33210 | RGB | sentences | public
    GSL 20 [60] | Greece | 20 | 6 | 840 | RGB | words |
    Boston ASLLVD [61] | USA | 3300+ | 6 | 9800 | RGB | words | public
    PSL Kinect 30 [62] | Poland | 30 | 1 | 300 | RGB/depth | words | public
    LSA64 [63] | Argentina | 64 | 10 | 3200 | RGB | words | public
    DEVISIGN-G [64] | China | 36 | 8 | 432 | RGB | words |
    DEVISIGN-D [64] | | 500 | | 6000 | | |
    DEVISIGN-L [64] | | 2000 | | 24000 | | |
    CUNY ASL [65] | USA | | 8 | | RGB | sentences |
    SignsWorld Atlas [66] | Arabic countries | | 3 | 210 | RGB | words | public
    ASL Fingerspelling [67] | USA | 24 | 5 | 131000 | RGB/depth | words | public

    Table 4  RWTH-PHOENIX-Weather parameters

    Parameter | 2012 version | 2014 version
    # signers | 7 | 9
    # samples | 190 | 645
    # frames | 293077 | 965940
    # sentences | 1980 | 6861
    # vocabulary | 911 | 1558
    Resolution | 210×260 | 720×576

    Table 5  CSL dataset parameters

    Parameter | Value
    RGB resolution | 1920×1080
    Depth resolution | 512×424
    Video length (s) | 10–14
    Average number of samples | 7
    Total samples | 25000
    # signers | 50
    Vocabulary | 178
    Skeleton joints | 21
    Frame rate (fps) | 25
    Total duration | 100+
  • HINTON G E, OSINDERO S, and TEH Y W. A fast learning algorithm for deep belief nets[J]. Neural Computation, 2006, 18(7): 1527–1554. doi: 10.1162/neco.2006.18.7.1527
    ZHOU Yu. Research on signer adaptation in Chinese sign language recognition[D]. [Ph.D. dissertation], Harbin Institute of Technology, 2009. (in Chinese)
    CHEOK M J, OMAR Z, and JAWARD M H. A review of hand gesture and sign language recognition techniques[J]. International Journal of Machine Learning and Cybernetics, 2019, 10(1): 131–153. doi: 10.1007/s13042-017-0705-5
    TANG Ao, LU Ke, WANG Yufei, et al. A real-time hand posture recognition system using deep neural networks[J]. ACM Transactions on Intelligent Systems and Technology, 2015, 6(2): 1–23. doi: 10.1145/2735952
    PIGOU L, DIELEMAN S, KINDERMANS P J, et al. Sign language recognition using convolutional neural networks[C]. European Conference on Computer Vision, Zurich, Switzerland, 2014: 572–578.
    KANG B, TRIPATHI S, and NGUYEN T Q. Real-time sign language fingerspelling recognition using convolutional neural networks from depth map[C]. The 3rd IAPR Asian Conference on Pattern Recognition (ACPR), Kuala Lumpur, Malaysia, 2015: 136–140.
    HOSSEN M A, GOVINDAIAH A, SULTANA S, et al. Bengali sign language recognition using Deep Convolutional Neural Network[C]. The 7th Joint International Conference on Informatics, Electronics & Vision (ICIEV) and 2018 2nd International Conference on Imaging, Vision & Pattern Recognition (icIVPR), Kitakyushu, Japan, 2018: 369–373.
    KOLLER O, BOWDEN R, and NEY H. Automatic alignment of HamNoSys subunits for continuous sign language recognition[C]. The 10th Edition of the Language Resources and Evaluation Conference, Portorož, Slovenia, 2016: 121–128.
    GARCIA B and VIESCA S A. Real-time American sign language recognition with convolutional neural networks[J]. Convolutional Neural Networks for Visual Recognition, 2016, 2: 225–232.
    JI Y, KIM S, and LEE K B. Sign language learning system with image sampling and convolutional neural network[C]. The 1st IEEE International Conference on Robotic Computing (IRC), Taichung, China, 2017: 371–375.
    KIM S, JI Y, and LEE K B. An effective sign language learning with object detection based ROI segmentation[C]. The 2nd IEEE International Conference on Robotic Computing (IRC), Laguna Hills, USA, 2018: 330–333.
    KÖPÜKLÜ O, KÖSE N, and RIGOLL G. Motion fused frames: Data level fusion strategy for hand gesture recognition[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, USA, 2018: 2103–2111.
    KONSTANTINIDIS D, DIMITROPOULOS K, and DARAS P. Sign language recognition based on hand and body skeletal data[C]. 2018-3DTV-Conference: The True Vision-Capture, Transmission and Display of 3D Video (3DTV-CON), Helsinki, Finland, 2018: 1–4.
    DEVINEAU G, MOUTARDE F, WANG Xi, et al. Deep learning for hand gesture recognition on skeletal data[C]. The 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), Xian, China, 2018: 106–113.
    MOLCHANOV P, GUPTA S, KIM K, et al. Hand gesture recognition with 3D convolutional neural networks[C]. 2015 IEEE Conference on Computer Vision and Pattern Recognition workshops, Boston, USA, 2015: 1–7.
    WU Di, PIGOU L, KINDERMANS P J, et al. Deep dynamic neural networks for multimodal gesture segmentation and recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(8): 1583–1597. doi: 10.1109/TPAMI.2016.2537340
    HUANG Jie, ZHOU Wengang, LI Houqiang, et al. Sign language recognition using 3D convolutional neural networks[C]. 2015 IEEE International Conference on Multimedia and Expo (ICME), Turin, Italy, 2015: 1–6.
    HUANG Jie, ZHOU Wengang, LI Houqiang, et al. Attention-based 3D-CNNs for large-vocabulary sign language recognition[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2019, 29(9): 2822–2832. doi: 10.1109/TCSVT.2018.2870740
    LI Yunan, MIAO Qiguang, TIAN Kuan, et al. Large-scale gesture recognition with a fusion of RGB-D data based on the C3D model[C]. The 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 2016: 25–30.
    LI Yunan, MIAO Qiguang, TIAN Kuan, et al. Large-scale gesture recognition with a fusion of RGB-D data based on saliency theory and C3D model[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2018, 28(10): 2956–2964. doi: 10.1109/TCSVT.2017.2749509
    MIAO Qiguang, LI Yunan, OUYANG Wanli, et al. Multimodal gesture recognition based on the resc3d network[C]. 2017 IEEE International Conference on Computer Vision Workshops, Venice, Italy, 2017: 3047–3055.
    ELBADAWY M, ELONS A S, SHEDEED H A, et al. Arabic sign language recognition with 3d convolutional neural networks[C]. The 8th International Conference on Intelligent Computing and Information Systems (ICICIS), Cairo, Egypt, 2017: 66–71.
    YE Yuancheng, TIAN Yingli, HUENERFAUTH M, et al. Recognizing American sign language gestures from within continuous videos[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, USA, 2018: 2064–2073.
    LIANG Zhijie, LIAO Shengbin, and HU Bingzhang. 3D convolutional neural networks for dynamic sign language recognition[J]. The Computer Journal, 2018, 61(11): 1724–1736. doi: 10.1093/comjnl/bxy049
    CATE H, DALVI F, and HUSSAIN Z. Sign language recognition using temporal classification[EB/OL]. http://arxiv.org/abs/1701.01875v1, 2017.
    CHAI Xiujuan, LIU Zhipeng, YIN Fang, et al. Two streams recurrent neural networks for large-scale continuous gesture recognition[C]. The 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 2016: 31–36.
    LIU Tao, ZHOU Wengang, and LI Houqiang. Sign language recognition with long short-term memory[C]. 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, USA, 2016: 2871–2875.
    LI Xiaoxu, MAO Chensi, HUANG Shiliang, et al. Chinese sign language recognition based on SHS descriptor and encoder-decoder LSTM model[C]. The 12th Chinese Conference on Biometric Recognition. Shenzhen, China, 2017: 719–728.
    HUANG Shiliang, MAO Chensi, TAO Jinxu, et al. A novel chinese sign language recognition method based on keyframe-centered clips[J]. IEEE Signal Processing Letters, 2018, 25(3): 442–446. doi: 10.1109/LSP.2018.2797228
    YANG Su and ZHU Qing. Continuous Chinese sign language recognition with CNN-LSTM[C]. Proceedings of SPIE, Ninth International Conference on Digital Image Processing (ICDIP 2017), 2017, 10420.
    YANG Su and ZHU Qing. Video-based Chinese sign language recognition using convolutional neural network[C]. The 9th IEEE International Conference on Communication Software and Networks (ICCSN), Guangzhou, China, 2017: 929–934.
    LIN Chi, WAN Jun, LIANG Yanyan, et al. Large-scale isolated gesture recognition using a refined fused model based on masked Res-C3D network and skeleton LSTM[C]. The 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), Xi’an, China, 2018: 52–58.
    HALIM K and RAKUN E. Sign language system for Bahasa Indonesia (Known as SIBI) recognizer using TensorFlow and Long Short-Term Memory[C]. 2018 International Conference on Advanced Computer Science and Information Systems (ICACSIS), Yogyakarta, Indonesia, 2018: 403–407.
    MASOOD S, SRIVASTAVA A, THUWAL H C, et al. Real-time sign language gesture (word) recognition from video sequences using CNN and RNN[C]. Intelligent Engineering Informatics: The 6th International Conference on FICTA, Singapore, 2018: 623–632.
    BANTUPALLI K and XIE Ying. American Sign Language recognition using deep learning and computer vision[C]. 2018 IEEE International Conference on Big Data (Big Data), Seattle, USA, 2018: 4896–4899.
    KONSTANTINIDIS D, DIMITROPOULOS K, and DARAS P. A deep learning approach for analyzing video and skeletal features in sign language recognition[C]. 2018 IEEE International Conference on Imaging Systems and Techniques (IST), Krakow, Poland, 2018: 1–6.
    HERNANDEZ V, SUZUKI T, and VENTURE G. Convolutional and recurrent neural network for human action recognition: Application on American sign language[EB/OL]. http://biorxiv.org/content/10.1101/535492v1, 2019.
    LIAO Yanqiu, XIONG Pengwen, MIN Weidong, et al. Dynamic sign language recognition based on video sequence with BLSTM-3D residual networks[J]. IEEE Access, 2019, 7: 38044–38054. doi: 10.1109/ACCESS.2019.2904749
    CAMGOZ N C, HADFIELD S, KOLLER O, et al. SubUNets: End-to-end hand shape and continuous sign language recognition[C]. 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017: 3075–3084.
    CUI Runpeng, LIU Hu, and ZHANG Changshui. A deep neural framework for continuous sign language recognition by iterative training[J]. IEEE Transactions on Multimedia, 2019, 21(7): 1880–1891. doi: 10.1109/TMM.2018.2889563
    SHI Bowen, DEL RIO A M, KEANE J, et al. American Sign Language fingerspelling recognition in the wild[C]. 2018 IEEE Spoken Language Technology Workshop (SLT), Athens, Greece, 2018: 145–152.
    KO S K, SON J G, and JUNG H. Sign language recognition with recurrent neural network using human keypoint detection[C]. 2018 Conference on Research in Adaptive and Convergent Systems, Honolulu, USA, 2018: 326–328.
    ZHANG Qian, WANG Dong, ZHAO Run, et al. MyoSign: Enabling end-to-end sign language recognition with wearables[C]. The 24th International Conference on Intelligent User Interfaces, Marina del Ray, USA, 2019: 650–660.
    MITTAL A, KUMAR P, ROY P P, et al. A modified LSTM model for continuous sign language recognition using leap motion[J]. IEEE Sensors Journal, 2019, 19(16): 7056–7063. doi: 10.1109/JSEN.2019.2909837
    CAMGOZ N C, HADFIELD S, KOLLER O, et al. Using convolutional 3d neural networks for user-independent continuous gesture recognition[C]. The 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 2016: 49–54.
    PU Junfu, ZHOU Wengang, and LI Houqiang. Dilated convolutional network with iterative optimization for continuous sign language recognition[C]. The 27th International Joint Conference on Artificial Intelligence, Wellington, New Zealand, 2018: 885–891.
    HUANG Jie, ZHOU Wengang, ZHANG Qilin, et al. Video-based sign language recognition without temporal segmentation[C]. The 32nd AAAI Conference on Artificial Intelligence, New Orleans, USA, 2018: 2257–2264.
    WANG Shuo, GUO Dan, ZHOU Wengang, et al. Connectionist temporal fusion for sign language translation[C]. The 26th ACM International Conference on Multimedia, Seoul, Korea, 2018: 1483–1491.
    KOLLER O, ZARGARAN O, NEY H, et al. Deep sign: Hybrid CNN-HMM for continuous sign language recognition[C]. 2016 British Machine Vision Conference, York, UK, 2016: 1–2.
    KOLLER O, ZARGARAN S, and NEY H. Re-sign: Re-aligned end-to-end sequence modelling with deep recurrent CNN-HMMs[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Hawaii, USA, 2017: 4297–4305.
    KOLLER O, ZARGARAN S, NEY H, et al. Deep sign: Enabling robust statistical continuous sign language recognition via hybrid CNN-HMMs[J]. International Journal of Computer Vision, 2018, 126(12): 1311–1325. doi: 10.1007/s11263-018-1121-3
    PIGOU L, VAN HERREWEGHE M, and DAMBRE J. Gesture and sign language recognition with temporal residual networks[C]. 2017 IEEE International Conference on Computer Vision Workshops, Venice, Italy, 2017: 3086–3093.
    CUI Runpeng, LIU Hu, and ZHANG Changshui. Recurrent convolutional neural networks for continuous sign language recognition by staged optimization[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 7361–7369.
    ARIESTA M C, WIRYANA F, SUHARJITO, et al. Sentence level Indonesian sign language recognition using 3D convolutional neural network and bidirectional recurrent neural network[C]. 2018 Indonesian Association for Pattern Recognition International Conference (INAPR), Jakarta, Indonesia, 2018: 16–22.
    GUO Dan, ZHOU Wengang, LI Houqiang, et al. Hierarchical LSTM for sign language translation[C]. The 32nd AAAI Conference on Artificial Intelligence, the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence, New Orleans, USA, 2018: 6845–6852.
    FORSTER J, SCHMIDT C, HOYOUX T, et al. RWTH-PHOENIX-Weather: A large vocabulary sign language recognition and translation corpus[C]. The 8th International Conference on Language Resources and Evaluation, Istanbul, Turkey, 2012: 3785–3789.
    ESCALERA S, BARÓ X, GONZÀLEZ J, et al. Chalearn looking at people challenge 2014: Dataset and results[C]. European Conference on Computer Vision, Zurich, Switzerland, 2014: 459–473.
    ONG E J, COOPER H, PUGEAULT N, et al. Sign language recognition using sequential pattern trees[C]. 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, USA, 2012: 2200–2207.
    VON AGRIS U, ZIEREN J, CANZLER U, et al. Recent developments in visual sign language recognition[J]. Universal Access in the Information Society, 2008, 6(4): 323–362. doi: 10.1007/s10209-007-0104-x
    EFTHIMIOU E and FOTINEA S E. GSLC: Creation and annotation of a Greek sign language corpus for HCI[C]. The 4th International Conference on Universal Access in Human-Computer Interaction, Beijing, China, 2007: 657–666.
    NEIDLE C, THANGALI A, and SCLAROFF S. Challenges in development of the American Sign Language lexicon video dataset (ASLLVD) corpus[C]. The 5th Workshop on the Representation and Processing of Sign Languages: Interactions between Corpus and Lexicon, Istanbul, Turkey, 2012: 1–8.
    OSZUST M and WYSOCKI M. Polish sign language words recognition with Kinect[C]. The 6th International Conference on Human System Interactions (HSI), Sopot, Poland, 2013: 219–226.
    RONCHETTI F, QUIROGA F, ESTREBOU C A, et al. LSA64: An Argentinian sign language dataset[C]. The 22nd Congreso Argentino de Ciencias de la Computación (CACIC 2016), San Luis, Argentina, 2016: 794–803.
    CHAI Xiujuan, WANG Hanjie, and CHEN Xilin. The DEVISIGN large vocabulary of Chinese sign language database and baseline evaluations[R]. Technical Report VIPL-TR-14-SLR-001, 2014.
    LU Pengfei and HUENERFAUTH M. Collecting and evaluating the CUNY ASL corpus for research on American sign language animation[J]. Computer Speech & Language, 2014, 28(3): 812–831. doi: 10.1016/j.csl.2013.10.004
    SHOHIEB S M, ELMINIR H K, and RIAD A M. Signsworld atlas; a benchmark Arabic sign language database[J]. Journal of King Saud University-Computer and Information Sciences, 2015, 27(1): 68–76. doi: 10.1016/j.jksuci.2014.03.011
    PUGEAULT N and BOWDEN R. Spelling it out: Real-time ASL fingerspelling recognition[C]. 2011 IEEE International Conference on Computer Vision workshops (ICCV Workshops), Barcelona, Spain, 2011: 1114–1119.
    PRABHAVALKAR R, SAINATH T N, WU Yonghui, et al. Minimum word error rate training for attention-based sequence-to-sequence models[C]. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, Canada, 2018: 4839–4843.
    KOLLER O, FORSTER J, and NEY H. Continuous sign language recognition: Towards large vocabulary statistical recognition systems handling multiple signers[J]. Computer Vision and Image Understanding, 2015, 141: 108–125. doi: 10.1016/j.cviu.2015.09.013
Figures (4) / Tables (5)
計(jì)量
  • 文章訪問數(shù):  15816
  • HTML全文瀏覽量:  6823
  • PDF下載量:  1354
  • 被引次數(shù): 0
Publication history
  • Received: 2019-06-06
  • Revised: 2019-11-20
  • Available online: 2020-01-18
  • Issue published: 2020-06-04
