CLC number: TP391.4 Document code: A DOI: 10.16157/j.issn.0258-7998.222903
Chinese citation format: 陳紅順, 陳觀明. 基于深度學習的詞語級中文唇語識別[J]. 電子技術應用, 2022, 48(12): 54-58.
English citation format: Chen Hongshun, Chen Guanming. Chinese word-level lip reading based deep learning[J]. Application of Electronic Technique, 2022, 48(12): 54-58.
Chinese word-level lip reading based on deep learning
Chen Hongshun1, Chen Guanming1,2
1. School of Information Technology, Beijing Normal University (Zhuhai), Zhuhai 519087, China; 2. Zhuhai Orbita Aerospace Science & Technology Co., Ltd., Zhuhai 519080, China
Abstract: Lip reading is crucial in silent environments, in environments with serious noise interference, and for people with hearing impairment. For the word-level Chinese lip reading problem, the SinoLipReadingNet model is proposed. Its front end uses Conv3D and ResNet34 to extract spatio-temporal features, and its back end uses Conv1D and Bi-LSTM for classification and prediction respectively. In addition, self-attention and CTCLoss are added to improve the Bi-LSTM back end. Finally, the SinoLipReadingNet model is tested on the XWBank lipreading dataset. The results show that its prediction accuracy is significantly better than that of the D3D model, and that with multi-model fusion the prediction accuracy and average CER reach 77.64% and 21.68% respectively.
Key words: lip reading; ResNet; Bi-LSTM; CTCLoss; self-attention
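
The abstract reports an average CER (character error rate) without restating how it is computed. The sketch below shows one common definition, given here as an assumption rather than as the authors' evaluation code: CER is the character-level Levenshtein distance between the reference word and the predicted word, divided by the reference length, and the average is taken as the mean over samples. The function names and the example words are hypothetical and do not come from the paper or the XWBank dataset.

```python
from typing import Iterable, Tuple


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """CER of one prediction: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def average_cer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Mean of per-sample CER values (assumed averaging scheme)."""
    scores = [cer(ref, hyp) for ref, hyp in pairs]
    if not scores:
        raise ValueError("at least one (reference, hypothesis) pair is required")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical (reference, prediction) pairs for illustration only.
    samples = [("今天天气", "今天天汽"), ("你好", "你好")]
    print(f"average CER: {average_cer(samples):.4f}")
```

Under this definition a CER can exceed 1 when the prediction is much longer than the reference. A corpus-level variant (total edits divided by total reference characters) is also in common use and would give a different number on the same predictions.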