Application of Electronic Technique
Chinese-Vietnamese multimodal neural machine translation with integrated image-text pre-training
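Wei Haoxiang1,2, Gao Shengxiang1,2, Yu Zhengtao1,2, Wang Xiaocong1,2
1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology; 2. Yunnan Key Laboratory of Artificial Intelligence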
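Abstract: Because of the significant grammatical differences between Chinese and Vietnamese and the scarcity of corpora, Chinese-Vietnamese neural machine translation struggles to translate nouns accurately. A novel multimodal neural machine translation method is proposed that combines a text pre-trained model with a vision-language joint pre-trained model. The text pre-trained model captures deep linguistic structure and semantics, while the vision-language joint pre-trained model supplies visual context associated with the text, which helps the model understand and translate nouns more accurately. The two models are connected through a simple and efficient mapping network, and a Gumbel gating module dynamically integrates the multimodal information to optimize the translation output. On the Chinese-Vietnamese and Vietnamese-Chinese translation tasks, the method improves BLEU by 7.13 and 4.27 points, respectively, over a conventional Transformer model.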
CLC number: TP391 Document code: A DOI: 10.16157/j.issn.0258-7998.245391
Citation format: Wei Haoxiang, Gao Shengxiang, Yu Zhengtao, et al. Chinese-Vietnamese multimodal neural machine translation with integrated image-text pre-training[J]. Application of Electronic Technique, 2024, 50(12): 48-54.
Chinese-Vietnamese multimodal neural machine translation with integrated image-text pre-training
Wei Haoxiang1,2, Gao Shengxiang1,2, Yu Zhengtao1,2, Wang Xiaocong1,2
1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology; 2. Yunnan Key Laboratory of Artificial Intelligence
Abstract: Due to significant grammatical differences between Chinese and Vietnamese and a scarcity of linguistic resources, Chinese-Vietnamese neural machine translation faces challenges in translating nouns accurately. This paper proposes a novel multimodal neural machine translation method that integrates a text-based pre-trained model with a visual-linguistic joint pre-trained model. The text-based model captures deep linguistic structures and semantics, while the visual-linguistic joint pre-trained model provides visual context related to the text, which helps the model understand and translate nouns more accurately. The two models are combined through a streamlined and efficient mapping network, and multimodal information is dynamically integrated via a Gumbel gating module to optimize translation outputs. In the Chinese-Vietnamese and Vietnamese-Chinese translation tasks, the method achieves improvements of 7.13 and 4.27 BLEU points, respectively, over the traditional Transformer model.
Key words: Chinese-Vietnamese neural machine translation; vision-language joint pre-training; multimodal; attention

Introduction

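Machine translation uses computer programs to automatically convert text in one natural language into another natural language. As China's Belt and Road Initiative advances, exchange and cooperation between China and Vietnam in the economic and cultural spheres have steadily grown, making efficient and accurate translation services especially important. The application of neural machine translation in particular has greatly improved translation speed and quality, effectively promoting the exchange and understanding of information between the two countries and providing solid language support for deepening bilateral relations.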

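Because Chinese-Vietnamese is a low-resource language pair with scarce corpora, and because Chinese and Vietnamese grammar differ greatly, noun translation errors have long been a difficulty for Chinese-Vietnamese neural machine translation and make the output of such models inaccurate.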

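To address inaccurate noun translation and poor model performance with limited training data in Chinese-Vietnamese neural machine translation, this paper proposes a Chinese-Vietnamese multimodal neural machine translation method with integrated image-text pre-training. A Gumbel gating mechanism combines the vision-text joint pre-trained model M-CLIP with the multilingual translation pre-trained model mBART. Visual information is used to resolve noun translation errors; the pre-trained mBART model is introduced to improve translation performance when corpora are scarce; and the Gumbel gating mechanism fuses the multimodal information while filtering out irrelevant visual information that would otherwise interfere with the translation model.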
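
The gating idea can be illustrated with a minimal sketch. It is not the authors' implementation: the excerpt gives no formulas, so the Gumbel-sigmoid relaxation, the linear gate scorer, the additive fusion, the temperature `tau`, and every function name below are assumptions made for illustration only. Plain Python lists stand in for the text features (which would come from mBART) and the mapped visual features (which would come from M-CLIP).

```python
import math
import random


def sample_gumbel(rng):
    """Draw one sample from the standard Gumbel(0, 1) distribution."""
    # Inverse-CDF sampling; U is clamped away from 0 and 1 to keep the logs finite.
    u = min(max(rng.random(), 1e-10), 1.0 - 1e-10)
    return -math.log(-math.log(u))


def stable_sigmoid(x):
    """Logistic function written so that large |x| cannot overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gumbel_sigmoid(logit, tau, rng, hard=False):
    """Relaxed Bernoulli gate in [0, 1]; hard=True thresholds it to 0.0 or 1.0."""
    noisy = (logit + sample_gumbel(rng) - sample_gumbel(rng)) / tau
    gate = stable_sigmoid(noisy)
    return float(gate > 0.5) if hard else gate


def fuse(text_vec, visual_vec, gate_weights, gate_bias,
         tau=0.5, rng=None, hard=False):
    """Fuse text and visual features with a single scalar Gumbel gate.

    Assumed (hypothetical) form: the gate logit is a linear score of the
    concatenated features, and the gated visual vector is added to the text.
    """
    rng = rng or random.Random()
    features = text_vec + visual_vec  # list concatenation
    logit = sum(w * x for w, x in zip(gate_weights, features)) + gate_bias
    gate = gumbel_sigmoid(logit, tau, rng, hard)
    fused = [t + gate * v for t, v in zip(text_vec, visual_vec)]
    return fused, gate


if __name__ == "__main__":
    rng = random.Random(0)
    text = [0.2, -0.1, 0.4]      # stand-in for a text feature vector
    visual = [1.0, 1.0, 1.0]     # stand-in for a mapped visual feature vector
    weights = [0.0] * 6          # untrained scorer, for demonstration only
    fused, gate = fuse(text, visual, weights, gate_bias=0.0, rng=rng)
    print("gate:", gate)
    print("fused:", fused)
```

When the gate is close to 0, the fused vector reduces to the text features, which is how a gate of this kind can suppress visual input that is irrelevant to the sentence. In a trained model the scorer weights would be learned, and the temperature would typically control how close the gate is to a binary decision.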


For the full text of this article, please download:

http://forexkbc.com/resource/share/2000006247


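Author information:

Wei Haoxiang1,2, Gao Shengxiang1,2, Yu Zhengtao1,2, Wang Xiaocong1,2

(1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan;

2. Yunnan Key Laboratory of Artificial Intelligence, Kunming 650500, Yunnan)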



This content is original to the AET website; reproduction without authorization is prohibited.