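Abstract: To address the multiscale nature of remote sensing images, the difficulty of accurately extracting tiny objects, and the tendency to confuse object categories, a remote sensing image caption model that fuses object and multiscale visual features (Fusion of Object and Multiscale Visual Feature, FO-MSV) is proposed. A purpose-built object extractor analyzes the text information and extracts the object information it contains. A multiscale interaction module is designed to obtain the multiscale visual features of remote sensing images and so adapt to their multiscale character. To make full use of the object information and fuse it with the visual information, a new object-visual feature fusion mechanism is proposed that adjusts the balance between the visual context and the object context. Experimental results on three datasets in this field show that the model clearly improves caption performance and is competitive with other advanced models.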
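CLC number: TP391 Document code: A DOI: 10.19358/j.issn.2097-1788.2022.06.011 Citation: Jia Yamin, Chen Jiao, Peng Yuqing. Remote sensing image caption model with fusion of object and multiscale visual feature[J]. Cyber Security and Data Governance (网络安全与数据治理), 2022, 41(6): 78-83, 89.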
Remote sensing image caption model with fusion of object and multiscale visual feature
Jia Yamin, Chen Jiao, Peng Yuqing
(School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China)
Abstract: Remote sensing images are multiscale, their tiny objects are hard to extract accurately, and their object categories are easily confused. To address these problems, a new remote sensing image caption model (FO-MSV) is proposed. A constructed object extractor analyzes the text information to extract object information. A multiscale interaction module is designed to obtain multiscale visual features of remote sensing images, matching their multiscale character. To make full use of object information and fuse it with visual information, a new object-visual feature fusion mechanism is proposed that adjusts the balance between visual context and object context. Experimental results on three datasets show that the proposed model significantly improves caption performance and is competitive with other advanced models.
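To solve the above problems and to adapt to the multiscale nature of remote sensing scenes, this paper proposes a remote sensing image caption model that fuses object and multiscale visual features (Fusion of Object and Multiscale Visual Feature, FO-MSV). The model builds an Object Extractor (OE) that extracts object information from the integrated description produced by a pointer-generator network [3], so that tiny objects are not missed. A new Multiscale Interaction Module (MSCM) is also proposed to obtain multiscale visual features of the image and so adapt to its multiscale character. In addition, a new Object-Visual Fusion Mechanism (ovFM) is designed to exploit the object information and fuse it with the multiscale visual information, avoiding object recognition errors. The structure of the Long Short-Term Memory (LSTM) network is also modified, and the resulting variant is called the multi-input LSTM (Multi-Input LSTM, I_LSTM).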
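The idea of balancing a visual context against an object context can be illustrated with a generic gated fusion. The following is only a minimal sketch under stated assumptions: this section does not define the ovFM, so the scalar sigmoid gate, the function name `gated_fusion`, and the parameters `w_v`, `w_o` and `bias` are all hypothetical and should not be read as the mechanism actually used in FO-MSV.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(visual_ctx, object_ctx, w_v, w_o, bias):
    """Blend two context vectors of equal length with a scalar gate.

    Hypothetical illustration only: the gate form and the parameters
    w_v, w_o and bias are assumptions, not the paper's ovFM definition.
    """
    if not (len(visual_ctx) == len(object_ctx) == len(w_v) == len(w_o)):
        raise ValueError("all vectors must have the same length")
    # Scalar gate score computed from both contexts.
    score = bias
    for v, o, wv, wo in zip(visual_ctx, object_ctx, w_v, w_o):
        score += wv * v + wo * o
    gate = sigmoid(score)
    # A gate near 1 favours the visual context, near 0 the object context.
    fused = [gate * v + (1.0 - gate) * o
             for v, o in zip(visual_ctx, object_ctx)]
    return fused, gate


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the demo is repeatable
    dim = 4
    visual = [rng.uniform(-1, 1) for _ in range(dim)]
    objects = [rng.uniform(-1, 1) for _ in range(dim)]
    w_v = [rng.uniform(-1, 1) for _ in range(dim)]
    w_o = [rng.uniform(-1, 1) for _ in range(dim)]
    fused, gate = gated_fusion(visual, objects, w_v, w_o, bias=0.0)
    print("gate:", round(gate, 3))
    print("fused:", [round(x, 3) for x in fused])
```

In a trained model the gate parameters would be learned, so the decoder could lean on the object context when the visual evidence for a small object is weak and on the visual context otherwise. A per-dimension gate is an equally plausible design choice.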