Scene Graph Generation Based on Enhanced Semantic Information Understanding

Application of Electronic Technique, 2023, Issue 5

Zeng Junying, Chen Yunxiong, Qin Chuanbo, Chen Yucong, Wang Yingbo, Tian Huiming, Gu Yajin

(Department of Intelligent Manufacturing, Wuyi University, Jiangmen, Guangdong 529020, China)

Abstract: The scene graph generation (SGG) task aims to detect visual relationship triplets in an image, i.e., subject, predicate, and object, and thereby provide a structured visual layout for scene understanding. However, existing SGG methods overlook the fact that the predicates they predict tend to be frequent but uninformative, which hinders progress in the field. To address this problem, this paper proposes a scene graph generation algorithm based on enhanced semantic information understanding. The model consists of four parts: a feature extraction module, an image cropping module, a semantic transformation module, and an extended information predicate module. The feature extraction and image cropping modules extract visual features and make them global and diverse. The semantic transformation module uses the semantic relationships between predicates to recover informative predictions from common ones. The extended information predicate module enlarges the sampling space of informative predicates. In comparisons with other methods on the VG and VG-MSDN datasets, the algorithm reaches a mean recall of 59.5% and 40.9%, respectively. The algorithm alleviates the lack of informativeness in predicted predicates and thereby improves the performance of scene graph generation.

Key words: scene graph generation; image cropping; semantic transformation; extended information

CLC number: TP391
Document code: A
DOI: 10.16157/j.issn.0258-7998.223276
Citation format: Zeng Junying, Chen Yunxiong, Qin Chuanbo, et al. Scene graph generation based on enhanced semantic information understanding[J]. Application of Electronic Technique, 2023, 49(5): 52-56.

0 Introduction

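The goal of the scene graph generation (SGG) task is to produce a graph-structured representation of a given image that abstracts its objects (grounded by bounding boxes) and their pairwise relationships. Scene graphs are intended to facilitate the understanding of complex scenes in images and have broad potential for downstream applications such as image retrieval, visual reasoning, visual question answering (VQA), image captioning, structured image generation and outpainting, and robotics. A good scene graph provides informative relationships between the instances of interest. Most existing scene graph generation methods follow a common paradigm: detect objects in the image, extract region features, and then recognize predicate categories under the guidance of a standard classification objective. However, this paradigm has several shortcomings.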
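To make the triplet representation and the recall-based evaluation concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that triplets are matched by their labels only and that a single image is evaluated; real SGG benchmarks also match bounding boxes by overlap and aggregate over a whole dataset. The names `Triplet`, `recall`, and `mean_recall`, and the example triplets, are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Triplet(NamedTuple):
    """One visual relationship: (subject, predicate, object)."""
    subject: str
    predicate: str
    obj: str


def recall(ground_truth, predictions, k):
    """Plain recall@K: fraction of annotated triplets found in the top K."""
    if not ground_truth:
        return 0.0
    top_k = set(predictions[:k])
    return sum(t in top_k for t in ground_truth) / len(ground_truth)


def mean_recall(ground_truth, predictions, k):
    """Mean recall@K: recall per predicate class, averaged over classes.

    Assumption: label-only matching on one image (no bounding boxes).
    """
    top_k = set(predictions[:k])
    hits = defaultdict(int)    # predicate -> annotated triplets recovered
    totals = defaultdict(int)  # predicate -> annotated triplets
    for t in ground_truth:
        totals[t.predicate] += 1
        if t in top_k:
            hits[t.predicate] += 1
    if not totals:
        return 0.0
    per_class = [hits[p] / totals[p] for p in totals]
    return sum(per_class) / len(per_class)


# Hypothetical annotations for one image.
gt = [
    Triplet("man", "riding", "horse"),
    Triplet("man", "wearing", "hat"),
    Triplet("horse", "on", "grass"),
    Triplet("hat", "on", "head"),
]

# Hypothetical ranked predictions: the informative "riding" is
# replaced by the frequent but uninformative "on".
preds = [
    Triplet("man", "on", "horse"),
    Triplet("horse", "on", "grass"),
    Triplet("hat", "on", "head"),
    Triplet("man", "wearing", "hat"),
]

print(recall(gt, preds, 4))                 # 0.75
print(round(mean_recall(gt, preds, 4), 3))  # 0.667
```

In this toy case plain recall is dominated by the frequent predicate "on", while mean recall drops because the informative predicate "riding" is never recovered. This is why a class-averaged metric is better suited to measuring how informative the predicted predicates are.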

For the full text of this paper, please download: http://forexkbc.com/resource/share/2000005313


Original content statement: This content is original to the AET website. Reproduction without authorization is prohibited.
