A class activation mapping algorithm for CNN

Information Technology and Network Security, Issue 1

Yang Jizeng, Guan Shengxiao

(School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, Anhui, China)

Abstract: Class activation mapping (CAM) is an intuitive method for interpreting convolutional neural networks (CNNs). The map is usually generated from the last convolutional layer of the CNN and highlights the regions of the input image occupied by the target class. Previous CAM methods depend only on the final convolutional layer, so the explanation maps they generate show only coarse object location information. This paper proposes a new method, score-weighted & layer-wise class activation mapping (SL-CAM), which generates class activation maps by weighting and merging information from the shallow to the deep layers of the CNN. An activation map generated from a shallow feature map and its corresponding gradient contains detailed and accurate but noisy location information, whereas an activation map generated from a deep feature map contains less noise but only coarse location information. Experiments on ILSVRC2012 Val show that SL-CAM outperforms Grad-CAM, Grad-CAM++ and Score-CAM on several metrics.

Key words: class activation mapping; convolutional neural networks; visualization

CLC number: TP183
Document code: A
DOI: 10.19358/j.issn.2096-5133.2022.01.010
Citation format: Yang Jizeng, Guan Shengxiao. A class activation mapping algorithm for CNN[J]. Information Technology and Network Security, 2022, 41(1): 63-68.
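
The abstract describes SL-CAM only at a high level: activation maps from shallow and deep layers are weighted and merged. As a rough illustration of that idea, and not the authors' actual formulation, the sketch below upsamples per-layer maps to a common size, normalizes them, and takes a weighted sum. The function names, the nearest-neighbour upsampling, the min-max normalization, and the equal weights are all assumptions made for this example.

```python
from typing import List

Map2D = List[List[float]]


def upsample_nearest(m: Map2D, size: int) -> Map2D:
    """Resize a square map to size x size by nearest-neighbour lookup."""
    n = len(m)
    return [[m[i * n // size][j * n // size] for j in range(size)]
            for i in range(size)]


def normalize(m: Map2D) -> Map2D:
    """Rescale a map to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]


def merge_layer_maps(maps: List[Map2D], weights: List[float], size: int) -> Map2D:
    """Weighted sum of per-layer maps after upsampling and normalizing each."""
    if len(maps) != len(weights):
        raise ValueError("need one weight per layer map")
    merged = [[0.0] * size for _ in range(size)]
    for m, w in zip(maps, weights):
        up = normalize(upsample_nearest(m, size))
        for i in range(size):
            for j in range(size):
                merged[i][j] += w * up[i][j]
    return normalize(merged)


# Hypothetical maps: the shallow one is detailed but has a noisy response
# in the bottom-right corner; the deep one is coarse but clean.
shallow = [
    [0.9, 1.0, 0.1, 0.0],
    [0.8, 0.2, 0.0, 0.1],
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.7],
]
deep = [
    [1.0, 0.1],
    [0.0, 0.0],
]
merged = merge_layer_maps([shallow, deep], weights=[0.5, 0.5], size=4)
```

In this toy case the merged map keeps the fine structure of the shallow map in the top-left region, while the isolated corner response is damped because the deep map does not support it. How the paper actually chooses the per-layer weights is not stated in this excerpt.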

0 Introduction

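In recent years, deep learning, represented by CNNs, has achieved outstanding results in computer vision. CNNs that train a classifier as an end-to-end model handle a wide range of image processing tasks well. However, the black-box nature of end-to-end models means that a CNN produces its result directly from the input, without exposing how the decision was reached. The internal mechanisms of early artificial intelligence systems were mainly logic and symbols; once interpretation methods for CNNs were proposed, visualization became the most direct strategy. In other words, visualizing the regions of the input image that are associated with the network's prediction, for example through the importance of input features or the learned weights, has become the most direct approach. Gradient-based [1], perturbation-based [2] and CAM-based [3] methods are the three widely adopted families.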

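Gradient-based methods usually produce low-quality explanation maps that contain a large amount of noise. They proceed in two steps: first, the gradient map in the input space is obtained by backpropagation through the network; second, the gradient map is processed into a heat map that represents how much the input image contributes to the output for a specific class. Perturbation-based methods [2, 4] usually alter the original input with perturbation noise and observe how the model's prediction score changes. However, this approach is time-consuming because the model's prediction must be queried iteratively.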
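
The two families above can be contrasted on a toy model. The following is a minimal sketch, assuming a hypothetical one-layer "network" (a linear layer followed by a sigmoid) in place of a trained CNN; `WEIGHTS`, `class_score`, `gradient_saliency` and `occlusion_saliency` are illustrative names, and zeroing one pixel at a time is just one simple choice of perturbation.

```python
import math
from typing import Callable, List, Tuple

Image = List[List[float]]

# Hypothetical stand-in for a trained CNN: one linear layer plus a sigmoid.
WEIGHTS: Image = [
    [0.0, 0.0, 0.0],
    [0.0, 2.0, 1.0],
    [0.0, 0.0, -1.0],
]


def class_score(x: Image) -> float:
    """Score of the target class for input x."""
    z = sum(w * v for wr, xr in zip(WEIGHTS, x) for w, v in zip(wr, xr))
    return 1.0 / (1.0 + math.exp(-z))


def gradient_saliency(x: Image) -> Image:
    """Gradient-based map: d(score)/d(input), then absolute value."""
    s = class_score(x)
    ds_dz = s * (1.0 - s)  # derivative of the sigmoid
    # Chain rule (what backpropagation computes here): dz/dx_ij = w_ij.
    grad = [[ds_dz * w for w in wr] for wr in WEIGHTS]
    return [[abs(g) for g in row] for row in grad]


def occlusion_saliency(x: Image,
                       score: Callable[[Image], float]) -> Tuple[Image, int]:
    """Perturbation-based map: score drop when each pixel is zeroed.

    Also returns the number of model queries that were needed.
    """
    base = score(x)
    queries = 1
    drops: Image = []
    for i, row in enumerate(x):
        out_row = []
        for j in range(len(row)):
            perturbed = [r[:] for r in x]  # copy, then occlude one pixel
            perturbed[i][j] = 0.0
            out_row.append(base - score(perturbed))
            queries += 1
        drops.append(out_row)
    return drops, queries


image: Image = [[1.0, 1.0, 1.0] for _ in range(3)]
grad_map = gradient_saliency(image)
occ_map, n_queries = occlusion_saliency(image, class_score)
```

The gradient map needs a single forward and backward pass, whereas the occlusion map needs one model query per perturbed pixel plus one for the unperturbed input, which is the cost the paragraph above refers to. On real images the number of queries grows with the number of perturbed regions.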

For the full text of this article, please download: http://forexkbc.com/resource/share/2000003938

