CLC number: TP309  Document code: A  DOI: 10.19358/j.issn.2096-5133.2021.08.006
Citation format: Yu Yuanzhe, Wang Jinshuang, Zou Xia. A malicious PDF detection method based on feature agglomeration and convolutional neural network[J]. Information Technology and Network Security, 2021, 40(8): 35-41.
A malicious PDF detection method based on feature agglomeration and convolutional neural network
Yu Yuanzhe, Wang Jinshuang, Zou Xia
(Command & Control Engineering College, Army Engineering University of PLA, Nanjing 210001, China)
Abstract: To address high feature dimensionality and the under-fitting caused by small datasets, a malicious PDF document detection method based on feature agglomeration and a convolutional neural network (CNN) is proposed. Regular and structural features are first extracted from PDF documents using the bag-of-words model. Ward's minimum variance clustering method is then used to agglomerate the features, merging feature clusters so that the combined within-cluster variance is minimized. The agglomerated features are fed into a CNN classification model for training and evaluation, and the optimal number of agglomerated features is determined by comparing model performance across different numbers of agglomerated features. The results show that the proposed model reduces the feature dimension, improves recall, and mitigates under-fitting at the same time. Across different proportions of benign and malicious samples, recall increases by 53% and the F-score by 0.44 on average. Compared with the detection tools PJScan, PDFrate and Luxor, the comprehensive detection performance improves by 5% on average.
Key words: malicious PDF document; feature agglomeration; static detection; convolutional neural network (CNN)
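The feature agglomeration step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy count matrix, the function names (`ward_cost`, `agglomerate_features`, `transform`) and the choice to pool each cluster by its per-sample mean are assumptions made for illustration, and the CNN stage is omitted. The sketch treats each feature (column) as a point, repeatedly merges the two feature clusters whose union gives the smallest increase in within-cluster variance (Ward's criterion), and then replaces each cluster with a single pooled feature.

```python
from itertools import combinations


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    na, nb = len(a["members"]), len(b["members"])
    dist2 = sum((x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"]))
    return na * nb / (na + nb) * dist2


def agglomerate_features(X, n_clusters):
    """Group the columns of X into n_clusters using Ward's criterion.

    X is a list of samples, each a list of feature values.
    Returns a sorted list of column-index lists, one per cluster.
    """
    n_features = len(X[0])
    # Each feature starts as its own cluster; its "point" is its column.
    clusters = [
        {"members": [j], "centroid": [row[j] for row in X]}
        for j in range(n_features)
    ]
    while len(clusters) > n_clusters:
        # Pick the pair whose merge adds the least variance.
        i, k = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: ward_cost(clusters[p[0]], clusters[p[1]]),
        )
        a, b = clusters[i], clusters[k]
        na, nb = len(a["members"]), len(b["members"])
        merged = {
            "members": sorted(a["members"] + b["members"]),
            "centroid": [
                (na * x + nb * y) / (na + nb)
                for x, y in zip(a["centroid"], b["centroid"])
            ],
        }
        # i < k, so delete the higher index first.
        del clusters[k]
        del clusters[i]
        clusters.append(merged)
    return sorted(c["members"] for c in clusters)


def transform(X, groups):
    """Replace each group of columns by its per-sample mean (assumed pooling)."""
    return [[sum(row[j] for j in g) / len(g) for g in groups] for row in X]


# Hypothetical bag-of-words counts: rows are documents, columns are keywords.
X = [
    [5, 4, 0, 0, 1],
    [6, 5, 0, 1, 0],
    [0, 0, 3, 4, 1],
    [1, 0, 4, 3, 0],
]
groups = agglomerate_features(X, n_clusters=3)
X_reduced = transform(X, groups)
print(groups)      # column indices merged into each agglomerated feature
print(X_reduced)   # 4 documents x 3 agglomerated features
```

In this toy input, columns that rise and fall together across documents are merged first, so five raw features shrink to three. The number of clusters plays the role of the "number of agglomerated features" that the abstract says is tuned by comparing model performance.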