CLC number: TP309 Document code: A DOI: 10.19358/j.issn.2096-5133.2021.11.003 Citation format: Yu Yuanzhe, Wang Jinshuang, Zou Xia. Malicious PDF detection method based on document graph structure[J]. Information Technology and Network Security, 2021, 40(11): 16-23.
Malicious PDF detection method based on document graph structure
Yu Yuanzhe, Wang Jinshuang, Zou Xia
(Command & Control Engineering College, Army Engineering University of PLA, Nanjing 210007, China)
Abstract: Malicious PDF detection methods based on machine learning rely on expert knowledge, which still cannot fully reflect document attributes. Moreover, the performance of these detectors is easily degraded by adversarial samples. To overcome these limitations, a malicious PDF detection method based on the PDF document graph structure and a Convolutional Neural Network (CNN) was proposed. Firstly, a directed graph was constructed from the document structure and the reference relationships between document objects. Secondly, the contribution of each node was calculated using the TF-IDF algorithm, and the graph structure was simplified according to these contributions. Thirdly, the adjacency and degree matrices of the simplified graph were calculated to obtain the Laplacian matrix of the graph, which was used as the feature and fed to the CNN classification model for training. Adversarial samples were also added to the training set. In the evaluation, the method achieved an accuracy of 99.71%, which is higher than that of the KNN and SVM classification models. Compared with the 67 antivirus engines on VirusTotal, it achieved higher detection performance on adversarial samples.
Key words: malicious PDF document; document graph structure; CNN; adversarial sample
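
The third step of the abstract (adjacency matrix, degree matrix, Laplacian matrix) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy edge list, the node names, the function name `laplacian`, the choice of out-degree for the degree matrix, and the unnormalized form L = D - A. The abstract does not state which degree convention or normalization the authors use.

```python
# Hypothetical toy graph: node names are illustrative PDF object types,
# not taken from the paper's data set. Each pair is (referrer, referenced).
edges = [
    ("Catalog", "Pages"),
    ("Pages", "Page"),
    ("Page", "Contents"),
    ("Catalog", "OpenAction"),
    ("OpenAction", "JavaScript"),
]


def laplacian(edges):
    """Return (nodes, A, D, L) for a directed graph, with L = D - A.

    Assumption: D holds out-degrees and L is unnormalized; the abstract
    does not specify either choice.
    """
    # Fix a deterministic node order so matrix rows are reproducible.
    nodes = sorted({name for edge in edges for name in edge})
    index = {name: i for i, name in enumerate(nodes)}
    size = len(nodes)

    # Adjacency matrix: A[i][j] = 1 if node i references node j.
    A = [[0] * size for _ in range(size)]
    for src, dst in edges:
        A[index[src]][index[dst]] = 1

    # Degree matrix: diagonal matrix of out-degrees (row sums of A).
    D = [[0] * size for _ in range(size)]
    for i in range(size):
        D[i][i] = sum(A[i])

    # Unnormalized Laplacian.
    L = [[D[i][j] - A[i][j] for j in range(size)] for i in range(size)]
    return nodes, A, D, L


nodes, A, D, L = laplacian(edges)
for name, row in zip(nodes, L):
    print(f"{name:<11}", row)
```

With out-degrees on the diagonal, every row of L sums to zero, which is a quick sanity check on the construction. A matrix of this kind would then need to be padded or resized to a fixed shape before being passed to a CNN; how the authors handle that is not described in the abstract.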