CLC number: TP391 Document code: A DOI: 10.19358/j.issn.2096-5133.2020.11.006 Citation: Jing Hongli, Huang Na, Li Jianguo. Research progress and challenges of malware detection method based on machine learning[J]. Information Technology and Network Security, 2020, 39(11): 38-44, 68.
Research progress and challenges of malware detection method based on machine learning
Jing Hongli1, Huang Na1,2, Li Jianguo1
1. Beijing Topsec Science & Technology Inc., Beijing 100085, China; 2. Beijing University of Technology, Beijing 100124, China
Abstract: As the number of malware samples grows and attack techniques keep evolving, combining malware detection with machine learning has become a new direction in the field. This paper first briefly introduces static and dynamic malware detection methods, then summarizes the general workflow of machine-learning-based malware detection and reviews existing methods and their research progress. Methods based on structural features, including RF (Random Forest), LightGBM, SVM (Support Vector Machine), K-means and CNN (Convolutional Neural Network), are analyzed and validated on the Ember 2017 and Ember 2018 datasets, and methods based on serialized (sequence) features, covering several common deep learning models, are validated on a 2019 sample set. The accuracy, precision, recall and F1 score of each trained model on different test sets are used as evaluation metrics. Based on the experimental results, the advantages and disadvantages of each method are discussed, with particular emphasis on verifying and analyzing the generalization ability of tree models. The results show that model performance generally degrades as samples continue to evolve, and directions for further research are proposed at the end.
Key words: malware detection; static detection of malware; machine learning; LightGBM; random forest
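To make the four evaluation metrics concrete, the following minimal Python sketch computes accuracy, precision, recall and F1 score from binary labels, assuming label 1 denotes malware (the positive class) and 0 denotes benign. The function name `binary_metrics` and all label vectors are hypothetical illustrations, not data or code from the paper's experiments. Scoring the same model on an older and a newer test set, as in the usage block, is one simple way to observe the kind of degradation the abstract reports.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return accuracy, precision, recall and F1 for binary labels.

    Assumption (hypothetical convention): 1 = malware (positive), 0 = benign.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    total = tp + tn + fp + fn
    # Guard every ratio against a zero denominator.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical labels: one model's predictions on an older and a newer test set.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    older_pred = [1, 1, 1, 0, 0, 0, 1, 0]
    newer_pred = [1, 0, 0, 0, 0, 0, 0, 0]
    for name, pred in (("older test set", older_pred), ("newer test set", newer_pred)):
        metrics = binary_metrics(y_true, pred)
        print(name, {k: round(v, 3) for k, v in metrics.items()})
```

On these made-up labels, recall falls from 0.75 to 0.25 and F1 from 0.75 to 0.4 on the newer set even though precision rises, which illustrates why the abstract reports several metrics rather than accuracy alone.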