CLC number: TP391.4 Document code: A DOI: 10.19358/j.issn.2096-5133.2021.12.002 Citation format: Yi Li, Qiu Xiulian, Ma Fang, et al. Research on identification and classification methods of APP involving privacy infringement[J]. Information Technology and Network Security, 2021, 40(12): 8-14.
Research on identification and classification methods of APP involving privacy infringement
Yi Li1, Qiu Xiulian1, Ma Fang1, Peng Yanbing1, Cheng Guang2
(1. Nanjing FiberHome Software Technology Co., Ltd., Nanjing 210019, China; 2. School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China)
Abstract: With the development of information infrastructure and the popularization of mobile applications, application developers collect a large amount of users' personal information during use. Illegal collection and use of this information seriously threatens the security of personal information. To identify more effectively both the type of an APP and whether it infringes on privacy, a recognition algorithm based on multi-modal features and a combination of multiple strategies is proposed. First, the algorithm uses the Word2vec method to extract feature vectors from APP-related text, and the resulting feature vectors are fed into a CNN for classification. Next, it builds application feature vectors from the text classification result and a variety of behavioral feature sets. Finally, it combines several different base classifiers and uses hard voting to predict each application's privacy-infringement category. Experimental results show that the F1 value of the trained model on the validation set reaches up to 91%. The method can effectively identify and classify privacy-infringing APPs, which helps protect the security of personal information in the era of big data.
Key words: multi-label text classification; feature extraction; behavioral features; model construction; machine learning
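
The hard-voting step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `hard_vote`, the label strings, and the tie-breaking rule (the earliest classifier among the tied labels wins) are all assumptions made for illustration, and the abstract does not say which base classifiers are combined.

```python
from collections import Counter


def hard_vote(predictions):
    """Combine per-classifier label predictions by majority (hard) voting.

    predictions: list of lists; predictions[i][j] is the label that
    base classifier i assigns to sample j. All inner lists must have
    the same length.

    Tie-breaking is an assumption of this sketch: among labels with the
    same top count, the one predicted by the earliest classifier wins.
    """
    if not predictions:
        raise ValueError("at least one classifier's predictions are required")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must predict the same number of samples")

    voted = []
    # zip(*predictions) regroups the labels so each tuple holds every
    # classifier's vote for a single sample.
    for sample_votes in zip(*predictions):
        counts = Counter(sample_votes)
        top = max(counts.values())
        winner = next(label for label in sample_votes if counts[label] == top)
        voted.append(winner)
    return voted


# Hypothetical outputs of three base classifiers on three applications.
base_predictions = [
    ["privacy", "benign", "privacy"],
    ["privacy", "privacy", "benign"],
    ["benign", "benign", "benign"],
]
final_labels = hard_vote(base_predictions)
```

Each application receives the label chosen by most base classifiers, so a single classifier's error is outvoted when the others agree.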