An Algorithm for Identifying the Same User Across Social Networks - AET - Application of Electronic Technique

Application of Electronic Technique, 2022, Issue 1

Shen Jiaqi1, Zhou Guomin2

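1. College of Information Engineering, Zhejiang University of Technology, Hangzhou, Zhejiang 310023; 2. Department of Computer and Information Technology, Zhejiang Police College, Hangzhou, Zhejiang 310053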

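Abstract: To address the problem of identifying the same user across social networks, this paper proposes an identification method that combines user interests, writing style, and profile attributes. User relationships are judged separately under each of these three feature dimensions, and the judgments are then combined to improve the accuracy of same-user identification. User interests are divided into static and dynamic interests: static interests are extracted from users' background information with the TextRank algorithm, while dynamic interests, that is, points of interest that change over time, are mined from the text that users publish by means of a topic model. Writing style is identified with the One-Class SVM algorithm, and the similarity of profile attributes is compared using the information entropy weighting method. Experimental results show that the proposed algorithm improves both precision and recall compared with traditional machine learning algorithms.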

Keywords: cross-social-network; user identification; user interest; writing style; profile attributes

CLC number: TN01; TP391
Document code: A
DOI: 10.16157/j.issn.0258-7998.211518
Chinese citation format: 沈佳琪，周國民. 跨社交網絡的同一用戶識別算法[J]. 電子技術應用，2022，48(1)：109-114.
English citation format: Shen Jiaqi, Zhou Guomin. User alignment across social networks[J]. Application of Electronic Technique, 2022, 48(1): 109-114.

User alignment across social networks

Shen Jiaqi1, Zhou Guomin2

1. College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China; 2. Department of Computer and Information Security, Zhejiang Police College, Hangzhou 310053, China

Abstract: To address the problem of identifying the same user across social networks, an identification method that integrates user interests, writing style, and profile attributes is proposed. User relationships are determined separately under each of these three feature dimensions, and the results are then combined to improve the accuracy of same-user identification. User interest is divided into static interest and dynamic interest: static interest is extracted from user background information with the TextRank algorithm, while dynamic interest, meaning interest points that change over time, is mined from the text that users publish by using a topic model. Writing style is identified with the One-Class SVM algorithm, and the information entropy weighting method is used to compare the similarity of user profile attributes. The experimental results show that the proposed algorithm improves both precision and recall compared with traditional machine learning algorithms.

Key words: cross-social-network; user identification; user interest; writing style; profile attributes
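
The abstract names information entropy weighting as the way profile attribute similarities are compared, but this page does not give the formulas. The following is a minimal sketch of the standard entropy weight method, not the authors' implementation. It assumes each row holds the per-attribute similarity scores of one candidate account pair; the function names and the sample scores are hypothetical.

```python
import math


def entropy_weights(rows):
    """Entropy weight method (illustrative sketch).

    rows[i][j] is a non-negative score for attribute j on sample i.
    Attributes whose scores vary more across samples have lower
    entropy and therefore receive a larger weight.
    """
    n, m = len(rows), len(rows[0])
    if n < 2:
        raise ValueError("need at least two samples to compute entropy")
    divergences = []
    for j in range(m):
        col = [row[j] for row in rows]
        total = sum(col)
        if total == 0:
            # An all-zero attribute carries no information.
            divergences.append(0.0)
            continue
        entropy = 0.0
        for x in col:
            p = x / total
            if p > 0:  # treat 0 * log(0) as 0
                entropy -= p * math.log(p)
        entropy /= math.log(n)  # normalise entropy to [0, 1]
        # Clamp to guard against tiny negative values from rounding.
        divergences.append(max(0.0, 1.0 - entropy))
    s = sum(divergences)
    if s == 0:
        # All attributes equally uninformative: fall back to equal weights.
        return [1.0 / m] * m
    return [d / s for d in divergences]


def weighted_similarity(sims, weights):
    """Combine per-attribute similarities into one score."""
    return sum(w * s for w, s in zip(weights, sims))


if __name__ == "__main__":
    # Hypothetical scores: column 0 is identical for every pair,
    # column 1 differs between pairs.
    scores = [[1.0, 0.9], [1.0, 0.1], [1.0, 0.5]]
    w = entropy_weights(scores)
    print(w, weighted_similarity([0.2, 0.8], w))
```

In this sketch an attribute that takes the same value for every pair ends up with a weight close to zero, which matches the intuition that an attribute unable to separate pairs should not influence the final score.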

0 Introduction

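In recent years, personal data has grown ever richer as social networks have become widespread. Current user analysis of social networks mainly targets a single platform, but because data from a single platform is limited^[1], identifying the multiple accounts that the same user holds on different social networks can provide data support for social network analysis^[2].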

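Identification based on user profile attributes is the most widely studied approach. Zafarani et al.^[3] judged whether accounts belong to the same user by comparing the similarity of behavioral features in how users choose usernames; Zhang et al.^[4] combined multiple attributes such as username and avatar and used naive Bayes for identification. However, the features used in these studies are easily missing or forged^[5]. Some studies therefore start from published text, mining user interests and comparing interest similarity to determine user relationships^[6]. He Li et al.^[7] used the LDA model to mine user interests from text content; Lü Zhiquan et al.^[8] introduced a time factor on top of the LDA model. These studies, however, considered only the dynamic interests reflected in text content and did not incorporate static interests. Moreover, even the same user may follow and publish quite different content on different social platforms, which degrades identification performance.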
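
To make the idea of comparing interest similarity concrete, here is a minimal sketch that scores two accounts by the similarity of their topic distributions, such as those an LDA model would produce. The text above does not say which similarity measure the cited studies use; Jensen-Shannon divergence is an assumption chosen for illustration, and the function names and sample distributions are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p_i == 0 contribute nothing; the caller must make sure
    q_i > 0 wherever p_i > 0.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_similarity(p, q):
    """Similarity in [0, 1] from the Jensen-Shannon divergence.

    p and q are topic distributions over the same topics (each sums
    to 1). Returns 1 for identical distributions and 0 for
    distributions with no topic in common.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same topics")
    # The mixture m is positive wherever p or q is positive,
    # so both KL terms below are well defined.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return 1.0 - jsd


if __name__ == "__main__":
    # Hypothetical topic distributions for accounts on two platforms.
    account_a = [0.7, 0.2, 0.1]
    account_b = [0.6, 0.3, 0.1]
    print(js_similarity(account_a, account_b))
```

With base-2 logarithms the Jensen-Shannon divergence is bounded by 1, so the score can be compared against a threshold without further normalisation. Any threshold would have to be tuned on labelled account pairs.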

For the full text of this article, please download: http://forexkbc.com/resource/share/2000003919

