
Sentiment analysis of Weibo based on a web crawler and the TFIDF-NB algorithm

Application of Electronic Technique, 2021, Issue 4

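Yang Ge1,2, Yang Lutao1

1. Key Laboratory of Intelligent Multimedia Technology, Beijing Normal University (Zhuhai Campus), Zhuhai, Guangdong 519087, China; 2. Engineering Lab on Intelligent Perception for Internet of Things (ELIP), Shenzhen Graduate School, Peking University, Shenzhen, Guangdong 518055, China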

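Abstract: Public opinion on Weibo is large in volume, irregular, and changes randomly. To address this, the paper proposes TFIDF-NB (Term Frequency Inverse Document Frequency-Naive Bayes) for Weibo sentiment analysis. A Weibo comment crawler based on the Scrapy framework is designed and implemented; it crawls a number of Weibo comments on a trending event and stores them in a database. The comments then undergo text segmentation and LDA (Latent Dirichlet Allocation) topic clustering, and finally the TFIDF-NB algorithm performs sentiment classification. Experimental results show that TFIDF-NB achieves a higher average accuracy than the linear support vector machine and K-nearest neighbor algorithms, and higher precision and recall than the K-nearest neighbor algorithm, giving good sentiment classification performance.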

Keywords: Weibo public opinion; web crawler; sentiment classification

CLC number: TN011; TP391.41
Document code: A
DOI: 10.16157/j.issn.0258-7998.200748
Citation format (Chinese): Yang Ge, Yang Lutao. Sentiment analysis of Weibo based on a web crawler and the TFIDF-NB algorithm[J]. Application of Electronic Technique, 2021, 47(4): 59-62, 66.
Citation format (English): Yang Ge, Yang Lutao. Sentiment analysis of Weibo based on TFIDF-NB algorithm[J]. Application of Electronic Technique, 2021, 47(4): 59-62, 66.

Sentiment analysis of Weibo based on TFIDF-NB algorithm

Yang Ge1,2, Yang Lutao1

1. Key Laboratory of Intelligent Multimedia Technology, Beijing Normal University (Zhuhai Campus), Zhuhai 519087, China; 2. Engineering Lab on Intelligent Perception for Internet of Things (ELIP), Shenzhen Graduate School, Peking University, Shenzhen 518055, China

Abstract: Public opinion information on Weibo is large in volume, irregular, and randomly changing. To address this, this paper proposes a Weibo sentiment analysis method based on the TFIDF-NB (Term Frequency Inverse Document Frequency-Naive Bayes) algorithm. A Weibo comment crawler built on the Scrapy framework crawls a number of Weibo comments on a hot event and stores them in a database. Text segmentation and LDA (Latent Dirichlet Allocation) topic clustering are then performed, and finally the TFIDF-NB algorithm is used for sentiment classification. Experimental results show that the average accuracy of the algorithm is higher than that of the linear Support Vector Machine algorithm and the K-Nearest Neighbor algorithm, and that its precision and recall are higher than those of the K-Nearest Neighbor algorithm, indicating good sentiment classification performance.

Key words: Weibo public opinion; web crawler; sentiment classification

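0 Introduction

Online public opinion refers to the views and suggestions that internet users express on hot social issues. It is one manifestation of public opinion in society: the collection of attitudes, ideas, and emotions that the public expresses toward hot events and issues. The rapid development of the internet has steadily accelerated the formation and spread of online public opinion, giving it a large influence on society.

Reference [1] showed that the development of online public opinion is chaotic, that is, disordered, irregular, and randomly changing. In the spread of online public opinion, Weibo provides a powerful internet platform for its formation, fermentation, and dissemination, and gives its users a place to share information, post comments, and voice demands to the whole world. Such content can spread widely in a short time and may even affect the course of events.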

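This paper first implements a Weibo comment crawler based on the Scrapy framework, which crawls a number of Weibo comments on a trending event and stores them in a database. It then performs text segmentation and LDA (Latent Dirichlet Allocation) topic clustering, and finally uses the TFIDF-NB (Term Frequency Inverse Document Frequency-Naive Bayes) algorithm for text sentiment classification.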
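The final classification step can be illustrated with a minimal sketch. This excerpt does not give the exact TFIDF-NB formulation, so the code below is only one plausible reading: a multinomial Naive Bayes classifier whose per-class term counts are replaced by summed TF-IDF weights. The function names, the smoothing constants, and the toy training data are all assumptions made for illustration, and the input is assumed to be already segmented into tokens.

```python
import math
from collections import Counter, defaultdict


def compute_idf(docs):
    """Smoothed inverse document frequency for every term in the training docs."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(doc, idf):
    """Map a tokenized document to {term: tf * idf}; unseen terms are dropped."""
    counts = Counter(t for t in doc if t in idf)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {t: (c / total) * idf[t] for t, c in counts.items()}


def train_nb(docs, labels, alpha=1.0):
    """Fit Naive Bayes on TF-IDF weights (assumed variant, not the paper's code)."""
    idf = compute_idf(docs)
    weight = defaultdict(Counter)  # label -> term -> summed TF-IDF weight
    for doc, label in zip(docs, labels):
        for term, w in tfidf(doc, idf).items():
            weight[label][term] += w
    vocab = list(idf)
    model = {"idf": idf, "log_prior": {}, "log_like": {}}
    for label, n in Counter(labels).items():
        model["log_prior"][label] = math.log(n / len(docs))
        # Additive (Laplace-style) smoothing over the whole vocabulary.
        denom = sum(weight[label].values()) + alpha * len(vocab)
        model["log_like"][label] = {
            t: math.log((weight[label][t] + alpha) / denom) for t in vocab
        }
    return model


def predict(model, doc):
    """Return the label with the highest log-posterior score."""
    feats = tfidf(doc, model["idf"])
    scores = {}
    for label, prior in model["log_prior"].items():
        like = model["log_like"][label]
        scores[label] = prior + sum(w * like[t] for t, w in feats.items())
    return max(scores, key=scores.get)


# Toy, hand-made training set (hypothetical; not data from the paper).
train_docs = [
    ["good", "great", "love"],
    ["great", "happy"],
    ["bad", "terrible", "hate"],
    ["terrible", "sad"],
]
train_labels = ["pos", "pos", "neg", "neg"]
model = train_nb(train_docs, train_labels)
prediction = predict(model, ["great", "love"])
```

Weighting by TF-IDF rather than raw counts reduces the influence of terms that occur in many comments regardless of sentiment. In practice a word segmenter would produce the token lists, and a library implementation would normally replace this hand-written version.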

(1) Web crawler

A crawler, in full a web crawler, is a script or program that automatically browses information on the internet. It can browse and fetch large volumes of internet content and store the fetched information locally.
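The browse, fetch, and store loop described above can be sketched as follows. This is not the Scrapy-based crawler from the paper; it is a standard-library illustration of the general idea. The `crawl` function, the injected `fetch` callable, and the in-memory `FAKE_WEB` pages are all hypothetical, chosen so that the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl; fetch(url) returns HTML text, or None on failure."""
    seen = {start_url}
    queue = deque([start_url])
    store = {}  # url -> html, standing in for local storage or a database
    while queue and len(store) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        store[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return store


# A tiny in-memory "web" so the sketch needs no network (hypothetical pages).
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/">home</a> <a href="/b">B</a>',
    "http://example.test/b": "<p>no links</p>",
}
pages = crawl("http://example.test/", FAKE_WEB.get)
```

Passing `fetch` as a parameter keeps the traversal logic separate from the transport, so a real HTTP client could be substituted later. The `seen` set prevents pages that link to each other from being fetched repeatedly.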

Web crawlers can be divided into four types^[2]: general-purpose web crawlers^[3], focused (topic) web crawlers^[4], incremental web crawlers^[5], and deep web crawlers^[6-7].

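(2) Sentiment classification

Sentiment analysis refers to methods for identifying the underlying ideas, emotions, and attitudes in text^[8]. Sentiment classification is the core of sentiment analysis; its role is to identify the opinions in text data and classify their sentiment as positive or negative^[9].

There are currently two main approaches to sentiment classification: lexicon-based methods^[10-13] and machine-learning-based methods^[14-16].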
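To make the contrast concrete, a lexicon-based classifier can be as simple as counting matches against word lists, with no training step. The sketch below is purely illustrative: the word lists, the negator rule, and the function name `lexicon_sentiment` are assumptions and are not taken from the cited works.

```python
# Hypothetical toy lexicons; real systems use curated sentiment dictionaries.
POSITIVE = {"good", "great", "happy", "love"}
NEGATIVE = {"bad", "terrible", "sad", "hate"}
NEGATORS = {"not", "never"}


def lexicon_sentiment(tokens):
    """Label a token list as positive, negative, or neutral by lexicon lookup."""
    score = 0
    negate = False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
            continue
        # +1 for a positive word, -1 for a negative word, 0 otherwise.
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)
        if polarity:
            score += -polarity if negate else polarity
        # A negator only affects the token immediately after it.
        negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


label = lexicon_sentiment(["not", "good"])
```

A machine-learning method instead learns word weights from labeled examples, which is the route the TFIDF-NB approach takes.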

For the full text of this article, please download: http://forexkbc.com/resource/share/2000003464

