Research on Methods for Identifying Malicious Users on Weibo

Published: 2018-04-05 18:34

Topic: Weibo　Focus: malicious users　Source: Beijing Jiaotong University, master's thesis, 2017

[Abstract]: With the rapid development of the Internet, social networks such as Twitter and Facebook have grown quickly and have become an indispensable part of modern life. In China, the most representative social network is Weibo. Its role has long gone beyond pure socializing, and it has become a central hub for the spread of information. At the same time, Weibo is exploited by malicious users. In large numbers, these users spread false and malicious information and influence people's views of events. Research on countering malicious users therefore has important practical significance, and malicious user identification is an important research hotspot within it. Taking Sina Weibo users as its subject, this thesis focuses on the problem of identifying malicious users in the Weibo network. The research was supported by the National Natural Science Foundation of China (Nos. 61271308, 61172072, 61401015) and by the graduate discipline construction project of the Beijing Municipal Education Commission.

The main work of the thesis is as follows. Starting from the features of malicious users, and based on Weibo's functional characteristics and users' usage habits, the thesis analyzes and finds that malicious users and normal users differ considerably in how they use Weibo's "favorites" function. It therefore adds "number of favorites" and "favoriting speed" to the feature list and verifies their contribution to malicious user identification.

The thesis uses the Weka Java API to call the algorithms in Weka and tune their parameters. For the case of missing user information, it compares the classification performance of three algorithms, naive Bayes, the C4.5 decision tree, and random forest, before and after the missing data are handled. The conclusion is that, when data are missing, both the C4.5 decision tree and random forest are fairly robust, with random forest performing best.

The thesis also simulates a practical deployment and studies how to improve the efficiency of malicious user identification when larger-scale data must be processed. By deploying a Hadoop distributed architecture, it compares the processing time for data sets of different sizes under different numbers of nodes, as well as the identification performance.

Overall, the thesis analyzes the differences between malicious and normal users from the perspective of user features and selects suitable classification algorithms based on those features to identify malicious users. The identification accuracy is close to 90%.
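The two favorites-based features and the handling of missing user information can be illustrated with a minimal sketch. The abstract does not define "favoriting speed" or say how missing values were handled, so the sketch below assumes favorites per day of account age and simple mean imputation. The function names, field names, and sample accounts are all hypothetical and are not taken from the thesis, which used the Weka Java API rather than Python.

```python
from datetime import date
from statistics import mean
from typing import Optional


def favorite_features(favorite_count: Optional[int],
                      registered_on: Optional[date],
                      observed_on: date) -> dict:
    """Return the two favorites-based features for one account.

    Assumption: "favoriting speed" is favorites per day of account age.
    A feature is None when the data needed to compute it is missing.
    """
    rate = None
    if favorite_count is not None and registered_on is not None:
        # Treat accounts younger than one day as one day old to avoid dividing by zero.
        age_days = max((observed_on - registered_on).days, 1)
        rate = favorite_count / age_days
    return {"favorite_count": favorite_count, "favorite_rate": rate}


def impute_mean(rows: list, key: str) -> list:
    """Replace missing (None) values of `key` with the mean of the observed values."""
    observed = [r[key] for r in rows if r[key] is not None]
    fill = mean(observed) if observed else 0.0
    return [{**r, key: fill if r[key] is None else r[key]} for r in rows]


# Made-up accounts, for illustration only.
today = date(2017, 1, 1)
rows = [
    favorite_features(120, date(2016, 1, 1), today),
    favorite_features(0, date(2016, 12, 2), today),
    favorite_features(None, date(2015, 6, 1), today),  # favorites count missing
]
for key in ("favorite_count", "favorite_rate"):
    rows = impute_mean(rows, key)
```

After imputation, every row has numeric values for both features and could be passed to a classifier. Tree-based learners can often cope with missing values directly, so whether to impute at all is a design choice that would need to be checked against the data.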
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP18;TP393.092

Li Zihao; Research on Methods for Identifying Malicious Users on Weibo [D]; Beijing Jiaotong University; 2017


Article No.: 1715962

Article link: http://www.lk138.cn/shoufeilunwen/xixikjs/1715962.html
