
Research on Hot Topic Detection and Tracking Technology for Weibo

Published: 2018-10-23 20:31
[Abstract]: Topic detection and tracking means discovering the most-discussed topics in massive data and then following how those topics develop in subsequent information, which helps people cope with the increasingly serious problem of information explosion. It saves users time, keeps them up to date on how events unfold, and provides data support for public opinion monitoring, so it has practical value and security significance. As more and more users publish information and discuss topics on Weibo, displaying hot topics has become an important function of the Weibo platform. Because Weibo is highly immediate, breaking news spreads quickly on it, and for high-impact news events the number of users who report, forward, and comment is large, so Weibo often reacts before traditional news media. Based on these characteristics, this thesis filters out invalid Weibo posts and then designs and implements a hot topic detection and tracking method for Weibo. The main work is as follows:
1) The characteristics of Weibo are analyzed and invalid posts are filtered. Weibo users are a complex, wide-ranging, and highly varied population, and their content is heterogeneous. User features, including follower count and the number of posts published per day, are analyzed to filter out advertising accounts and zombie accounts. Post content is analyzed to filter out the large volume of posts that contribute nothing to a topic, such as merchant promotions, content shared by users, and activities users take part in. The word-segmented post data is analyzed to filter out posts with too many or too few words, removing meaningless very short texts and highly repetitive very long texts. Together these steps filter invalid posts effectively and reduce computational complexity.
2) A time-based hot topic detection algorithm for Weibo is designed and implemented. Posts are processed in order of increasing time. Preliminary topic detection uses an improved Single-Pass clustering algorithm, with an improved similarity calculation and an improved topic vector update that incorporates user influence. The FP-Growth frequent itemset mining algorithm is then used to mine frequent feature word sets and correct the errors of the Single-Pass (SP) algorithm. Finally, an improved K-MEDOIDS algorithm clusters the frequent feature word sets to extract the final topics. This improves both computational efficiency and topic detection accuracy.
3) A time-based, multi-query-vector adaptive topic tracking algorithm is designed and implemented. Based on how the number of posts is distributed over time, posts are grouped into time periods and processed in order of increasing time. The topics of each period are compared by similarity against all topics in all existing topic groups; depending on a threshold, each topic is either assigned to an existing topic group or used to create a new one, and the topic vectors of the group it joins are updated adaptively. This tracks topic development effectively, improves accuracy, and reduces topic drift.
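
The preliminary detection step in item 2 builds on Single-Pass clustering. The abstract does not specify the improved similarity measure or the user-influence-weighted vector update, so the following is only a minimal sketch of the baseline technique: plain cosine similarity over raw term counts, with an arbitrarily chosen threshold of 0.3. The names `cosine`, `single_pass`, and the sample posts are hypothetical and not taken from the thesis.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (dict-like)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(posts, threshold=0.3):
    """Baseline Single-Pass clustering (assumed form, not the thesis's variant).

    posts: iterable of (timestamp, tokens) pairs, tokens already segmented.
    Each post is visited once, in order of increasing timestamp, and either
    joins the most similar existing cluster or starts a new one.
    """
    clusters = []  # each cluster: {"centroid": Counter, "members": [timestamps]}
    for ts, tokens in sorted(posts, key=lambda p: p[0]):
        vec = Counter(tokens)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            # Plain count accumulation; the thesis instead weights the
            # update by user influence, which is not modeled here.
            best["centroid"].update(vec)
            best["members"].append(ts)
        else:
            clusters.append({"centroid": Counter(vec), "members": [ts]})
    return clusters


# Hypothetical, already-segmented sample posts: (timestamp, tokens).
sample_posts = [
    (3, ["earthquake", "rescue", "city"]),
    (1, ["earthquake", "city", "damage"]),
    (2, ["concert", "tickets", "sale"]),
    (4, ["concert", "tickets", "tonight"]),
]
sample_clusters = single_pass(sample_posts)
```

Because each post is compared only against the current cluster centroids, the result depends on arrival order and on the threshold. That order sensitivity is the kind of error the abstract says the FP-Growth step is meant to correct.
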
[Degree-granting institution]: Southeast University
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP391.1;TP393.092


Article ID: 2290384


Article link: http://www.lk138.cn/wenyilunwen/guanggaoshejilunwen/2290384.html

