


Published: 2018-10-23 20:31
[Abstract]: Topic detection and tracking identifies the most-discussed topics in massive data and follows how each topic develops and changes in subsequent information, helping people cope with the increasingly serious problem of information explosion. It saves users time, follows the development of events, and provides data support for public opinion monitoring, so it has important practical value and security significance. As more and more users publish information and discuss topics on Weibo, displaying hot topics has gradually become an important function of the platform. Because Weibo is highly immediate, breaking news spreads on it very quickly, and influential news events draw very large numbers of users who report, forward, and comment on them; Weibo can often react before the traditional news media do. Based on these characteristics, this paper designs and implements a hot-topic detection and tracking method for Weibo that first filters out invalid posts. The main work is as follows: 1) Analyzing the characteristics of Weibo and filtering invalid posts. Weibo users form a complex, wide-ranging, and highly varied population, and the content they post is mixed. User characteristics, including the number of followers and the number of posts published per day, are analyzed to filter out advertising accounts and zombie accounts. Post content is analyzed to filter out posts that contribute nothing to a topic, such as merchants' promotional activities, content shared by users, and activities with mass user participation. The word-segmented posts are then screened by word count: posts with too few words are removed as meaningless or too short, and posts with too many words are removed as long texts with heavy repetition. Filtering invalid posts in this way reduces the computational complexity.
2) Designing and implementing a hot-topic detection algorithm for Weibo based on time characteristics. Posts are processed in order of increasing time. Preliminary topic detection uses an improved Single-Pass clustering algorithm; the improvements cover the similarity calculation and a topic-vector update method that takes user influence into account. The FP-Growth frequent-itemset mining algorithm is then used to mine frequent feature-word sets and correct the errors of the Single-Pass algorithm, and the frequent feature-word sets are clustered with an improved K-Medoids algorithm to extract the final topics. This improves both the computational efficiency and the accuracy of topic detection. 3) Designing and implementing a multi-query-vector adaptive topic tracking algorithm based on time characteristics. Based on the distribution of post volume over time, posts are grouped by time period and processed in order of increasing time. The topics of each time period are compared by similarity with all topics that already exist in all topic groups. Depending on a threshold, each topic is either merged into an existing topic group, whose topic vectors are thereby updated adaptively, or used to create a new topic group. Tracking topic development in this way improves accuracy and reduces topic drift.
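
The invalid-post filtering described in step 1 could be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract gives no thresholds or keyword lists, so the `Post` class, every constant, and the promotional word set are hypothetical, and the repetition check (ratio of distinct words) is one possible reading of "long texts with heavy repetition".

```python
from dataclasses import dataclass
from typing import List

# Every threshold and keyword below is an illustrative assumption,
# not a value taken from the paper.
MIN_FOLLOWERS = 10       # fewer followers: treated as a likely zombie account
MAX_DAILY_POSTS = 50     # more posts per day: treated as a likely advertising account
MIN_TOKENS = 3           # shorter posts are treated as meaningless
MAX_TOKENS = 100         # longer posts are treated as noise
MIN_UNIQUE_RATIO = 0.5   # lower share of distinct words: treated as repetitive
PROMO_WORDS = {"discount", "giveaway", "coupon"}


@dataclass
class Post:
    """Hypothetical record for one word-segmented post."""
    followers: int
    daily_posts: float
    tokens: List[str]


def is_valid(post: Post) -> bool:
    """Return True if the post passes the user, length, and content checks."""
    # User-level checks: zombie and advertising accounts.
    if post.followers < MIN_FOLLOWERS or post.daily_posts > MAX_DAILY_POSTS:
        return False
    # Length checks on the segmented text.
    n = len(post.tokens)
    if n < MIN_TOKENS or n > MAX_TOKENS:
        return False
    # Repetition check: too few distinct words relative to length.
    if len(set(post.tokens)) / n < MIN_UNIQUE_RATIO:
        return False
    # Content check: promotional vocabulary.
    if PROMO_WORDS & set(post.tokens):
        return False
    return True


def filter_posts(posts: List[Post]) -> List[Post]:
    """Keep only the posts that pass every check, preserving order."""
    return [p for p in posts if is_valid(p)]
```

Running the cheap user-level checks first means most invalid posts are discarded before any text is inspected, which fits the stated goal of reducing computation.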

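The preliminary detection stage in step 2 builds on Single-Pass clustering. The sketch below shows the basic Single-Pass loop with a topic-vector update weighted by user influence. It is not the paper's improved algorithm: the cosine similarity on raw term counts, the `threshold` default, and the way `influence` scales the update are all assumptions made for illustration, and the FP-Growth and K-Medoids stages are not shown.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(posts, threshold=0.5):
    """Cluster posts in one pass.

    posts: iterable of (tokens, influence) pairs in increasing time order.
    influence is a hypothetical per-user weight; threshold is an assumed value.
    Returns a list of topics, each with a term vector and member indices.
    """
    topics = []
    for index, (tokens, influence) in enumerate(posts):
        # Term-count vector for the incoming post.
        vec = defaultdict(float)
        for term in tokens:
            vec[term] += 1.0

        # Find the most similar existing topic.
        best, best_sim = None, 0.0
        for topic in topics:
            sim = cosine(vec, topic["vector"])
            if sim > best_sim:
                best, best_sim = topic, sim

        if best is not None and best_sim >= threshold:
            # Join the topic; posts from influential users shift the vector more.
            for term, w in vec.items():
                best["vector"][term] += influence * w
            best["members"].append(index)
        else:
            # No topic is similar enough: start a new one seeded by this post.
            seed = defaultdict(float)
            for term, w in vec.items():
                seed[term] = influence * w
            topics.append({"vector": seed, "members": [index]})
    return topics
```

Because each post is compared only with the current topic vectors and never revisited, the result depends on arrival order, which is one reason a later correction stage is useful.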

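The tracking procedure in step 3 can be illustrated with a small sketch in which every topic group keeps all of its member topic vectors as query vectors. The details here are assumptions rather than the paper's design: similarity to a group is taken as the maximum cosine similarity over its vectors, "adaptive" updating is modeled simply by appending the new vector to the group, and the `threshold` default is arbitrary.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def track(windows, threshold=0.4):
    """Assign per-window topics to topic groups.

    windows: list of time windows in increasing time order; each window is a
    list of topic vectors (term -> weight) detected in that period.
    Returns a list of groups; each group is the list of vectors assigned to it.
    """
    groups = []
    for window_topics in windows:
        for vec in window_topics:
            # Compare with every vector of every existing group.
            best_idx, best_sim = -1, 0.0
            for idx, group in enumerate(groups):
                sim = max(cosine(vec, query) for query in group)
                if sim > best_sim:
                    best_idx, best_sim = idx, sim

            if best_idx >= 0 and best_sim >= threshold:
                # Existing topic continues: the group gains a new query vector.
                groups[best_idx].append(vec)
            else:
                # Nothing similar enough: a new topic group starts here.
                groups.append([vec])
    return groups
```

Keeping several query vectors per group lets a later topic match the group through its most recent form even when it no longer resembles the original vector, so an evolving topic stays in one group.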





