Research and Application of News Hot Topic Discovery and Evolution Analysis
Topic: LDA model. Focus: hot topic discovery. Source: Nanjing University of Science and Technology, 2017 master's thesis. Document type: degree thesis.
[Abstract]: Hot topics are topics that attract wide public attention as a result of online news coverage. Research on hot topic discovery and evolution helps the public keep track of the current focus of public opinion and helps the government guide public opinion constructively. It can also help prevent those with ulterior motives from exploiting the convenience and uncontrollability of the Internet to seek improper gain and stir up social conflict. This thesis studies the discovery of news hot topics and the way hot topics drift as they evolve. The main work is as follows:

1. The LDA topic model is introduced, and news reports are modeled as text vectors in two ways: a term-weight model based on TF-IDF and an LDA model based on semantic understanding. On this basis, to address the weakness of the traditional single-core topic description model in describing multi-core topics, a multi-core topic description model is proposed that can identify the different focuses of attention within the same topic. A construction method for the model is given: news reports are clustered precisely by combining partitional clustering with hierarchical clustering. Experiments show that combining multiple text vector models and using the multi-core topic description model improves the clustering of news topics.

2. Based on an analysis of the characteristics of hot topics, the heat of news is quantified as media coverage heat and netizen attention heat, and a composite attention degree based on both is used to describe the heat of a hot topic. A "topic index" is also introduced, and a segmented topic clustering method based on time windows is used to analyze how a hot topic evolves over its life cycle. A topic evolution drift analysis method based on the multi-core topic description model is proposed, which treats evolution as the transfer of core events within a topic. Experiments show that the method can effectively identify how hot topics drift as they evolve.

3. Based on the above results, a subsystem for news hot topic discovery and evolution analysis is designed and implemented. The subsystem is an important functional module of a mobile news monitoring and analysis platform. It integrates news report preprocessing, hot topic discovery, and hot topic evolution analysis, and it can discover current hot topics in real time and present them to users.
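As a rough illustration of the TF-IDF term-weight model mentioned in the abstract, the sketch below builds sparse TF-IDF vectors for tokenized reports and compares them with cosine similarity, the kind of pairwise score a clustering step could consume. It is a minimal sketch under stated assumptions, not the thesis's implementation: the function names `tfidf_vectors` and `cosine`, the unsmoothed `log(N / df)` weighting, and the toy reports are all hypothetical, and the LDA vectors and the two-stage clustering are not shown.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenized document."""
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # Relative term frequency times unsmoothed inverse document frequency.
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical, already-tokenized toy reports (not data from the thesis).
reports = [
    ["flood", "rescue", "city", "rain"],
    ["flood", "rain", "warning", "city"],
    ["election", "vote", "candidate", "city"],
]
vecs = tfidf_vectors(reports)
sim_same_topic = cosine(vecs[0], vecs[1])  # two flood reports
sim_diff_topic = cosine(vecs[0], vecs[2])  # flood report vs. election report
```

In this toy corpus the term "city" appears in every report, so its weight is zero and it contributes nothing to similarity; the two flood reports therefore score higher than the flood and election pair. Real Chinese news text would first need word segmentation, which is outside this sketch.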
[Degree-granting institution]: Nanjing University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.1