當(dāng)前位置：主頁(yè) > 碩博論文 > 基礎(chǔ)科學(xué)碩士論文 >

基于節(jié)點(diǎn)相似度和鏈接次數(shù)組合時(shí)序的鏈接預(yù)測(cè)算法

發(fā)布時(shí)間：2018-04-23 03:37

本文選題：鏈接預(yù)測(cè) + 節(jié)點(diǎn)相似度　；參考：《吉林大學(xué)》2017年碩士論文

【摘要】：在網(wǎng)絡(luò)中,節(jié)點(diǎn)表示實(shí)體,鏈接表示它們之間的關(guān)系。隨著越來(lái)越多真實(shí)網(wǎng)絡(luò)數(shù)據(jù)的獲得,通過對(duì)網(wǎng)絡(luò)的分析來(lái)挖掘一些有價(jià)值的規(guī)律成為研究熱點(diǎn)。作為鏈接挖掘最重要的問題之一,鏈接預(yù)測(cè),即根據(jù)觀察到的節(jié)點(diǎn)和鏈接信息,來(lái)估計(jì)兩個(gè)節(jié)點(diǎn)之間存在鏈接的可能性。在眾多應(yīng)用的推動(dòng)下,鏈接預(yù)測(cè)的研究取得了豐碩的成果。目前采用較為廣泛的是基于節(jié)點(diǎn)相似度的方法,通過相似性分?jǐn)?shù)的大小,預(yù)測(cè)產(chǎn)生鏈接的可能性。相似度的計(jì)算主要包括基于網(wǎng)絡(luò)拓?fù)涞姆椒ㄅc基于節(jié)點(diǎn)屬性的方法,此外社區(qū)信息也被證明有助于鏈接預(yù)測(cè)。上述靜態(tài)鏈接預(yù)測(cè)方法曾在某些領(lǐng)域取得了不錯(cuò)的效果,但是在現(xiàn)實(shí)世界中,網(wǎng)絡(luò)往往是動(dòng)態(tài)變化的。靜態(tài)方法里,網(wǎng)絡(luò)隨時(shí)間的變化被忽視,如果只采用最近一個(gè)時(shí)間快照下的網(wǎng)絡(luò)圖,網(wǎng)絡(luò)變化比較頻繁時(shí),預(yù)測(cè)效果就會(huì)急劇下降;如果把歷史上各時(shí)間快照下的網(wǎng)絡(luò)圖疊加,則不能用于鏈接重復(fù)發(fā)生的情況,如電話、郵件等通信鏈接。隨著互聯(lián)網(wǎng)的發(fā)展,鏈接重復(fù)發(fā)生的場(chǎng)景越來(lái)越廣泛,網(wǎng)絡(luò)的演化越來(lái)越普遍,靜態(tài)鏈接預(yù)測(cè)方法已遠(yuǎn)遠(yuǎn)不能適應(yīng)新形勢(shì)下的需求,因此,近些年時(shí)間信息逐漸得到重視。目前,針對(duì)鏈接預(yù)測(cè)問題的研究,主要有兩個(gè)方向,一個(gè)方向是繼續(xù)完善靜態(tài)方法,充分提取當(dāng)前觀察到的有用網(wǎng)絡(luò)信息,包括拓?fù)湫畔ⅰ傩孕畔�、社區(qū)信息等;另一個(gè)方向是給空間結(jié)構(gòu)加上時(shí)間維度,考慮如何利用網(wǎng)絡(luò)隨時(shí)間的變化,更好地完成預(yù)測(cè)。時(shí)間序列在描述時(shí)間信息上取得了較好的效果,將歷史上各時(shí)間段的網(wǎng)絡(luò)信息表示為離散的時(shí)間序列圖,并進(jìn)行鏈接預(yù)測(cè),主要有兩種方式。一種是節(jié)點(diǎn)間鏈接次數(shù)的時(shí)間序列,僅僅根據(jù)節(jié)點(diǎn)間過去的鏈接次數(shù)預(yù)測(cè)未來(lái)的鏈接情況,取得了與靜態(tài)方法類似的結(jié)果,將其與靜態(tài)方法相結(jié)合,能進(jìn)一步提高預(yù)測(cè)效果。這種方法的優(yōu)勢(shì)在于,考慮鏈接歷史上出現(xiàn)的次數(shù)而不是是否出現(xiàn),時(shí)間序列模型較好地利用了鏈接的變化情況及最近時(shí)間的鏈接信息。同時(shí),混合模型將靜態(tài)方法預(yù)測(cè)新鏈接的能力與時(shí)間序列預(yù)測(cè)重復(fù)鏈接的能力結(jié)合起來(lái),是一種較為全面的方法。這種方法存在的問題是,對(duì)于新鏈接,由于失去了鏈接次數(shù)時(shí)間序列,混合模型就降級(jí)成了靜態(tài)相似度方法;此外,混合模型將最終的靜態(tài)方法預(yù)測(cè)值與時(shí)間序列預(yù)測(cè)值相乘,難以描述每個(gè)時(shí)間段的網(wǎng)絡(luò)信息。另一種時(shí)間序列方法做出了改進(jìn),采用節(jié)點(diǎn)相似性分?jǐn)?shù)的時(shí)間序列,根據(jù)節(jié)點(diǎn)間歷史上各時(shí)間段的相似性分?jǐn)?shù),預(yù)測(cè)未來(lái)的相似性,從而預(yù)測(cè)鏈接情況。這種方法也嘗試將節(jié)點(diǎn)間通過整個(gè)網(wǎng)絡(luò)計(jì)算的相似性分?jǐn)?shù)與節(jié)點(diǎn)間真正發(fā)生的鏈接次數(shù)結(jié)合,混合模型將每個(gè)時(shí)間段的相似性分?jǐn)?shù)與鏈接次數(shù)歸一化后相加,以此作為時(shí)間序列的輸入,然后以一元時(shí)間序列預(yù)測(cè)值作為未來(lái)鏈接發(fā)生的分?jǐn)?shù)。但是,由于模型過于簡(jiǎn)單未能描述相似性分?jǐn)?shù)與鏈接次數(shù)的關(guān)系,兩者的變化規(guī)律不同,混合模型得到的結(jié)果反而不如僅僅采用相似性分?jǐn)?shù)時(shí)間序列。針對(duì)以上不足,本文提出了一種新的基于節(jié)點(diǎn)相似度和鏈接次數(shù)組合時(shí)序的鏈接預(yù)測(cè)方法SOTS(Similarities and Occurrences Time Series)。首先通過有趨向的隨機(jī)游走,計(jì)算歷史各時(shí)間段節(jié)點(diǎn)間的相似性分?jǐn)?shù),然后采用時(shí)間序列模型將其與各時(shí)間段節(jié)點(diǎn)間的實(shí)際鏈接次數(shù)組合起來(lái),預(yù)測(cè)下個(gè)時(shí)間段各節(jié)點(diǎn)對(duì)發(fā)生鏈接的可能性。通過兩種組合時(shí)間序列模型,本文研究了節(jié)點(diǎn)間相似性分?jǐn)?shù)與實(shí)際鏈接次數(shù)的關(guān)系。該方法能夠用于演化網(wǎng)絡(luò)中未來(lái)新的鏈接以及重復(fù)出現(xiàn)鏈接的預(yù)測(cè)。本文貢獻(xiàn)如下:(1)采用一種新的方法將屬性社區(qū)與網(wǎng)絡(luò)拓?fù)浣M合起來(lái),計(jì)算相似性分?jǐn)?shù)。(2)研究了鏈接的形成與相似性分?jǐn)?shù)的關(guān)系。(3)將相似性分?jǐn)?shù)與鏈接次數(shù)有機(jī)結(jié)合,充分提取每個(gè)時(shí)間段的信息。尤其是二元時(shí)間序列模型的結(jié)合方式,有效描述了二者隨時(shí)間的協(xié)同演化。通過對(duì)時(shí)間序列與靜態(tài)信息的分析,我們將網(wǎng)絡(luò)結(jié)構(gòu)的時(shí)間、空間兩個(gè)維度結(jié)合起來(lái)。該方法比傳統(tǒng)方法利用的網(wǎng)絡(luò)信息更加全面,模型更加有效,經(jīng)過詳實(shí)的實(shí)驗(yàn),本文在中文DBLP數(shù)據(jù)集上評(píng)價(jià)了該方法與前人方法的預(yù)測(cè)效果,實(shí)驗(yàn)證明,該方法提高了約15%的預(yù)測(cè)準(zhǔn)確度,達(dá)到了本文工作的預(yù)期目標(biāo)。
[Abstract]:In the network, nodes represent entities and links represent their relationship. With the acquisition of more and more real network data, mining some valuable rules through the analysis of the network becomes a hot topic. As one of the most important issues of link mining, link prediction, which is estimated by the observed nodes and link information, is estimated to be two The possibility of links between nodes. With the promotion of many applications, the research of link prediction has achieved fruitful results. At present, the widely used method based on node similarity is to predict the possibility of linking by the size of similarity scores. The calculation of similarity degree mainly includes the method based on network topology and the method of similarity calculation. In addition, community information is also proved to be helpful to link prediction. The above static link prediction method has achieved good results in some fields, but in the real world, the network is often dynamic. In static methods, the network is neglected with time, if only the last time snapshot is used. If the network changes are more frequent and the network changes are more frequent, the prediction effect will fall sharply. If the network graph under the fast photos of all the times in history is superimposed, it can not be used for the repetition of links, such as telephone, mail and other communication links. With the development of the Internet, the scenes of heavy links are becoming more and more extensive and the evolution of the network becomes more and more more and more. Generally, the static link prediction method has been far from adapting to the needs of the new situation. Therefore, the time information has been paid more attention in recent years. At present, there are two main directions for the study of link prediction. One direction is to continue to improve the static method, to fully extract useful network information, including topological information, and attributes. Information, community information and so on; the other is to add time dimension to space structure, consider how to make use of the time changes of the network to better complete the prediction. The time series has achieved good results in describing time information, and the network information table of each time in history is shown as a discrete time series graph, and the link is predicted. There are two ways. One is the time sequence of the number of links between nodes. It is only based on the number of links between the nodes to predict the future link. The result is similar to the static method. The combination of the static method and the static method can further improve the prediction effect. The advantage of this method is to consider the times in the link history. It is a more comprehensive method to combine the ability of the static method to predict the ability of the new link and the ability of the time series to predict repeated links. The problem of this method is that it is new to the new model. Because of the loss of the time series of link times, the hybrid model is degraded into a static similarity method. In addition, the hybrid model multiplies the predicted value of the final static method with the time series prediction value, and it is difficult to describe the network information in each time period. The other time series method has made an improvement and uses the node similarity score. The sequence, according to the similarity score of each time interval between the nodes, predicts the future similarity and predicts the link situation. This method also tries to combine the similarity scores of the whole network with the number of real links between nodes. After normalization, it is used as the input of the time series, and then the prediction value of a single time series is used as a fraction of the future link. However, because the model is too simple to describe the relationship between the similarity scores and the number of links, the change rules of the two are different, and the results obtained by the mixed model are not as good as using the similarity only. In this paper, a new link prediction method SOTS (Similarities and Occurrences Time Series) based on the node similarity and the number of link times is proposed in this paper. First, the similarity scores between the nodes in each period of history are calculated by the random walk with a tendency, and then the time series model is used. By combining the actual number of links between nodes in each time period, the possibility of linking to each node in the next time period is predicted. Through two combination time series model, the relationship between the similarity scores and the number of links between nodes is studied. This method can be used for the future new links and duplication in the evolution network. The contributions of the present link are as follows: (1) a new method is used to combine attribute communities and network topology to calculate similarity scores. (2) the relationship between the formation of links and the similarity scores is studied. (3) the information of each time period is extracted by combining the similarity scores with the number of links, especially the two yuan time series. The combination of the model, effectively describes the co evolution of the two with time. Through the analysis of time series and static information, we combine the time and space two dimensions of the network structure. This method is more comprehensive than the traditional method of network information, and the model is more effective. After a detailed experiment, this paper is in the Chinese DBLP number. According to the set, the prediction results of the method and the predecessor method are evaluated. The experiment proves that the method improves the prediction accuracy of about 15%, and achieves the expected goal of this work.

【學(xué)位授予單位】：吉林大學(xué)
【學(xué)位級(jí)別】：碩士
【學(xué)位授予年份】：2017
【分類號(hào)】：O211.61

【相似文獻(xiàn)】

相關(guān)期刊論文前10條

1 張明席,胡成群;可進(jìn)行擇優(yōu)決策的時(shí)間序列預(yù)報(bào)方法[J];氣象;1989年06期

2 施久玉,杜金觀;有限個(gè)狀態(tài)時(shí)間序列的某些結(jié)果[J];應(yīng)用數(shù)學(xué)學(xué)報(bào);1990年01期

3 馮希杰;長(zhǎng)江三峽及其鄰區(qū)斷裂活動(dòng)時(shí)間序列[J];華南地震;1991年02期

4 王霞,郭嗣琮,劉淑娟;時(shí)間序列模糊滑動(dòng)預(yù)測(cè)[J];遼寧工程技術(shù)大學(xué)學(xué)報(bào)(自然科學(xué)版);1999年03期

5 溫品人;時(shí)間序列預(yù)測(cè)法的實(shí)際應(yīng)用分析[J];江蘇廣播電視大學(xué)學(xué)報(bào);2001年06期

6 許清海;混沌投資時(shí)間序列的嬗變[J];漳州師范學(xué)院學(xué)報(bào)(自然科學(xué)版);2003年01期

7 程毛林;時(shí)間序列系統(tǒng)建模預(yù)測(cè)的一種新方法[J];數(shù)學(xué)的實(shí)踐與認(rèn)識(shí);2004年08期

8 高潔;長(zhǎng)記憶時(shí)間序列適應(yīng)性預(yù)測(cè)的應(yīng)用[J];江南大學(xué)學(xué)報(bào);2004年05期

9 高潔;孫立新;;長(zhǎng)記憶時(shí)間序列的適應(yīng)性預(yù)測(cè)誤差的譜密度[J];統(tǒng)計(jì)與決策;2006年13期

10 楊鐘瑾;;淺談時(shí)間序列的分析預(yù)測(cè)[J];中國(guó)科技信息;2006年14期

相關(guān)會(huì)議論文前10條

1 周家斌;張海福;楊桂英;;多維多步時(shí)間序列預(yù)報(bào)方法及其應(yīng)用[A];中國(guó)現(xiàn)場(chǎng)統(tǒng)計(jì)研究會(huì)第九屆學(xué)術(shù)年會(huì)論文集[C];1999年

2 馬培蓓;紀(jì)軍;;基于時(shí)間序列的航空備件消耗預(yù)測(cè)[A];中國(guó)系統(tǒng)工程學(xué)會(huì)決策科學(xué)專業(yè)委員會(huì)第六屆學(xué)術(shù)年會(huì)論文集[C];2005年

3 盧世坤;李夕海;牛超;陳蛟;;時(shí)間序列的非線性非平穩(wěn)特性研究綜述[A];國(guó)家安全地球物理叢書（八）——遙感地球物理與國(guó)家安全[C];2012年

4 李強(qiáng);;基于線性模型方法對(duì)時(shí)間序列中異常值的檢測(cè)及證券實(shí)證分析[A];加入WTO和中國(guó)科技與可持續(xù)發(fā)展——挑戰(zhàn)與機(jī)遇、責(zé)任和對(duì)策（上冊(cè)）[C];2002年

5 戴麗金;何振峰;;基于云模型的時(shí)間序列相似性度量方法[A];第八屆中國(guó)不確定系統(tǒng)年會(huì)論文集[C];2010年

6 謝美萍;趙希人;莊秀龍;;多維非線性時(shí)間序列的投影尋蹤學(xué)習(xí)逼近[A];'99系統(tǒng)仿真技術(shù)及其應(yīng)用學(xué)術(shù)交流會(huì)論文集[C];1999年

7 張大斌;李紅燕;劉肖;張文生;;非線性時(shí)問序列的小波-模糊神經(jīng)網(wǎng)絡(luò)集成預(yù)測(cè)方法[A];第十五屆中國(guó)管理科學(xué)學(xué)術(shù)年會(huì)論文集（下）[C];2013年

8 黃云貴;;基于時(shí)間序列的電網(wǎng)固定資產(chǎn)投資規(guī)模研究[A];2012年云南電力技術(shù)論壇論文集（文摘部分）[C];2012年

9 李松臣;張世英;;時(shí)間序列高階矩持續(xù)和協(xié)同持續(xù)性研究[A];21世紀(jì)數(shù)量經(jīng)濟(jì)學(xué)（第8卷）[C];2007年

10 陳赫;羅聲求;;歷史橫斷面數(shù)據(jù)的時(shí)間序列化[A];科學(xué)決策與系統(tǒng)工程——中國(guó)系統(tǒng)工程學(xué)會(huì)第六次年會(huì)論文集[C];1990年

相關(guān)重要報(bào)紙文章前6條

1 ;《時(shí)間序列與金融數(shù)據(jù)分析》[N];中國(guó)信息報(bào);2004年

2 何德旭王朝陽(yáng);時(shí)間序列計(jì)量經(jīng)濟(jì)學(xué)：協(xié)整與有條件的異方差自回歸[N];中國(guó)社會(huì)科學(xué)院院報(bào);2003年

3 劉俏;讓數(shù)據(jù)坦白真相[N];21世紀(jì)經(jīng)濟(jì)報(bào)道;2003年

4 西南證券高級(jí)研究員董先安邋德圣基金研究中心郭奔宇;預(yù)計(jì)6月CPI同比上漲7.2%[N];證券時(shí)報(bào);2008年

5 東證期貨王愛華　楊衛(wèi)東;兩年漲跌輪回秋季普遍下跌[N];期貨日?qǐng)?bào);2009年

6 任勇邋鄭重;中國(guó)對(duì)世界鋼材價(jià)格的影響實(shí)證分析[N];現(xiàn)代物流報(bào);2007年

相關(guān)博士學(xué)位論文前10條

1 張墨謙;遙感時(shí)間序列數(shù)據(jù)的特征挖掘：在生態(tài)學(xué)中的應(yīng)用[D];復(fù)旦大學(xué);2014年

2 張德成;滑坡預(yù)測(cè)預(yù)報(bào)研究[D];昆明理工大學(xué);2015年

3 苗圣法;時(shí)間序列的模式檢測(cè)[D];蘭州大學(xué);2015年

4 翁同峰;時(shí)間序列與復(fù)雜網(wǎng)絡(luò)之間等價(jià)性問題及表征應(yīng)用研究[D];哈爾濱工業(yè)大學(xué);2015年

5 楊婷婷;用Argo浮標(biāo)結(jié)合衛(wèi)星觀測(cè)估算北太平洋經(jīng)向熱輸運(yùn)[D];中國(guó)科學(xué)院研究生院(海洋研究所);2015年

6 史文彬;時(shí)間序列的相關(guān)性及信息熵分析[D];北京交通大學(xué);2016年

7 原繼東;時(shí)間序列分類算法研究[D];北京交通大學(xué);2016年

8 盧偉;基于粒計(jì)算的時(shí)間序列分析與建模方法研究[D];大連理工大學(xué);2015年

9 胡建明;基于正則化核學(xué)習(xí)模型的時(shí)間序列多步預(yù)測(cè)的研究與應(yīng)用[D];蘭州大學(xué);2016年

10 黃標(biāo)兵;回聲狀態(tài)網(wǎng)絡(luò)時(shí)間序列預(yù)測(cè)方法及應(yīng)用研究[D];吉林大學(xué);2017年

相關(guān)碩士學(xué)位論文前10條

1 陳健;基于多變量相空間重構(gòu)的投資組合策略研究[D];華南理工大學(xué);2015年

2 蘭鑫;時(shí)間序列的復(fù)雜網(wǎng)絡(luò)轉(zhuǎn)換策略研究[D];西南大學(xué);2015年

3 米曉將;區(qū)域尺度下月均氣溫的時(shí)空演化格局研究[D];昆明理工大學(xué);2015年

4 張鳴敏;基于支持向量回歸的PM_(2.5)濃度預(yù)測(cè)研究[D];南京信息工程大學(xué);2015年

5 林健;基于改進(jìn)小世界回聲狀態(tài)網(wǎng)的時(shí)間序列預(yù)測(cè)[D];渤海大學(xué);2015年

6 曹智麗;日氣溫和干旱指數(shù)支持向量回歸預(yù)測(cè)方法[D];南京信息工程大學(xué);2015年

7 高雄飛;基于分形理論的土壤含水量時(shí)間序列特性分析[D];長(zhǎng)安大學(xué);2015年

8 姚茜;城市安全生產(chǎn)發(fā)展目標(biāo)研究[D];中國(guó)地質(zhì)大學(xué)(北京);2015年

9 謝翠穎;蘇州社會(huì)消費(fèi)品零售總額簡(jiǎn)析[D];蘇州大學(xué);2015年

10 包仁義;基于時(shí)間序列的搜索引擎評(píng)估模型算法研究[D];東北師范大學(xué);2015年

，

本文編號(hào)：1790311

資料下載

論文發(fā)表

支付寶下載

Download by Alipay
微信下載

Download by Wechat
會(huì)員下載

Download by Member

本文鏈接：http://lk138.cn/shoufeilunwen/benkebiyelunwen/1790311.html

上一篇：綿羊精子特異性蛋白TSSK6與IZUMO1相互作用的研究
下一篇：錫林郭勒地區(qū)積雪時(shí)空變化規(guī)律研究

論文發(fā)表

·知網(wǎng)|萬(wàn)方|維普|龍?jiān)磡省級(jí)|國(guó)家級(jí)|科技核心|北大核心|南大核心CSSCI|EI|SCI|SSCI|

国产伦乱,一曲二曲欧美日韩,AV在线不卡免费在线不卡免费,搞91AV视频

基于節(jié)點(diǎn)相似度和鏈接次數(shù)組合時(shí)序的鏈接預(yù)測(cè)算法