Web Spam Page Detection Based on Cost-Sensitive Methods

Published: 2018-05-30 23:20

  Topics: spam page detection + cost-sensitive learning; Source: Southwest Jiaotong University, master's thesis, 2017


[Abstract]: Over the past 20 years, Internet technology has developed rapidly, and a wide variety of websites and Web applications have emerged, bringing convenience to people's lives. At the same time, as a by-product of this growth, the Internet also hosts a large number of spam pages that carry fraudulent or harmful content. These pages, spread by spammers, seriously harm the interests of Internet users, and accurately identifying and detecting them is an active research topic.

This thesis first addresses binary spam page detection. It studies the different costs incurred when a spam page and when a normal page is misclassified, and adopts a detection method based on a cost-sensitive support vector machine (SVM). Because many cost-sensitive schemes require the costs to be specified manually, the thesis builds a spam page detection framework, based on particle swarm optimization (PSO), that integrates cost computation. Specifically, the cost-sensitive SVM is wrapped as the PSO fitness function: the cost parameters of the cost-sensitive classifier are the variables PSO optimizes, and the AUC of the classifier is the fitness value. This maintains detection performance while reducing the influence of human choices on the algorithm.

Second, the thesis studies multi-level spam page detection. Multi-level detection is finer-grained than binary detection and requires spam pages to be detected according to their degree of harm. The thesis implements it by combining cost-sensitive SVMs in a one-vs-one multi-class scheme, which maintains detection performance and avoids the problem of merging costs in the cost matrix. PSO is again used to compute the multiple misclassification costs.

Based on the original class labels of the UK2007 web spam dataset, the thesis constructs a new three-class dataset, MC-UK2007. UK2007 and MC-UK2007 are then used for binary and multi-class detection experiments with integrated cost computation, and several groups of comparison experiments are run with other algorithms. The results show that both proposed methods achieve better AUC values, indicating that they detect spam pages more effectively.
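The wrapping described in the abstract, in which a cost-sensitive classifier serves as the PSO fitness function and AUC is the fitness value, can be sketched roughly as follows. This is an illustration under stated assumptions, not the thesis's implementation: the data are synthetic rather than UK2007, the classifier is a small linear SVM trained by subgradient descent on a class-weighted hinge loss, only the spam-class cost is searched (the normal-class cost is fixed at 1), and every name and parameter value (`train_cs_svm`, `pso`, the search range 1 to 20, the swarm settings) is hypothetical.

```python
import random

def make_data(n_pos, n_neg, rng):
    """Synthetic imbalanced data (assumption): +1 = spam (minority), -1 = normal."""
    pos = [((rng.gauss(1.0, 1.0), rng.gauss(1.0, 1.0)), 1) for _ in range(n_pos)]
    neg = [((rng.gauss(-1.0, 1.0), rng.gauss(-1.0, 1.0)), -1) for _ in range(n_neg)]
    data = pos + neg
    rng.shuffle(data)
    return data

def train_cs_svm(data, cost_pos, cost_neg=1.0, epochs=30, lr=0.01, lam=0.01):
    """Linear SVM via subgradient descent on a class-weighted hinge loss."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            cost = cost_pos if y == 1 else cost_neg
            margin = y * (w[0] * x[0] + w[1] * x[1] + b)
            w = [wi * (1.0 - lr * lam) for wi in w]  # L2 regularisation step
            if margin < 1.0:  # hinge loss is active: step scaled by the class cost
                w = [wi + lr * cost * y * xi for wi, xi in zip(w, x)]
                b += lr * cost * y
    return w, b

def auc(scores, labels):
    """Fraction of (spam, normal) pairs ranked correctly; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fitness(cost_pos, train, valid):
    """PSO fitness: train with the candidate cost, return validation AUC."""
    w, b = train_cs_svm(train, cost_pos)
    scores = [w[0] * x[0] + w[1] * x[1] + b for x, _ in valid]
    return auc(scores, [y for _, y in valid])

def pso(fit, lo, hi, rng, n_particles=6, iters=8, inertia=0.7, c1=1.5, c2=1.5):
    """One-dimensional PSO that maximises fit over the interval [lo, hi]."""
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest = pos[:]
    pbest_fit = [fit(p) for p in pos]
    initial_fits = pbest_fit[:]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = (inertia * vel[i]
                      + c1 * r1 * (pbest[i] - pos[i])
                      + c2 * r2 * (gbest - pos[i]))
            pos[i] = min(hi, max(lo, pos[i] + vel[i]))  # clamp to the search range
            f = fit(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i], f
            if f > gbest_fit:
                gbest, gbest_fit = pos[i], f
    return gbest, gbest_fit, initial_fits

rng = random.Random(0)
train = make_data(20, 200, rng)
valid = make_data(20, 200, rng)
best_cost, best_auc, initial_fits = pso(
    lambda c: fitness(c, train, valid), 1.0, 20.0, rng)
print(f"best spam-class cost: {best_cost:.3f}, validation AUC: {best_auc:.4f}")
```

Because the fitness is measured on held-out data, the selected cost reflects ranking quality rather than training fit. Extending the sketch to the multi-level case would presumably mean one such classifier per pair of classes, with the particle position holding one cost per classifier, though the abstract does not give those details.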
[Degree-granting institution]: Southwest Jiaotong University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP393.092; TP18

Article ID: 1957272


Article link: http://www.lk138.cn/kejilunwen/zidonghuakongzhilunwen/1957272.html

