Research on a Network Harmful-Information Filtering Algorithm Based on One-Class SVM

Published: 2019-05-10 07:32

[Abstract]: The rapid growth of the Internet has made the monitoring and filtering of files transmitted over the network an active research topic, since such files may contain harmful information. Information in network traffic is wrapped in a variety of network protocols and may be fragmented and encoded, so a machine cannot directly identify the content that needs to be monitored. For content filtering, traditional string-matching algorithms clearly cannot keep up with the regulatory demands of exponentially growing information. Although SVM can improve classification efficiency, the feature dimension remains too large, which wastes storage and computing resources. This thesis first analyzes how, among many network protocols, the content carried by a protocol can be automatically identified and matched from the protocol's own characteristics and its state machine; the data stream is then reassembled and restored, and any necessary decoding is applied to obtain the text to be filtered. The study focuses on the mainstream application-layer protocols HTTP, FTP, SMTP, and POP3, as well as the mainstream proprietary application protocols Fetion, QQ, and MSN. The thesis then proposes an improved algorithm for effectively reducing the dimensionality of the SVM, constraining the dimension of the vector machine with three feature-reduction methods. The improvement speeds up computation, saves storage space, and raises accuracy. Experiments show that, with the same number of feature terms selected, feature selection based on document frequency (DF), on information gain, and on the chi-square fit each has its own strengths and weaknesses. With only 500 features selected, the improved algorithm achieves an accuracy of more than 80% in classifying and filtering harmful information. With more than 1,000 features selected, the accuracy of the DF algorithm exceeds 90%.
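The abstract names the feature-reduction criteria but gives no formulas, so the following is only a minimal sketch of two of them, document frequency and a chi-square score, using their standard textbook definitions. Reading the abstract's third method as the chi-square statistic is an assumption, and all function names, the toy corpus, and the labels are hypothetical rather than taken from the thesis. The resulting fixed-length vectors are the kind of input an SVM would then be trained on.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set() so a term counts once per document
    return df


def chi_square(docs, labels, positive=1):
    """Chi-square score of each term against the positive class (2x2 table)."""
    n = len(docs)
    n_pos = sum(1 for y in labels if y == positive)
    df_all = document_frequency(docs)
    df_pos = document_frequency(
        [d for d, y in zip(docs, labels) if y == positive]
    )
    scores = {}
    for term, df_t in df_all.items():
        a = df_pos.get(term, 0)  # positive documents containing the term
        b = df_t - a             # negative documents containing the term
        c = n_pos - a            # positive documents without the term
        d = n - n_pos - b        # negative documents without the term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def top_k(scores, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def vectorize(tokens, vocabulary):
    """Map a tokenized document to term counts over the kept vocabulary."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


# Hypothetical toy corpus: label 1 marks text to be filtered, 0 marks normal text.
docs = [
    ["win", "cash", "now"],
    ["cash", "prize", "now"],
    ["meeting", "notes", "now"],
    ["project", "meeting", "agenda"],
]
labels = [1, 1, 0, 0]

df_vocab = top_k(document_frequency(docs), 3)
chi_vocab = top_k(chi_square(docs, labels), 2)
print(df_vocab, chi_vocab)
print(vectorize(["cash", "cash", "now"], df_vocab))
```

On this toy corpus the two criteria already disagree: document frequency keeps the term that appears everywhere, while the chi-square score prefers terms concentrated in one class. That is consistent with the abstract's remark that each criterion has its own strengths and weaknesses.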
[Degree-granting institution]: Shanghai Jiao Tong University
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP393.08

