Research on a Network Harmful-Information Filtering Algorithm Based on One-Class SVM
[Abstract]: With the rapid development of the Internet, monitoring and filtering files transmitted over the network has become an active research topic, because such files may carry harmful information. Network traffic spans many protocols, and its payload may be fragmented and encoded, so a machine cannot directly identify the content that needs to be monitored. For content filtering, traditional string-matching algorithms cannot keep up with the explosive growth in the volume of information. Using an SVM improves classification efficiency, but the feature dimension is often too large, which wastes storage and computing resources. This thesis first analyzes how to automatically identify and match the content carried by a protocol, based on the characteristics of the protocol itself and on its state machine, and then how to reassemble and restore the data stream and perform the necessary decoding to obtain the text to be filtered. The analysis focuses on the mainstream application-layer protocols HTTP, FTP, SMTP and POP3, as well as the mainstream proprietary protocols Fetion, QQ and MSN. The thesis then proposes an improved algorithm that effectively reduces the dimension of the SVM, using three feature-reduction methods to constrain the dimension of the feature vectors. This improvement speeds up computation, saves storage space and improves accuracy. Experiments show that, for the same number of selected feature words, feature selection based on document frequency (DF), information gain and the chi-square test each has its own advantages and disadvantages. With only 500 features selected, the accuracy of classifying and filtering harmful information exceeds 80%. With more than 1000 features selected, the accuracy of the DF algorithm exceeds 90%.
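The feature-reduction step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes documents are already tokenized into word lists, that labels are binary (1 for harmful, 0 for normal), and it covers only two of the three criteria, document frequency and a chi-square score. The function names `document_frequency`, `chi_square`, `select_top_k` and `vectorize` are hypothetical and chosen for illustration.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set() so a term counts once per document
    return df


def chi_square(docs, labels, term, positive=1):
    """Chi-square statistic of one term against a binary class label."""
    n = len(docs)
    a = b = c = 0
    for tokens, label in zip(docs, labels):
        present = term in tokens
        if present and label == positive:
            a += 1  # term present, positive class
        elif present:
            b += 1  # term present, negative class
        elif label == positive:
            c += 1  # term absent, positive class
    d = n - a - b - c  # term absent, negative class
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def select_top_k(scores, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def vectorize(tokens, vocab):
    """Binary feature vector restricted to the selected vocabulary."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocab]


# Toy corpus (invented for illustration only).
docs = [["spam", "buy", "now"], ["buy", "cheap"],
        ["hello", "friend"], ["hello", "now"]]
labels = [1, 1, 0, 0]

df_vocab = select_top_k(document_frequency(docs), 3)
chi_scores = {t: chi_square(docs, labels, t) for t in document_frequency(docs)}
chi_vocab = select_top_k(chi_scores, 3)
```

The reduced vectors produced by `vectorize` would then be passed to an SVM trainer; limiting the vocabulary to k terms is what bounds the dimension of the vectors the classifier has to store and process.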
[Degree-granting institution]: Shanghai Jiao Tong University
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP393.08