

Research on Fuzzy Rough Set Attribute Reduction Algorithms under the MapReduce Framework

Published: 2018-06-13 03:36

  Topic: fuzzy rough set + attribute reduction. Source: Southwest Jiaotong University, master's thesis, 2017


[Abstract]: In recent years, with the rapid growth of the Internet, the volume of data to be processed has increased sharply. How to extract knowledge from massive data has become a focus of attention, and knowledge discovery has become an important research topic. Attribute reduction (feature selection) is one of the important methods for acquiring knowledge effectively while eliminating interfering factors. A data set (knowledge base) contains many attributes, but they are not all equally important: some matter more for decision making, some matter less, and some may be redundant and unnecessary. Because of this redundant information, acquiring knowledge costs extra time and space spent on processing irrelevant information. The purpose of attribute reduction is to remove this irrelevant information from the data set and to address problems such as overfitting and the curse of dimensionality. Attribute reduction is one of the important applications of rough set theory and has received extensive attention from researchers. However, the classical rough set model cannot handle numerical data directly; the data must first be discretized, which may cause information loss and affect knowledge acquisition. The fuzzy rough set model can process numerical data directly. To address some shortcomings of attribute reduction algorithms based on attribute dependency, this thesis combines particle swarm optimization (PSO) with fuzzy rough sets and, from a big data perspective, uses the MapReduce framework to study parallel attribute reduction for fuzzy rough sets and robust fuzzy rough sets. The main work of the thesis is as follows:

1. Gaussian kernel fuzzy rough sets are combined with PSO to construct a PSO-based attribute reduction algorithm for Gaussian kernel fuzzy rough sets. Owing to the properties of Gaussian kernel fuzzy rough sets, a heuristic attribute reduction algorithm based on attribute dependency may fail to find the best attribute combination, or may even fail to obtain a reduct at all. Combining the model with PSO overcomes this defect. In addition, by exploiting the properties of Gaussian kernel fuzzy rough sets, different kernel parameter choices yield different reducts to meet classification requirements. Experiments on UCI public data sets show that the algorithm has good reduction performance. (Chapter 3)

2. Based on the Gaussian kernel fuzzy rough set model, the principles of computing fuzzy rough set approximations and attribute dependency in parallel are analyzed. A MapReduce-based parallel algorithm is given for computing the lower approximation and the attribute dependency of Gaussian kernel fuzzy rough sets, followed by a parallel PSO-based attribute reduction algorithm for Gaussian kernel fuzzy rough sets. The algorithm exploits the characteristics of MapReduce: in the Map phase it directly computes, for the objects of each split, the minimum distance within that split to objects of a different decision class, instead of outputting the relation between every pair of objects. This reduces HDFS access and makes it feasible to compute the fuzzy rough set lower approximation and attribute dependency on big data. Experiments on UCI public data sets and synthetic data sets show that the algorithm has good parallel performance and reduction performance in a big data environment. (Chapter 4)

3. On the robust fuzzy rough set model, a parallel attribute reduction algorithm for Gaussian kernel robust fuzzy rough sets is implemented with the MapReduce framework. The algorithm first computes, within each data split, the distances between each object and its k nearest objects from different decision classes, from which the k nearest such objects over the whole data set are obtained for each object. The RNN operator is then used to compute each object's lower approximation, and the attribute dependency of all candidate reducts is computed to obtain the reduct. This strategy requires less storage and reduces the time overhead of resource scheduling that repeated iterations incur on the Hadoop platform. Experiments on UCI public data sets analyze the reduction performance and parallel performance under different RNN operator parameters. The results show that the algorithm can perform reduction on big data and overcomes the cases in which the traditional model cannot obtain a reduct. The algorithm handles noisy data effectively and also has good parallel performance. (Chapter 5)
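The minimum-distance idea in item 2 can be illustrated with a small single-process sketch. It is not the thesis's Hadoop implementation. It assumes a Gaussian kernel of the form exp(-d²/(2σ²)) and assumes that an object's lower approximation membership is 1 minus the kernel value at its nearest object from a different decision class, which is why only the per-split minimum distance needs to be emitted. The function names (`map_min_dist`, `reduce_min`, `dependency`), the round-robin split layout, and the default `sigma` are hypothetical choices made for illustration.

```python
import math
from functools import reduce


def sq_dist(a, b, attrs):
    """Squared Euclidean distance restricted to the attribute subset `attrs`."""
    return sum((a[j] - b[j]) ** 2 for j in attrs)


def map_min_dist(split, data, labels, attrs):
    """Simulated Map task: for every object, the minimum squared distance to
    objects of a different decision class that lie inside this split.
    Only one number per object is emitted, not one per object pair."""
    out = {}
    for i, x in enumerate(data):
        best = math.inf
        for j in split:
            if labels[j] != labels[i]:
                best = min(best, sq_dist(x, data[j], attrs))
        out[i] = best
    return out


def reduce_min(a, b):
    """Simulated Reduce step: keep the smaller distance for each object."""
    return {i: min(a[i], b[i]) for i in a}


def dependency(data, labels, attrs, sigma=1.0, n_splits=2):
    """Attribute dependency as the mean lower approximation membership
    (assumed form: 1 - exp(-d_min^2 / (2 * sigma^2)))."""
    n = len(data)
    # Hypothetical split layout: round-robin assignment of object indices.
    splits = [range(s, n, n_splits) for s in range(n_splits)]
    partial = [map_min_dist(s, data, labels, attrs) for s in splits]
    min_d2 = reduce(reduce_min, partial)
    lower = [1.0 - math.exp(-min_d2[i] / (2 * sigma ** 2)) for i in range(n)]
    return sum(lower) / n


# Toy data: attribute 0 separates the two classes, attribute 1 does not.
data = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (3.0, 1.0)]
labels = [0, 0, 1, 1]
gamma_attr0 = dependency(data, labels, attrs=[0])
gamma_attr1 = dependency(data, labels, attrs=[1])
```

Because taking a minimum is associative, the merged result does not depend on how the objects are divided into splits, so the per-split minima can be combined in any order. In this toy data the dependency on attribute 0 is high and the dependency on attribute 1 is zero, which is the kind of signal a reduction algorithm uses to rank attribute subsets.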
[Degree-granting institution]: Southwest Jiaotong University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP18




Article ID: 2012538


Article link: http://www.lk138.cn/kejilunwen/zidonghuakongzhilunwen/2012538.html

