An Interactive Data Mining and Visualization System Based on Parallel Computing

Published: 2018-09-07 12:34

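[Abstract]: With advances in information technology, data volumes are growing explosively, and traditional CPU-based data mining techniques can no longer process such large volumes efficiently. In addition, the human brain recognizes colors and geometric shapes more readily than dry numbers, so data visualization can present mining results more naturally and intuitively in the user interface and better meet users' needs. However, the traditional visualization tools most commonly used in data mining can only draw two- or three-dimensional graphics and offer little interactivity. To address these problems, this thesis proposes an interactive data mining and visualization system based on parallel computing. The thesis uses GPU (Graphics Processing Unit) programming to optimize classic data stream mining algorithms. Traditional CPU-based data mining processes data serially and cannot use multiple computing resources at the same time; on large data sets it needs many iterations and a large amount of memory, so processing is slow and inefficient. GPU programming instead processes data in parallel, with many threads running independently and simultaneously, which gives high computational throughput and makes it better suited to large data volumes. Targeting the cases of data independence and data dependence in big data, the thesis uses GPU programming to optimize the K-Means clustering algorithm and the Connected Component Labeling (CCL) algorithm respectively, enabling more effective mining and analysis of big data. The thesis also proposes an interactive data visualization method: using the DirectX software development kit (SDK), the raw data set or the mining results are converted into vertices, lines, faces, colors and other graphical information, multidimensional models are built with the graphics functions the SDK provides, and the final visualization is rendered. In addition, a graphical user interface (GUI) lets users change the clustering parameters to suit their own needs and obtain the corresponding visualization. Using these algorithms, experiments were run on energy consumption data produced by air-conditioner operation. The GPU-optimized algorithms not only performed the clustering analysis of the data, but the experimental results also show that the system runs substantially faster and more efficiently when processing very large data volumes. Finally, the DirectX SDK is used to render the abstract mining results as concrete four-dimensional stereoscopic graphics, and users can change both the viewing angle and the number of clusters K from the keyboard to obtain the results they need.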
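The abstract pairs K-Means with the data-independent case. The sketch below illustrates why, under stated assumptions: it is plain Python rather than the thesis's GPU code, and the names `assign_point`, `update_centroids` and `kmeans` are hypothetical. In the assignment step, each point's nearest centroid depends only on that point and a read-only centroid list, so the step is a pure map over the points (in a CUDA port, plausibly one thread per point); the centroid update is a per-cluster reduction.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_point(point: Point, centroids: List[Point]) -> int:
    """Index of the nearest centroid (hypothetical helper).

    Reads only `point` and the read-only centroid list, so every call is
    independent of every other: the data-parallel part of K-Means.
    """
    return min(range(len(centroids)), key=lambda k: squared_distance(point, centroids[k]))


def update_centroids(points: List[Point], labels: List[int], centroids: List[Point]) -> List[Point]:
    """Recompute each centroid as the mean of its members (a per-cluster reduction)."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p, k in zip(points, labels):
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += p[d]
    # An empty cluster keeps its previous centroid.
    return [
        tuple(s / counts[k] for s in sums[k]) if counts[k] else centroids[k]
        for k in range(len(centroids))
    ]


def kmeans(points: List[Point], initial: List[Point], max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd-style K-Means; K is the number of initial centroids."""
    centroids = list(initial)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: a map over points with no cross-point dependence.
        new_labels = list(map(lambda p: assign_point(p, centroids), points))
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        centroids = update_centroids(points, labels, centroids)
    return labels, centroids
```

For example, `kmeans([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)], [(0.0, 0.0), (10.0, 10.0)])` returns labels `[0, 0, 1, 1]` and centroids `[(0.0, 0.5), (10.0, 10.5)]`. Passing a different number of initial centroids is the analogue of changing K in the GUI.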
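CCL is the data-dependent case: a pixel's final label depends on its neighbours, whose labels depend on theirs. The abstract does not say which GPU CCL variant the thesis uses; the sketch below assumes iterative label propagation, a common GPU-friendly approach, written in plain Python with hypothetical function names. Each sweep reads only the previous label grid and writes a new one, mimicking one kernel launch with one thread per pixel, and the loop repeats until no label changes.

```python
from typing import List

Grid = List[List[int]]

# 4-connectivity: up, down, left, right.
NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def propagate_once(image: Grid, labels: Grid) -> Grid:
    """One parallel sweep (hypothetical helper): every foreground pixel takes
    the minimum label among itself and its foreground neighbours, reading
    only the previous `labels` grid."""
    rows, cols = len(image), len(image[0])
    new = [row[:] for row in labels]
    for r in range(rows):
        for c in range(cols):
            if not image[r][c]:
                continue  # background pixels keep label -1
            best = labels[r][c]
            for dr, dc in NEIGHBOURS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and image[nr][nc]:
                    best = min(best, labels[nr][nc])
            new[r][c] = best
    return new


def label_components(image: Grid) -> Grid:
    """Label 4-connected foreground regions; background pixels get -1."""
    rows, cols = len(image), len(image[0])
    # Start from each pixel's linear index so all initial labels are distinct.
    labels = [[r * cols + c if image[r][c] else -1 for c in range(cols)] for r in range(rows)]
    while True:
        new = propagate_once(image, labels)
        if new == labels:  # no label changed in this sweep: converged
            return labels
        labels = new
```

Each component ends up labelled with the smallest linear index it contains. The number of sweeps grows roughly with the longest path inside a component, which is the price of the data dependence that the K-Means assignment step does not have.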
[Degree-granting institution]: North China University of Technology
[Degree level]: Master's
[Year degree awarded]: 2017
[CLC number]: TP311.13
