Application of Random Forest and Data Visualization to Cotton Aphid Grade Prediction
Topic: data analysis. Focus: random forest. Source: Shandong Agricultural University, master's thesis, 2017
[Abstract]: Monitoring and early warning are central to the early control of cotton aphids. Data related to aphid occurrence are collected and analyzed to predict outbreaks, so that control measures can be taken in advance, the damage aphids cause to cotton is reduced, and cotton-growing regions achieve high yield and quality. The data analysis in this thesis proceeds along two lines: applying high-performance machine learning algorithms, and presenting and analyzing the data through data visualization.

First, the random forest algorithm is used to analyze the cotton aphid data. Random forest is an ensemble machine learning algorithm built from many decision trees and is mostly used for classification and prediction. Decision trees and multiple linear regression are also commonly used for prediction. Because different algorithms can reach different accuracy on the same dataset, the thesis compares the accuracy of the three algorithms on UCI and armyworm datasets. Cotton aphid pest grade prediction currently relies mostly on linear regression models. Their drawback is that the choice of factors entering the model is only a guess, which limits how well the model can capture the diversity and unpredictability of those factors. A random forest model does not depend on how the influencing factors are expressed. In addition, the algorithm is not prone to overfitting, is fast on large sample sets, is insensitive to multicollinearity, and gives relatively high classification accuracy. The comparison experiments show that random forest achieves high prediction accuracy, so the later experiments apply it to cotton aphid grade prediction.

Cotton is an important cash crop in China and plays a major role in the agricultural economy. Cotton aphids are a main cause of reduced cotton yield and quality, so controlling them early is very important. After handling class imbalance in the collected data and screening the influencing factors, the thesis builds a random forest model on meteorological factor data and data on the natural enemies of cotton aphids, and uses the model to predict the grade of aphid occurrence. The experiment shows that the random forest model has a small generalization error and relatively high accuracy in predicting cotton aphid pest grade.

Second, data visualization is used for data analysis. As an important means of data analysis, it is applied to the cotton aphid and meteorological data to provide a reference for aphid control. Multidimensional data visualization, one of the main research topics in data visualization, reveals relationships between attributes by displaying multidimensional data. The collected data are multidimensional. Visualizing the meteorological and cotton aphid data uncovers patterns hidden in the data and supports better analysis and decision making. The display and analysis of the data in this thesis clarify when major aphid outbreaks occur and provide a reference for carrying out control at the right time. Visualization also played an important role in building the model and in presenting and analyzing the experimental results.
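The modeling pipeline the abstract describes (balance the classes, then train a random forest and predict a pest grade) can be illustrated with a minimal pure-Python sketch. Everything concrete here is an assumption for illustration only: the thesis data and code are not reproduced, the abstract does not say which imbalance method was used (random oversampling stands in for it), factor screening is omitted, and the three toy features (temperature, humidity, natural enemy count), the labeling rule, and all function names are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, rng, n_sub, depth=0, max_depth=5):
    """Grow one tree; each node considers a random subset of n_sub features."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels, rng.sample(range(len(rows[0])), n_sub))
    if split is None:
        return majority
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], rng, n_sub, depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], rng, n_sub, depth + 1, max_depth))

def predict_tree(node, row):
    """Walk from the root to a leaf; leaves are plain integer labels."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def oversample(rows, labels, rng):
    """Random oversampling (assumed method): duplicate minority rows until classes match."""
    by_class = {}
    for r, y in zip(rows, labels):
        by_class.setdefault(y, []).append(r)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for r in members + extra:
            out_rows.append(r)
            out_labels.append(y)
    return out_rows, out_labels

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees trees, each on a bootstrap sample of the training rows."""
    rng = random.Random(seed)
    n_sub = max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], rng, n_sub))
    return forest

def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]

def make_toy_data(n, rng):
    """Hypothetical data: (temperature, humidity, enemy count) -> grade 0 or 1."""
    rows, labels = [], []
    for _ in range(n):
        temp, humidity, enemies = rng.uniform(15, 35), rng.uniform(30, 90), rng.uniform(0, 20)
        rows.append((temp, humidity, enemies))
        labels.append(1 if (temp > 26 and enemies < 8) else 0)  # made-up rule
    return rows, labels

if __name__ == "__main__":
    rng = random.Random(42)
    train_rows, train_labels = make_toy_data(200, rng)
    test_rows, test_labels = make_toy_data(100, rng)
    bal_rows, bal_labels = oversample(train_rows, train_labels, rng)
    forest = fit_forest(bal_rows, bal_labels, n_trees=15, seed=0)
    hits = sum(predict_forest(forest, r) == y for r, y in zip(test_rows, test_labels))
    print(f"toy test accuracy: {hits / len(test_rows):.2f}")
```

Only the training split is oversampled, so the test rows keep their natural class ratio. In practice a library implementation would normally replace this hand-written forest; the sketch is only meant to make the bagging, random feature selection, and voting steps visible.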
[Degree-granting institution]: Shandong Agricultural University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP18; S435.622.1