Two Classes of Adaptive Sparse Learning Machines and Their Application in High-Dimensional Data Mining
Published: 2018-06-02 03:18
Topics: high-dimensional data mining + group lasso; Source: Henan Normal University, master's thesis, 2017
[Abstract]: As modern high-dimensional data keep accumulating, traditional statistical learning methods, with the support vector machine as a typical example, cannot perform high-dimensional variable selection well. Developing new adaptive sparse learning machines offers a new approach to high-dimensional data mining. To this end, this thesis combines methods from statistics, systems biology, and information theory to develop two biologically interpretable adaptive sparse learning models together with their solving algorithms. Applied to high-dimensional data analysis, both achieve good classification and gene selection performance. The main contributions are as follows.
(1) Group-lasso-type penalty methods for binary classification of high-dimensional data face several difficulties: variables must be grouped in advance, variables within a group should be selected adaptively, and the result should be biologically interpretable. To address these, we study a variable grouping strategy based on network analysis and a new adaptive penalty mechanism, and on this basis propose an adaptive sparse group lasso that combines network analysis with information-theoretic methods. First, module identification in network analysis is linked to variable grouping in the group lasso, and weighted gene co-expression network analysis is used to identify modules with good biological interaction relationships. Second, information-theoretic measures such as conditional mutual information are used to build a criterion for evaluating variable importance within each group; from this criterion, biologically interpretable weight coefficients are constructed and placed in the appropriate position of the penalty term so that variables are selected adaptively. Finally, results on four high-dimensional cancer datasets verify that the proposed adaptive sparse learning machine classifies effectively and selects genes group-wise.
(2) Group-penalized multinomial regression for multi-class classification of high-dimensional data faces the difficulties of adaptive variable selection and biological interpretability. To address these, we propose a sparse multinomial regression that incorporates network analysis. By combining biological resources with gene expression profile information, we use GeneRank to construct biologically meaningful weights and introduce them into the group lasso penalty, yielding a new adaptive sparse learning machine. Experimental results on yeast diauxic shift data verify that the proposed model achieves better classification and gene selection performance than the other models compared.
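To make the idea of placing weights inside a sparse group lasso penalty concrete, the following is a minimal sketch rather than the thesis's actual formulation. It assumes the per-variable weights enter the within-group l1 term, and that each group's l2 term is scaled by the square root of the group size. The function names, the mixing parameter `alpha`, and the `groups`/`weights` arguments are hypothetical choices made for illustration; in the thesis the groups would come from network modules and the weights from an information-theoretic or GeneRank-based score.

```python
import numpy as np


def adaptive_sgl_penalty(beta, groups, weights, lam, alpha):
    """Weighted sparse group lasso penalty (illustrative form, an assumption):

    lam * [ alpha * sum_j w_j * |beta_j|
            + (1 - alpha) * sum_g sqrt(p_g) * ||beta_g||_2 ]
    """
    beta = np.asarray(beta, dtype=float)
    weights = np.asarray(weights, dtype=float)
    l1_part = np.sum(weights * np.abs(beta))
    l2_part = sum(np.sqrt(len(idx)) * np.linalg.norm(beta[idx]) for idx in groups)
    return lam * (alpha * l1_part + (1.0 - alpha) * l2_part)


def adaptive_sgl_prox(v, groups, weights, lam, alpha, step=1.0):
    """Proximal operator of step * penalty, for non-overlapping groups.

    Step 1: weighted soft-thresholding of each coordinate (within-group sparsity).
    Step 2: group-wise shrinkage of each block (group-level sparsity).
    """
    v = np.asarray(v, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Coordinates with larger weights are thresholded more aggressively.
    u = np.sign(v) * np.maximum(np.abs(v) - step * lam * alpha * weights, 0.0)
    out = np.zeros_like(u)
    for idx in groups:
        norm = np.linalg.norm(u[idx])
        thresh = step * lam * (1.0 - alpha) * np.sqrt(len(idx))
        if norm > thresh:
            out[idx] = (1.0 - thresh / norm) * u[idx]
        # Otherwise the whole group stays at zero.
    return out


# Usage: two groups of two variables each, with uniform weights.
groups = [[0, 1], [2, 3]]
weights = np.ones(4)
shrunk = adaptive_sgl_prox([3.0, -0.5, 0.2, 0.1], groups, weights, lam=1.0, alpha=0.5)
```

In this sketch a small weight protects a variable judged important from being thresholded, while a large weight pushes an unimportant variable to zero, which is one common way an adaptive penalty is realized.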
[Degree-granting institution]: Henan Normal University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP18;TP311.13
Article ID: 1967120
Article link: http://www.lk138.cn/kejilunwen/zidonghuakongzhilunwen/1967120.html