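Prediction of High-Confidence lincRNAs in Mouse Embryonic Stem Cells and a Study of Their Regulatory Patterns
Published: 2018-01-16 05:15
Keywords: Prediction of High-Confidence lincRNAs in Mouse Embryonic Stem Cells and a Study of Their Regulatory Patterns  Source: Harbin Institute of Technology, 2017 doctoral dissertation  Type: degree thesis
More related articles: lincRNA, ES cells, RNA-Seq, interaction network, histone modification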
[Abstract]: lincRNAs function in metabolism, growth and development, and disease, and they regulate gene expression at multiple levels. As key regulatory factors, lincRNAs play important regulatory roles in mouse ES cells. This project uses high-throughput RNA-Seq data to identify unannotated, high-confidence lincRNA transcripts expressed in mouse ES cells, improving the genomic annotation of lincRNAs. It also identifies the characteristic regulatory patterns of enhancer-associated and promoter-associated lincRNAs, identifies interactions between elincRNAs and promoters, and studies how lincRNAs regulate gene expression.

This thesis integrates multiple RNA-Seq datasets from mouse ES cells, early embryos, and whole embryos, and identifies 6,701 novel lincRNAs expressed in mouse ES cells. Assessing transcript completeness with RNA-Seq read coverage and CAGE shows that the novel lincRNAs identified from RNA-Seq are incomplete transcripts that lack their 5′ ends. Analysis of the TSS regions of known lincRNAs and protein-coding transcripts shows that lincRNAs have specific genomic and epigenomic features. Evaluation by ten-fold cross-validation and on an independent test set shows that the lincRNA TSS-region prediction model that integrates genomic and epigenomic features performs best. lincRNA TSS regions were predicted across the whole mouse genome, and the TSS regions of 1,293 novel lincRNAs were corrected. Evaluating the TSS regions before and after correction with CAGE and active chromatin modifications shows that the predicted TSS regions yield relatively complete lincRNA transcripts in mouse ES cells.

The novel lincRNAs were characterized by their genomic distribution and by genomic and epigenomic features. They resemble known lincRNAs: compared with protein-coding transcripts they have fewer exons, shorter transcripts, and lower conservation, and they are enriched in repetitive elements. The epigenetic modification patterns of lincRNAs also differ significantly from those of protein-coding transcripts. RT-PCR was used to measure the expression of novel lincRNAs in different cell lines and in different mouse tissues at different developmental stages, and the results show that novel lincRNAs are expressed in a tissue- and cell-specific manner. RACE experiments were then used to determine the full length of transcript TCONS_00041333; the results show that this lincRNA comprises two transcripts, 656 bp and 571 bp long. Analysis of core promoter element binding regions shows that binding regions for the TATA-box, GC-box, CCAAT-box, and Initiator lie upstream of its TSS, and that the region is enriched in the histone modifications H3K4me1 and H3K27ac.

By chromatin state, lincRNAs can be divided into elincRNAs (enhancer associated lincRNAs) and plincRNAs (promoter associated lincRNAs). Using the H3K4me1/H3K4me3 enrichment ratio at the TSS regions of known lincRNA transcripts in mouse ES cells, a high-confidence set of 224 elincRNAs and 112 plincRNAs was identified. Genomic and epigenomic features were then integrated, and a regularized logistic regression model was used to identify the features that significantly regulate elincRNAs and plincRNAs. elincRNAs are associated with DNA methylation in the TSS region and with DNA methylation and H3K122ac in the gene body region; plincRNAs are associated with H3K9ac in the TSS region and with H3K36me3 in the gene body region. The prediction model identified 3,729 elincRNAs and 1,392 plincRNAs. Genomic and epigenomic characterization shows that, compared with plincRNAs, elincRNAs have fewer exons, shorter transcripts, lower expression levels and sequence conservation, and distinct chromatin modification patterns.

The regulatory patterns of elincRNA-promoter interactions in mouse ES cells were analyzed using histone modification patterns and transcription factor enrichment patterns. The results show that interactions between elincRNAs and promoters tend to be regulated by transcription factors. Evaluation against a high-confidence set of elincRNA-promoter interactions in mouse ES cells shows that the interactions identified from the Spearman correlation of transcription factors form the best prediction set. An interaction network was built from the high-confidence set of elincRNA-promoter interactions, together with an interaction sub-network based on transcription factor correlation. Analysis of network topology shows that the sub-network has properties similar to those of the full interaction network, and that elincRNAs specifically target certain promoters rather than regulating broadly. Module mining and functional enrichment analysis of the interaction sub-network show that some modules are enriched for the functions of RNA polymerase II-binding transcription factors involved in transcriptional activation, and for functions related to ES cells and embryonic development. elincRNAs may therefore participate in activating the transcription of their target genes.

In summary, this study identifies a set of lincRNAs expressed in mouse ES cells with relatively complete transcript boundaries, identifies the regulatory features of elincRNAs and plincRNAs with machine learning models, and identifies interactions between elincRNAs and their target promoters in mouse ES cells. The study discovers and examines lincRNAs that are important during mouse development, and it is also significant for the systematic study of how lincRNAs regulate gene expression in early embryonic development.
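The abstract's first step for separating elincRNAs from plincRNAs, the H3K4me1/H3K4me3 enrichment ratio at the TSS region, can be illustrated with a minimal sketch. The abstract gives neither the cutoff nor the way signals were normalized, so the function name `classify_lincrna`, the `log2_cutoff` and `pseudocount` parameters, and the example signal values below are all hypothetical.

```python
import math


def classify_lincrna(h3k4me1, h3k4me3, log2_cutoff=1.0, pseudocount=1.0):
    """Label a TSS region from its H3K4me1/H3K4me3 enrichment ratio.

    log2_cutoff and pseudocount are illustrative assumptions, not values
    taken from the thesis.
    """
    if h3k4me1 < 0 or h3k4me3 < 0:
        raise ValueError("enrichment signals must be non-negative")
    # The pseudocount keeps the ratio finite when one mark has zero signal.
    log2_ratio = math.log2((h3k4me1 + pseudocount) / (h3k4me3 + pseudocount))
    if log2_ratio >= log2_cutoff:
        return "elincRNA"   # H3K4me1-dominated, enhancer-like state
    if log2_ratio <= -log2_cutoff:
        return "plincRNA"   # H3K4me3-dominated, promoter-like state
    return "ambiguous"      # neither mark dominates; left unlabeled


# Hypothetical TSS signals as (H3K4me1, H3K4me3) pairs.
tss_signals = {
    "lincRNA_A": (15.0, 1.0),
    "lincRNA_B": (1.0, 15.0),
    "lincRNA_C": (5.0, 5.0),
}
labels = {
    name: classify_lincrna(me1, me3)
    for name, (me1, me3) in tss_signals.items()
}
```

Leaving an "ambiguous" band between the two cutoffs is one way to obtain a small high-confidence set that can then train a classifier over the remaining transcripts, which matches the two-stage description in the abstract, though the thesis may have drawn the boundary differently.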
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Doctorate
[Year degree granted]: 2017
[Classification number]: R3416
Article ID: 1431690
Article link: http://www.lk138.cn/shoufeilunwen/yxlbs/1431690.html