Sentiment Lexicon Acquisition Based on Stock Prices

Published: 2018-05-19 20:25

Topics: topic model + trend probability model; Source: master's thesis, Southwestern University of Finance and Economics, 2014

[Abstract]: With the development of the Internet, more and more people have become used to getting information online, and more and more enterprises have begun trying to obtain business-relevant information from the web. After newspapers, radio and television, the Internet has become the "fourth media", and its convenience has made it people's primary source of information. At the same time, social media such as Weibo (microblogs), WeChat Moments, Facebook and Twitter let large numbers of people publish their opinions. These opinions matter to businesses: they show a company what users think of its products and what its competitors think of them. They can help cinemas forecast box-office revenue, and they can help people better understand public opinion about their own lives. Sentiment analysis is the technique used for these tasks. It addresses the question of who holds which opinion about which aspect of what, involving a subject (the person), an object (the feature) and an opinion (the sentiment word). Sentiment analysis, also known as opinion finding, extracts subjective information from large volumes of text, for example a person's evaluation of something or someone's view on a given point. Building a sentiment lexicon is an important part of sentiment analysis.
This thesis studies two main questions. First, sentiment lexicons are domain-specific: lexicons for different domains differ markedly, and the same word may carry different sentiment in different lexicons. How can a financial sentiment lexicon be built automatically? Second, the words in a sentiment lexicon do not all carry sentiment of the same strength. How should these sentiment words be ranked?
This thesis combines natural language processing (NLP) with finance-related techniques to address these questions. It first studies the underlying NLP techniques, then builds a system on that theoretical basis, and finally uses experiments to study how different parameters affect the construction of the sentiment lexicon.
The thesis consists of five chapters:
Chapter 1, Introduction, reviews how scholars abroad have studied this topic and explains the research methods and approach of the thesis.
Chapter 2, Background, introduces commonly used NLP techniques and text classification techniques, together with their mathematical principles.
Chapter 3, System Implementation, describes the development and implementation of the system: the overall Lucene-based system, word segmentation, indexing, and automatic text generation.
Chapter 4, Algorithm and Experiments, presents Trend-PLSA, an algorithm based on PLSA (probabilistic latent semantic analysis). It fuses the price trend with PLSA, combining metadata with a probabilistic graphical model to improve the accuracy of the sentiment lexicon. The chapter then examines how different experimental parameters affect lexicon construction.
Chapter 5, Summary and Outlook, summarizes the main work and contributions of the thesis and proposes new directions and ideas for future research.
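Chapter 4 builds Trend-PLSA on standard PLSA. The abstract does not say how the price trend enters the model, so the sketch below shows only the textbook PLSA fitted by EM, P(w|d) = Σ_z P(z|d) P(w|z). It is not the thesis's Trend-PLSA. The sketch is minimal pure Python and assumes each document is given as a {word: count} dict. The function names and the toy corpus are hypothetical.

```python
import math
import random


def _normalize(weights):
    """Scale a {key: weight} dict so its values sum to 1 (left as-is if all zero)."""
    total = sum(weights.values()) or 1.0
    return {k: v / total for k, v in weights.items()}


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit standard PLSA, P(w|d) = sum_z P(z|d) P(w|z), by EM.

    counts: list of documents, each a {word: count} dict.
    Returns (p_z_d, p_w_z): P(z|d) per document and P(w|z) per topic.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in counts for w in doc})
    topics = range(n_topics)
    # Random positive initialisation of both distributions.
    p_z_d = [_normalize({z: rng.random() for z in topics}) for _ in counts]
    p_w_z = [_normalize({w: rng.random() for w in vocab}) for _ in topics]

    for _ in range(n_iter):
        exp_w_z = [dict.fromkeys(vocab, 0.0) for _ in topics]
        exp_z_d = [dict.fromkeys(topics, 0.0) for _ in counts]
        for d, doc in enumerate(counts):
            for w, n in doc.items():
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in topics]
                total = sum(joint) or 1.0
                for z in topics:
                    # Expected number of occurrences of (d, w) generated by topic z.
                    share = n * joint[z] / total
                    exp_w_z[z][w] += share
                    exp_z_d[d][z] += share
        # M-step: renormalise the expected counts into probabilities.
        p_w_z = [_normalize(dist) for dist in exp_w_z]
        p_z_d = [_normalize(dist) for dist in exp_z_d]
    return p_z_d, p_w_z


def log_likelihood(counts, p_z_d, p_w_z):
    """Sum over documents and words of n(d,w) * log P(w|d)."""
    return sum(
        n * math.log(sum(p_z_d[d][z] * p_w_z[z][w] for z in range(len(p_w_z))))
        for d, doc in enumerate(counts)
        for w, n in doc.items()
    )


# Toy corpus (made-up counts, not data from the thesis).
docs = [
    {"rally": 4, "bullish": 3},
    {"rally": 3, "bullish": 4, "stock": 1},
    {"slump": 4, "bearish": 3, "stock": 1},
    {"slump": 3, "bearish": 4},
]
p_z_d, p_w_z = plsa(docs, n_topics=2)
for z, dist in enumerate(p_w_z):
    print(z, max(dist, key=dist.get))  # most probable word in each topic
```

EM never decreases the log-likelihood. This makes `log_likelihood` a convenient sanity check when changing the model.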
The thesis uses the following techniques.
First, natural language processing. NLP is an interdisciplinary field combining computer science and linguistics that aims to make machines understand human language. The techniques used here include TF-IDF weighting, topic models, text vectorization methods and index construction.
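To illustrate the first of these techniques, here is a minimal TF-IDF sketch. It assumes texts are already segmented into word tokens; Chinese word segmentation is not shown. It uses one common variant, tf = count / document length and idf = log(N / df). Other weightings exist, and the thesis does not state its exact formula. The function name and toy documents are illustrative.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    tf = count / document length, idf = log(N / df), where N is the number of
    documents and df the number of documents containing the term.
    """
    n_docs = len(docs)
    # Document frequency: count each term at most once per document.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid dividing by zero on an empty document
        weights.append(
            {t: (c / length) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


docs = [["stock", "rally", "rally"], ["stock", "slump"]]
w = tf_idf(docs)
print(w[0]["stock"])            # 0.0, since "stock" occurs in every document
print(round(w[0]["rally"], 4))  # 0.4621
```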
Second, a combination of qualitative and quantitative methods. The object of study is sentiment analysis. Classifying sentiment words is in itself a qualitative problem: each given word is assigned to a specified class, namely its sentiment type. At the same time, the thesis gives every sentiment word a quantitative value and ranks all sentiment words by it; the larger the absolute value, the stronger the word's sentiment. The stock price data processed in this thesis are quantitative. The thesis uses algorithms to convert them into qualitative information, which is then used to judge sentiment words. In short, combining qualitative and quantitative methods improves both the accuracy and the practicality of the sentiment lexicon.
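The abstract says quantitative price data are turned into qualitative information, but not how. One simple possibility labels each trading day by its close-to-close return: +1 above a threshold, -1 below its negative, and 0 otherwise. This hypothetical sketch is not the thesis's method. The 1% default threshold, the function name and the toy prices are assumptions.

```python
def price_to_trend(closes, threshold=0.01):
    """Turn a list of (date, close) pairs into {date: label}.

    Hypothetical rule: +1 if the close-to-close return exceeds `threshold`,
    -1 if it is below -threshold, 0 ("no clear trend") otherwise.
    The first date has no previous close and receives no label.
    """
    labels = {}
    for (_, prev), (date, close) in zip(closes, closes[1:]):
        ret = (close - prev) / prev
        if ret > threshold:
            labels[date] = 1
        elif ret < -threshold:
            labels[date] = -1
        else:
            labels[date] = 0
    return labels


# Toy closing prices (made up for illustration).
closes = [("2014-03-03", 10.0), ("2014-03-04", 10.5), ("2014-03-05", 10.4), ("2014-03-06", 9.9)]
print(price_to_trend(closes))
# {'2014-03-04': 1, '2014-03-05': 0, '2014-03-06': -1}
```

Days labelled 0 can be discarded so that only clearly rising or falling days are used as evidence about word polarity.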
The implementation shows that the proposed sentiment word generation algorithm is highly practical and achieves higher accuracy than the other sentiment word extraction algorithms it was compared with.
The innovations of this thesis lie mainly in its algorithm and techniques, in the following respects.
First, the method does not require seed words to be chosen in advance. Seed words are words selected beforehand, and conventional lexicon-generation methods start from a set of them. Without good seed words, any resulting lexicon is as illusory as "flowers in water, the moon in a mirror". High-quality seed words are what guarantee a high-quality lexicon and give it strong generalization ability. For a domain-specific lexicon, choosing seed words requires strong domain expertise, and hiring such experts is expensive. Moreover, seed words should be both general and strongly sentiment-bearing. These two requirements often conflict, so the task is not easy even for experts. The algorithm proposed here is unsupervised: it needs no sentiment-related words in advance, that is, no seed words. This greatly reduces the cost of building a sentiment lexicon and speeds up its generation.
Second, the sentiment of words changes over time: new sentiment words keep emerging, and old words acquire new sentiment. Existing algorithms cannot adapt automatically to such changes. The system designed here continuously fetches stock price data from the Internet and automatically matches it with text, so it can keep generating new sentiment words over time. The resulting lexicon is therefore highly up to date.
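Here is a minimal sketch of the matching step, under assumptions the abstract does not confirm:

- Each post carries a date and is matched to that date's price label (+1, 0 or -1).
- Flat or unlabelled days are skipped.
- Running counts are updated as new batches arrive, which lets the word statistics track changes over time.

`TrendWordCounts` and the toy data are hypothetical.

```python
from collections import Counter


class TrendWordCounts:
    """Running per-word post counts on up days and on down days (hypothetical helper).

    Posts are (date, tokens) pairs matched to the price label of the same date.
    Flat (0) or unlabelled days are skipped; each word counts once per post.
    """

    def __init__(self):
        self.up = Counter()
        self.down = Counter()

    def update(self, posts, labels):
        """Add a new batch of posts; `labels` maps date -> +1 / 0 / -1."""
        for date, tokens in posts:
            label = labels.get(date, 0)
            if label > 0:
                self.up.update(set(tokens))
            elif label < 0:
                self.down.update(set(tokens))


labels = {"2014-03-04": 1, "2014-03-05": 0, "2014-03-06": -1}
posts = [
    ("2014-03-04", ["rally", "bullish", "rally"]),
    ("2014-03-05", ["sideways"]),
    ("2014-03-06", ["slump", "bearish"]),
]
counts = TrendWordCounts()
counts.update(posts, labels)
# A later batch (a new day of prices and posts) simply extends the counts.
counts.update([("2014-03-07", ["rally"])], {"2014-03-07": 1})
print(counts.up["rally"], counts.down["slump"], counts.up["sideways"])  # 2 1 0
```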
Third, the same word carries different sentiment in different domains, and sentiment words rank differently from domain to domain. This thesis uses a ranking algorithm to sort all the sentiment words.
Finally, the thesis proposes Trend-PLSA, a trend-aware algorithm built on latent semantic analysis. It also runs experiments with naive Bayes and compares the results of naive Bayes with those of the latent semantic analysis algorithm. The results show that, compared with the other algorithms, the proposed algorithm makes better use of stock price information, classifies sentiment words more accurately, and builds a better sentiment lexicon.
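The thesis uses naive Bayes as a comparison baseline. The sketch below shows one plausible way for such a baseline to give signed, rankable word scores; this is an assumption, not taken from the thesis. It trains a multinomial naive Bayes model with add-alpha (Laplace) smoothing on up-day and down-day documents. The score is the log-likelihood ratio log P(w|up) - log P(w|down):

- The sign gives polarity.
- The absolute value gives strength, matching the absolute-value ranking criterion used in the thesis.

The function names and toy documents are hypothetical.

```python
import math
from collections import Counter


def naive_bayes_word_scores(up_docs, down_docs, alpha=1.0):
    """Signed word scores from a multinomial naive Bayes model.

    score(w) = log P(w | up) - log P(w | down), with add-alpha smoothing.
    Positive = associated with rising prices, negative = falling prices;
    a larger |score| means a stronger association.
    """
    up = Counter(t for doc in up_docs for t in doc)
    down = Counter(t for doc in down_docs for t in doc)
    vocab = set(up) | set(down)
    up_total = sum(up.values()) + alpha * len(vocab)
    down_total = sum(down.values()) + alpha * len(vocab)
    return {
        w: math.log((up[w] + alpha) / up_total) - math.log((down[w] + alpha) / down_total)
        for w in vocab
    }


def rank_by_strength(scores):
    """Sort words by |score|, strongest first (ties broken alphabetically)."""
    return sorted(scores, key=lambda w: (-abs(scores[w]), w))


up_docs = [["rally", "bullish"], ["rally", "stock"]]
down_docs = [["slump", "stock"], ["slump", "bearish"]]
scores = naive_bayes_word_scores(up_docs, down_docs)
print(rank_by_strength(scores))
# ['rally', 'slump', 'bearish', 'bullish', 'stock']
```

Ties in |score| are broken alphabetically only to make the output deterministic. A word that occurs equally on both sides, such as "stock" here, scores exactly 0.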
[Degree-granting institution]: Southwestern University of Finance and Economics
[Degree level]: Master's
[Year awarded]: 2014
[Classification number]: F832.51; TP391.1
