A Document Ranking Method Supporting Implicit Temporal Queries in Information Retrieval

Published: 2018-04-11 00:09

Topic: temporal information retrieval + query temporal intent; Source: Jiangsu University, 2017 master's thesis

[Abstract]: The spread of the Internet has brought explosive growth in information resources. This gives users more choices but also makes useful information harder to find, so using search engines to select, from massive amounts of information, the documents that meet a user's needs has become an important challenge. In recent years the number of web pages and queries containing temporal information has kept growing, and temporal information retrieval (TIR) has become a focus of research. TIR studies how to extract temporal information from web pages, analyze the temporal intent of queries, and build time-aware ranking models in order to improve search quality. Queries with temporal intent fall into two kinds. An explicit temporal query contains a temporal expression that states the time constraint directly. An implicit temporal query provides no explicit time criterion, yet its temporal intent lies in a specific time interval. According to statistics, more than 7% of Internet queries carry implicit temporal intent, while about 1.5% contain explicit time constraints. Implicit temporal queries therefore account for the larger share, and much research on them remains to be done. This thesis studies how to analyze the temporal intent of implicit temporal queries and how to improve retrieval performance for them. The main work is summarized as follows:
(1) For implicit temporal queries, a method is proposed that analyzes query temporal intent by combining the DBpedia semantic web resource with the top-k ranked documents. If the query concerns a famous person or a major historical event, the specific time interval obtained by querying DBpedia (a semantic web resource built on Wikipedia) is taken as the temporal intent of the query. For other types of queries, the temporal intent is inferred from the temporal expressions that occur most frequently in the content of the top-k ranked documents.
(2) Building on the language model, a document ranking model supporting implicit temporal queries is proposed. Taking temporal uncertainty into account, it computes the probability that each document generates the query and uses this as the document's temporal relevance score. The temporal relevance score and the content relevance score are then combined linearly to re-rank the documents.
(3) The document collection of the Temporal Information Access (Temporalia) task at NTCIR-11 is used as experimental data to evaluate the proposed intent analysis method and ranking model. The method is first compared with several previously proposed methods for analyzing query temporal intent. The results show that analyzing the temporal intent of a query before computing document relevance scores is worthwhile, and that the proposed combination of DBpedia and the top-k documents analyzes query temporal intent comparatively well. Given the query temporal intent, the proposed ranking method is then compared with existing time-aware ranking methods. For the time-aware ranking models, most metric values are higher than those of the initial ranking, which considers content relevance only. This indicates that taking temporal relevance into account in the retrieval model helps improve retrieval quality. Compared with the other ranking methods, the proposed language-model-based ranking method performs comparatively well.
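To make the pipeline in the abstract more concrete, the following is a minimal Python sketch of two of its steps: inferring temporal intent from the years that occur most often in the top-k documents, and re-ranking by a linear combination of content and temporal scores. It is an illustration under stated assumptions, not the thesis's implementation. The function names, the year-only regular expression, the `smoothing` and `alpha` parameters, and the overlap-based temporal score are all hypothetical. In particular, the overlap score is a simplified stand-in for the language-model generation probability described in the abstract, and the DBpedia lookup for famous people and historical events is omitted.

```python
import re
from collections import Counter

# Assumption: temporal expressions are reduced to four-digit years (1000-2099).
YEAR_PATTERN = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def estimate_time_intent(top_k_texts):
    """Return a normalized distribution over the years mentioned in the top-k documents."""
    counts = Counter()
    for text in top_k_texts:
        counts.update(int(y) for y in YEAR_PATTERN.findall(text))
    total = sum(counts.values())
    if total == 0:
        return {}  # no temporal expressions found, so no intent can be inferred
    return {year: n / total for year, n in counts.items()}


def temporal_score(doc_text, intent, smoothing=0.5):
    """Sum the intent probability mass covered by the years in a document.

    An adjacent year (off by one) earns partial credit, a crude stand-in
    for the time-uncertainty handling mentioned in the abstract.
    """
    doc_years = {int(y) for y in YEAR_PATTERN.findall(doc_text)}
    score = 0.0
    for year, prob in intent.items():
        if year in doc_years:
            score += prob
        elif year - 1 in doc_years or year + 1 in doc_years:
            score += smoothing * prob
    return score


def rerank(docs, content_scores, intent, alpha=0.3):
    """Re-rank documents by (1 - alpha) * content score + alpha * temporal score.

    docs maps document id to text; content_scores maps document id to a
    content relevance score assumed to lie in [0, 1].
    """
    combined = {
        doc_id: (1 - alpha) * content_scores[doc_id]
        + alpha * temporal_score(text, intent)
        for doc_id, text in docs.items()
    }
    return sorted(combined, key=combined.get, reverse=True)
```

In this sketch, `alpha` controls how much weight the temporal score receives relative to the content score. The abstract does not state the weight used in the thesis, so the default of 0.3 is arbitrary.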
[Degree-granting institution]: Jiangsu University
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: TP391.3
