Design and Implementation of a Web User Behavior Data Collection and Statistics System
Keywords: website traffic analysis; behavior data collection; automatic JavaScript embedding; Netty. Source: Beijing Jiaotong University, 2015 master's thesis. Document type: degree thesis.
[Abstract]: With the arrival of the Internet era, the network has become part of daily life, and people have gradually accepted online shopping as a way of consuming. The sharp rise in the number of online shoppers has led e-commerce sites to invest more in attracting users and generating revenue. For an e-commerce site, good site design and a satisfying shopping experience are critical to the business, so website analysis is essential. Understanding how users visit a site requires comprehensive and detailed data on their browsing behavior. From a big-data perspective, large volumes of such information make website analysis more insightful and may reveal latent value in seemingly unremarkable data.
Many third-party website analysis tools exist, some of them free, but they are inconvenient to apply in practice. Google Analytics, which uses the JavaScript page-tagging method, requires each page to be modified to include JavaScript code, and capturing a given kind of user behavior requires extensive page changes to add event-tracking code. Data capture therefore becomes laborious and hard to manage, and the resulting statistics are not real-time. The server-log approach, in turn, cannot track events and requires the data to be filtered. The focus of this thesis is the implementation of a user behavior data collection and statistics system. It collects behavior data with the JavaScript page-tagging method but needs no manual page modification: an Nginx module automatically embeds different JavaScript into different types of pages. The event-tracking JavaScript is managed in one place, which eases maintenance. The data collection server is built on Netty and can process large volumes of data quickly. The collection server sends behavior data to the MetaQ message middleware. The system computes statistics in two ways, customized periodic reports implemented with Hive and real-time statistics and display implemented with Storm, and both pull messages from MetaQ independently without affecting each other, which decouples the data collection server from them.
The author's work in the project mainly covered research into user behavior data collection methods and the implementation of the behavior data acquisition module and the data collection and storage module. On the statistics side, the author took part in generating operational reports with Hive, so the Storm real-time statistics are not described in this thesis. The system currently provides behavior data statistics for platforms such as the Unicom online mall and mobile mall. An existing task scheduling system generates reports daily or periodically and sends them to the relevant staff. Under current conditions, data storage on HDFS is also close to real-time, so real-time queries over the behavior data can be used to monitor certain aspects of site status, and when an anomaly occurs an alert can be sent to developers through an SMS interface.
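The decoupling described in the abstract, where the periodic-report pipeline and the real-time pipeline each pull from the message middleware on their own schedule, can be sketched roughly as follows. This is a minimal in-memory illustration in Python, not the thesis's implementation and not the MetaQ API: the `MessageLog` class, the group names `realtime` and `batch`, and the event fields are all hypothetical.

```python
from collections import defaultdict


class MessageLog:
    """In-memory stand-in for a pull-based message broker (hypothetical, not the MetaQ API)."""

    def __init__(self):
        self._messages = []               # append-only list of behavior events
        self._offsets = defaultdict(int)  # next unread position, tracked per consumer group

    def publish(self, event):
        """Called by the collection server, which never needs to know who consumes."""
        self._messages.append(event)

    def pull(self, group, max_batch=100):
        """Return up to max_batch unread events for this group and advance its offset."""
        start = self._offsets[group]
        batch = self._messages[start:start + max_batch]
        self._offsets[group] = start + len(batch)
        return batch


log = MessageLog()
for page in ["/home", "/phone/123", "/cart"]:
    log.publish({"type": "pageview", "page": page})

# The real-time consumer pulls often and in small batches.
realtime_first = log.pull("realtime", max_batch=2)
# The batch consumer pulls later and still sees every event from the start.
batch_all = log.pull("batch")
# The real-time consumer resumes where it left off, unaffected by the batch pull.
realtime_rest = log.pull("realtime")

print(len(realtime_first), len(batch_all), len(realtime_rest))
```

Because each consumer group keeps its own offset, a slow or paused report job does not hold back the real-time path, and the publisher only appends to the log.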
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree awarded]: 2015
[Classification numbers]: TP311.52; TP393.092