Research on Classification Algorithms Based on Logistics Information and Their Application
Thesis topic: logistics + data mining; Source: Master's thesis, Beijing University of Posts and Telecommunications, 2015
【Abstract】: In recent years, advances in information technology have driven the adoption of informatization in enterprise logistics management, causing the data stored by enterprises to grow explosively. Treating data as a resource and making full, rational use of data mining techniques to deepen enterprise logistics management, with a focus on data mining techniques based on logistics information and their applications, can help enterprises improve operational efficiency, reduce costs, and make timely decisions, and has become an effective way to strengthen enterprise competitiveness.

This thesis takes the K-nearest neighbor (KNN) algorithm, a classification algorithm in data mining, as its research object. After presenting the core idea of the classical KNN algorithm and the current state of research, it identifies two shortcomings: (1) the traditional algorithm assumes that all attributes of a sample are equally important to classification, so irrelevant attributes cause misclassification and reduce accuracy; (2) when selecting the nearest neighbors of a sample to be classified, the traditional algorithm must compute its distance to every training sample, which is computationally expensive and sensitive to noisy samples, harming both efficiency and accuracy.

To address these two shortcomings, two improvement strategies are proposed:

(1) An improved algorithm based on attribute reduction, which reflects the differing contributions of attributes to the classification result. The algorithm uses information entropy to compute correlation coefficients between the conditional attributes and the decision attribute, distinguishes the importance of conditional attributes in the classification process, and appropriately reduces the sample attributes by adjusting a threshold on the correlation coefficient. Numerical analysis shows that the improved algorithm raises classification accuracy to some extent.

(2) An improved sample-pruning algorithm based on clustering, which handles massive data sets effectively and lowers the algorithm's time complexity. This algorithm uses hierarchical clustering to fix the initial cluster centers of K-means, avoiding the effect of random center selection on clustering quality; K-means clustering is in turn used to refine the hierarchical clustering result, and a representative sample set is selected from it for classification testing. Simulation experiments show that, with this sample pruning, the improved algorithm substantially reduces the classifier's computational load and improves classification efficiency while maintaining or improving classification accuracy.

Finally, building on the above work, this thesis designs an improved KNN collaborative filtering recommendation model. The model is applied to logistics-route rating data for Beijing to verify its effectiveness and feasibility on a practical problem. Experiments show that the improved algorithm's recommendation accuracy increases significantly, and the model can help customers quickly find a suitable logistics company among a large amount of specialized information, demonstrating practical applicability.
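As a rough illustration of the two improvements summarized above, the following is a minimal, hypothetical Python sketch: attributes are weighted by their information gain with respect to the class label (so near-irrelevant attributes contribute little to the distance), and classification is a majority vote among the k nearest weighted neighbors. The function names, toy data, and the assumption of discretized attribute values are illustrative, not taken from the thesis.

```python
# Hypothetical sketch: information-gain attribute weighting + weighted KNN.
# Assumes attributes are already discretized; all names here are illustrative.
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Mutual information between one discretized attribute and the label."""
    n = len(labels)
    cond = 0.0
    for v in set(values):
        subset = [l for x, l in zip(values, labels) if x == v]
        cond += len(subset) / n * entropy(subset)
    return entropy(labels) - cond

def attribute_weights(X, y):
    """One weight per column; near-zero gain marks a near-irrelevant attribute."""
    return [information_gain([row[j] for row in X], y) for j in range(len(X[0]))]

def weighted_knn(X, y, w, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples
    under an attribute-weighted Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum(wj * (ai - bi) ** 2 for wj, ai, bi in zip(w, a, b)))
    neighbors = sorted(zip(X, y), key=lambda s: dist(s[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# Toy data: attribute 0 determines the class, attribute 1 is mostly noise.
X = [[0, 5], [0, 1], [0, 5], [1, 1], [1, 5], [1, 1]]
y = ['a', 'a', 'a', 'b', 'b', 'b']
w = attribute_weights(X, y)   # weight of column 0 far exceeds column 1
print(weighted_knn(X, y, w, [0, 3], k=3))  # prints: a
```

The weighting step plays the role of the thesis's attribute reduction: setting a threshold on the weights and dropping columns below it would remove irrelevant attributes entirely rather than merely down-weighting them.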
【Degree-granting institution】: Beijing University of Posts and Telecommunications
【Degree level】: Master
【Year degree conferred】: 2015
【CLC number】: TP311.13