Convolutional Neural Networks for Image Content Retrieval
Published: 2018-06-17 08:47
Topic: convolutional neural networks + image classification; Reference: Hangzhou Dianzi University, master's thesis, 2017
[Abstract]: Image classification and retrieval are classic problems in image processing. With the rapid growth of the mobile Internet, the volume of image data has exploded, and classifying massive image collections has become an active research topic. Traditional image classification methods rely on features designed by hand for specific kinds of images; they are not robust and require extensive prior knowledge. Convolutional neural networks (CNNs) have achieved a major breakthrough in this area: they automatically learn, from large numbers of images, the essential features of the original image needed for classification, and they offer higher recognition rates and better practicality than traditional methods. A CNN mimics the human visual system by dividing feature extraction into multiple levels from low to high, using network depth to obtain highly abstract features. It takes the image directly as the network input and uses local receptive fields, weight sharing, and subsampling to reduce the number of network parameters. This helps avoid the overfitting caused by too many weights and gives the network a degree of invariance to translation, rotation, and distortion. CNNs are now widely used in image retrieval, with recognition rates and practicality superior to traditional classification methods, so studying their application to image content retrieval is of considerable importance. This thesis studies both practical application and network improvement. Its main contributions are as follows:
(1) To address how parameters should be chosen when designing a CNN model, comparative experiments vary the number and size of the convolution kernels, the arrangement of the sampling (pooling) layers, and the activation function. The CNN performs better on the MNIST and CIFAR-10 datasets when the number of kernels is increased, the kernel size is reduced, the ReLU activation function is used, and the first sampling layer uses max sampling.
(2) For the classification of an antique image dataset, a preprocessing method for images of varying sizes is proposed, which solves the problem of the target object being deformed when images are brought to a uniform format. A method that separates the target from the background before feeding the image to the CNN is also proposed; experiments on the antique dataset show that the CNN used with this method has a simpler structure and a higher recognition rate than a CNN fed with the images directly. Further experiments verify that the CNN still classifies well when an image contains multiple targets. To handle the imbalance in the number of samples per class across the full antique dataset, a method combining a CNN with HOG+SVM is proposed, and experiments show that it achieves a higher recognition rate than classification with the CNN alone.
(3) Because the sampling methods commonly used in CNNs each have strengths and weaknesses, a network model that performs max sampling and mean sampling separately in the sampling layer (the parallel sampling model) is proposed; experiments show that it generalizes better than a conventional CNN. In addition, a method is proposed that pre-trains the CNN so that noisy samples can be discarded during training, which solves the problem that training the network directly on noisy samples may fail to converge.
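The parallel sampling model described in the abstract can be illustrated with a minimal sketch in plain Python. It applies max sampling and mean sampling to the same feature map and returns both results. The function names `pool2d` and `parallel_pool`, the non-overlapping 2x2 window, and the choice to keep the two results as separate output channels are all assumptions made for illustration; the abstract does not state how the thesis merges the two branches.

```python
from typing import List

Matrix = List[List[float]]


def pool2d(feature_map: Matrix, size: int, mode: str) -> Matrix:
    """Non-overlapping size x size pooling; mode is 'max' or 'mean'."""
    if mode not in ("max", "mean"):
        raise ValueError(f"unknown pooling mode: {mode}")
    rows, cols = len(feature_map), len(feature_map[0])
    pooled: Matrix = []
    # Step through the map one window at a time, dropping any ragged border.
    for r in range(0, rows - size + 1, size):
        out_row = []
        for c in range(0, cols - size + 1, size):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            if mode == "max":
                out_row.append(float(max(window)))
            else:
                out_row.append(sum(window) / len(window))
        pooled.append(out_row)
    return pooled


def parallel_pool(feature_map: Matrix, size: int = 2) -> List[Matrix]:
    """Run max and mean pooling side by side on the same input.

    ASSUMPTION: the two results are kept as separate output channels;
    the abstract does not specify the merge rule used in the thesis.
    """
    return [pool2d(feature_map, size, "max"),
            pool2d(feature_map, size, "mean")]


if __name__ == "__main__":
    fmap = [
        [1.0, 2.0, 0.0, 0.0],
        [3.0, 4.0, 0.0, 8.0],
        [5.0, 5.0, 1.0, 1.0],
        [5.0, 5.0, 1.0, 3.0],
    ]
    max_ch, mean_ch = parallel_pool(fmap)
    print(max_ch)   # [[4.0, 8.0], [5.0, 3.0]]
    print(mean_ch)  # [[2.5, 2.0], [5.0, 1.5]]
```

In the top-right window, max sampling keeps the single strong response (8.0) while mean sampling dilutes it to 2.0; keeping both branches lets later layers see either view of the same region, which is one plausible reading of why the two sampling methods are combined.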
[Degree-granting institution]: Hangzhou Dianzi University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.41; TP183