Dynamic Stripe Construction for Asynchronous Encoding in Distributed Storage Systems
Published: 2018-11-08 16:09
【Abstract】: To preserve data access performance while reducing redundant storage overhead, distributed storage systems commonly adopt asynchronous encoding. Newly written data is first stored with multiple replicas; once accesses to it become infrequent, a background process converts it to erasure-coded storage. Because distributed systems usually place data blocks randomly, blocks with contiguous logical addresses tend to be scattered across all nodes of the system. The encoding process therefore has to download data blocks across racks in order to encode them, and after encoding, blocks must be redistributed across racks to preserve reliability. This both lowers the efficiency of the asynchronous encoding operation and degrades the performance of foreground tasks in the system. To improve the efficiency of asynchronous encoding and reduce its impact on foreground tasks, this thesis proposes a new way of forming encoding stripes, which we call Dynamic Stripe Construction (DSC). DSC builds encoding stripes from the current placement of data blocks in the system. The data blocks placed in the same stripe must satisfy two properties: (1) every block has a replica stored in one common rack, so that encoding triggers no cross-rack block downloads; (2) every block also has a replica on other, mutually independent racks, so that no cross-rack block redistribution is needed after encoding completes. To form stripes efficiently within the very large selection space, we design a data structure that manages block placement information and, based on it, propose a dynamic stripe construction algorithm with linear time complexity. The algorithm can be plugged into a distributed cluster that uses any data placement scheme and any erasure code configuration. To validate the approach, we implement DSC on HDFS. In experiments on a real cluster, DSC significantly improves the efficiency of asynchronous encoding (by up to 81% in our experiments) and reduces its impact on foreground tasks. During system integration, we first examine data locality and load balancing across nodes in asynchronous encoding, and then design inter-file encoding and iterative encoding to optimize asynchronous encoding for small-file and appended-file scenarios. To adapt to the constantly changing data access load in distributed clusters, we also propose a new block management architecture that combines dynamic replication with erasure coding. This architecture lets us manage the data blocks in the system dynamically, minimizing storage overhead while improving data reliability and access performance.
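The two stripe properties described in the abstract can be illustrated with a small sketch. This is not the thesis's linear-time algorithm or its data structure, neither of which is given here; it is a simple greedy pass written only to make the two conditions concrete. The function name `build_stripes`, the `placement` mapping (block id to the set of racks holding a replica), and the stripe width `k` are all assumptions introduced for illustration.

```python
from collections import defaultdict


def build_stripes(placement, k):
    """Greedy illustration of the two stripe properties (hypothetical helper).

    placement: dict mapping block id -> set of rack ids that hold a replica.
    k: number of data blocks per stripe.
    Returns a list of (core_rack, {block: retained_rack}) tuples.
    """
    # Index blocks by the racks that hold one of their replicas.
    by_rack = defaultdict(list)
    for block, racks in sorted(placement.items()):
        for rack in sorted(racks):
            by_rack[rack].append(block)

    used = set()      # blocks already assigned to a stripe
    stripes = []
    for core in sorted(by_rack):
        while True:
            chosen = {}    # block -> rack that keeps its replica after encoding
            taken = set()  # racks already reserved by this stripe
            for block in by_rack[core]:
                if block in used:
                    continue
                # Property (1) holds because the block comes from by_rack[core].
                # Property (2): find a replica on a rack that is neither the
                # core rack nor already reserved by another block of the stripe.
                spare = next(
                    (r for r in sorted(placement[block])
                     if r != core and r not in taken),
                    None,
                )
                if spare is None:
                    continue
                chosen[block] = spare
                taken.add(spare)
                if len(chosen) == k:
                    break
            if len(chosen) < k:
                break  # this core rack cannot supply another full stripe
            used.update(chosen)
            stripes.append((core, chosen))
    return stripes


# Example placement (made up): four blocks, two replicas each.
placement = {
    "b1": {"r1", "r2"},
    "b2": {"r1", "r3"},
    "b3": {"r1", "r4"},
    "b4": {"r2", "r3"},
}
stripes = build_stripes(placement, k=3)
```

In this example, blocks b1, b2 and b3 share rack r1, so an encoder running in r1 reads all three locally, and each block keeps a replica on a different rack afterwards. A greedy choice of retained racks can miss stripes that a proper bipartite matching would find, so this should be read as a sketch of the constraints rather than an efficient or complete construction.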
【Degree-granting institution】: University of Science and Technology of China
【Degree level】: Master's
【Year conferred】: 2017
【Classification number】: TP333