Article (Scientific journals)
On the Synchronization Bottleneck of OpenStack Swift-like Cloud Storage Systems
Ruan, Mingkang; Titcheu Chekam, Thierry; Zhai, Ennan et al.
2018 In IEEE Transactions on Parallel and Distributed Systems, PP (99), p. 1-1
Peer Reviewed verified by ORBi
 

Files


Full Text
lightsync-TPDS.pdf
Author postprint (1.39 MB)

Details



Keywords :
Cloud computing; Delays; Electronic mail; Open source software; Protocols; Reliability; Synchronization; Cloud storage; OpenStack Swift; object synchronization; performance bottleneck
Abstract :
As one of the most popular types of cloud storage services, OpenStack Swift and its follow-up systems replicate each object across multiple storage nodes and leverage object sync protocols to achieve high reliability and eventual consistency. The performance of object sync protocols heavily relies on two key parameters: r (number of replicas for each object) and n (number of objects hosted by each storage node). In existing tutorials and demos, the configurations are usually r = 3 and n < 1000 by default, and the sync process seems to perform well. However, we discover that in data-intensive scenarios, e.g., when r > 3 and n >> 1000, the sync process is significantly delayed and produces massive network overhead, referred to as the sync bottleneck problem. By reviewing the source code of OpenStack Swift, we find that its object sync protocol utilizes a fairly simple and network-intensive approach to check the consistency among replicas of objects. Hence, in a sync round, the number of exchanged hash values per node is Θ(n × r). To tackle the problem, we propose a lightweight and practical object sync protocol, LightSync, which not only remarkably reduces the sync overhead, but also preserves high reliability and eventual consistency. LightSync derives this capability from three novel building blocks: 1) Hashing of Hashes, which aggregates all the h hash values of each data partition into a single but representative hash value with the Merkle tree; 2) Circular Hash Checking, which checks the consistency of different partition replicas by only sending the aggregated hash value to the clockwise neighbor; and 3) Failed Neighbor Handling, which properly detects and handles node failures with moderate overhead to effectively strengthen the robustness of LightSync. The design of LightSync offers a provable guarantee on reducing the per-node network overhead from Θ(n × r) to Θ(n/h).
Furthermore, we have implemented LightSync as an open-source patch and applied it to OpenStack Swift, thus reducing the sync delay by up to 879x and the network overhead by up to 47.5x.
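The first two building blocks named in the abstract can be illustrated with a minimal sketch. It is not taken from the LightSync patch: the function names `merkle_root` and `circular_check`, the choice of SHA-256, and the rule of duplicating the last node on odd-sized levels are all assumptions made for illustration.

```python
import hashlib


def merkle_root(hashes):
    """Fold a partition's hash values into one aggregated digest.

    Hypothetical helper: pairs of digests are concatenated and re-hashed
    level by level until a single root remains.
    """
    if not hashes:
        return hashlib.sha256(b"").hexdigest()
    level = list(hashes)
    while len(level) > 1:
        if len(level) % 2:  # assumption: duplicate the last node on odd levels
            level.append(level[-1])
        level = [
            hashlib.sha256((level[i] + level[i + 1]).encode()).hexdigest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


def circular_check(roots):
    """Compare each replica's aggregated hash with its clockwise neighbor only.

    roots[i] is the root held by replica node i on the ring. Returns the
    (sender, receiver) pairs whose roots differ, i.e. where a sync is needed.
    """
    r = len(roots)
    return [(i, (i + 1) % r) for i in range(r) if roots[i] != roots[(i + 1) % r]]


# Four replicas of one partition holding eight objects; replica 2 is stale.
object_hashes = [hashlib.sha256(f"obj{i}".encode()).hexdigest() for i in range(8)]
stale_hashes = object_hashes[:-1] + [hashlib.sha256(b"old").hexdigest()]
roots = [merkle_root(object_hashes) for _ in range(4)]
roots[2] = merkle_root(stale_hashes)
print(circular_check(roots))  # [(1, 2), (2, 3)]
```

In this sketch each node sends a single aggregated value to one neighbor per partition, rather than every hash value to every other replica, which is the intuition behind the reduction in per-node overhead stated above. Failed Neighbor Handling is not modeled here.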
Disciplines :
Computer science
Author, co-author :
Ruan, Mingkang; Tsinghua University > School of Software
Titcheu Chekam, Thierry; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
Zhai, Ennan; Yale University > Computer Science
Li, Zhenhua; Tsinghua University > School of Software
Liu, Yao; SUNY Binghamton > Computer Science
E, Jinlong; Tsinghua University > Computer Science and Technology
Cui, Yong; Network Institute, Beijing > Department of Computer Science and Technology
Xu, Hong; City University of Hong Kong > Computer Science
External co-authors :
yes
Language :
English
Title :
On the Synchronization Bottleneck of OpenStack Swift-like Cloud Storage Systems
Publication date :
27 February 2018
Journal title :
IEEE Transactions on Parallel and Distributed Systems
ISSN :
1045-9219
eISSN :
1558-2183
Publisher :
IEEE Xplore
Volume :
PP
Issue :
99
Pages :
1-1
Peer reviewed :
Peer Reviewed verified by ORBi
Focus Area :
Security, Reliability and Trust
Available on ORBilu :
since 05 April 2018

Statistics


Number of views
116 (7 by Unilu)
Number of downloads
395 (4 by Unilu)

Scopus citations®
6
Scopus citations® without self-citations
3
WoS citations
4
