ABSTRACT
Cloud computing provides services over the Internet using virtualized resources on a pay-per-use basis. These
services are delivered from large, interconnected data centers built from commodity machines, on which client data
is stored. Because these machines are low-cost, the probability of hardware failure and data corruption is high. To
tolerate such faults and improve the reliability of the cloud system, data is replicated across multiple machines.
The Hadoop Distributed File System (HDFS) is used for distributed storage in cloud systems. HDFS stores data as
fixed-size blocks (64 MB by default) and replicates each block on multiple DataNodes to improve the reliability of
the cloud system. Replicas are placed by the HDFS block replica placement algorithm; however, this algorithm does not
take into account any quality of service (QoS) parameter agreed between the client and the service provider in a
service level agreement (SLA).
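For illustration, the rack-aware default policy documented for HDFS (replication factor three) can be sketched as follows: the first replica goes on the writer's node, the second on a node in a different (remote) rack, and the third on a different node in the same rack as the second. This is a minimal Python sketch, not Hadoop's Java implementation; the `DataNode` class, the `default_placement` function, and the assumption that the writer runs on a DataNode are hypothetical. Note that no QoS parameter such as replication time appears anywhere in the selection.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DataNode:
    name: str
    rack: str


def default_placement(writer: DataNode, nodes: list[DataNode], seed: int = 0) -> list[DataNode]:
    """Choose three replica targets following a simplified rack-aware default policy.

    Hypothetical helper: assumes the writer is itself a DataNode and that at least
    one remote rack holds two or more DataNodes.
    """
    rng = random.Random(seed)
    # Replica 1: the writer's own node.
    first = writer
    # Replica 2: a node on a remote rack that has at least one other node,
    # so that replica 3 can share its rack.
    remote = [
        n for n in nodes
        if n.rack != first.rack and any(m.rack == n.rack and m != n for m in nodes)
    ]
    second = rng.choice(remote)
    # Replica 3: a different node on the same rack as replica 2.
    same_rack = [n for n in nodes if n.rack == second.rack and n != second]
    third = rng.choice(same_rack)
    # No QoS parameter (e.g. expected replication time) influences these choices.
    return [first, second, third]
```

Placing two replicas on one remote rack limits cross-rack write traffic while still surviving the loss of an entire rack.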
This paper proposes a QoS-Aware Data Replication algorithm for HDFS that takes a QoS parameter into account when
replicating a data block. The QoS parameter considered is the replication time expected by the application: each
block is replicated to remote-rack DataNodes that satisfy the application's replication time requirement. Compared
with the existing algorithm, the proposed algorithm reduces replication cost, thereby improving the reliability and
performance of the system.
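The selection rule described above (replicate only to remote-rack DataNodes that meet the application's replication time requirement, then prefer the cheapest) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact algorithm: the time model (block size divided by transfer rate, ignoring queueing and disk delay), the cost model (time multiplied by a per-second cost rate), and the names `Candidate`, `expected_replication_time`, and `qos_aware_target` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

BLOCK_SIZE_MB = 64  # block size assumed in the text


@dataclass(frozen=True)
class Candidate:
    name: str
    rack: str
    bandwidth_mb_s: float  # hypothetical: measured transfer rate to this node
    cost_per_s: float      # hypothetical: cost of one second of replication


def expected_replication_time(node: Candidate, block_mb: float = BLOCK_SIZE_MB) -> float:
    # Simplified model: pure transfer time, no queueing or disk delay.
    return block_mb / node.bandwidth_mb_s


def replication_cost(node: Candidate) -> float:
    # Hypothetical cost model: replication time times a per-second rate.
    return expected_replication_time(node) * node.cost_per_s


def qos_aware_target(local_rack: str, nodes: list[Candidate],
                     required_time_s: float) -> Optional[Candidate]:
    # Keep only remote-rack nodes that meet the replication time requirement.
    feasible = [
        n for n in nodes
        if n.rack != local_rack and expected_replication_time(n) <= required_time_s
    ]
    if not feasible:
        return None  # no remote-rack node can satisfy the QoS requirement
    # Among the feasible nodes, choose the one with the lowest replication cost.
    return min(feasible, key=replication_cost)
```

Returning `None` makes a QoS violation explicit to the caller instead of silently falling back to a node that misses the deadline.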
Keywords: Cloud computing; quality of service; data replication; Hadoop Distributed File System; replication cost