Author:Mukherjee, Alapan
Title:Benchmarking Hadoop performance on different distributed storage systems
Publication type:Master's thesis
Publication year:2015
Pages:110
Language:eng
Department/School:School of Science
Main subject:Mobile Computing   (T-110)
Supervisor:Heljanko, Keijo
Instructor:Döngelci, Ridvan
Electronic version URL: http://urn.fi/URN:NBN:fi:aalto-201509184328
Location:P1 Ark Aalto 3020 | Archive
Keywords:Tachyon
HDFS
Ceph
benchmarks
Abstract (eng):Distributed storage systems have been in place for years, and have undergone significant changes in architecture to ensure reliable storage of data in a cost-effective manner.
With the demand for data increasing, there has been a shift from disk-centric to memory-centric computing, where the focus is on keeping data in memory rather than on disk.
The primary motivation for this is the increased speed of data processing.
This shift could, however, change how the necessary fault tolerance is provided: instead of data replication, other techniques may be considered.

One example of an in-memory distributed storage system is Tachyon.
Instead of replicating data files in memory, Tachyon provides fault tolerance by maintaining a record of the operations needed to generate the data files.
These operations are replayed if the files are lost.
This approach is termed lineage.
Tachyon is already deployed by many well-known companies.
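
The lineage idea described above can be illustrated with a minimal Python sketch. This is a toy model under stated assumptions, not Tachyon's API: the class `LineageStore`, its methods, and the file names are hypothetical, and a plain dictionary stands in for a persistent underlayer storage system. It only shows the principle of recomputing a lost file from recorded operations instead of keeping replicas.

```python
from typing import Callable, Dict, List, Tuple


class LineageStore:
    """Toy store (hypothetical) that recovers lost files by replaying operations."""

    def __init__(self) -> None:
        # Assumption: stands in for durable underlayer storage holding source files.
        self._durable: Dict[str, bytes] = {}
        # Volatile in-memory copies; these are not replicated.
        self._memory: Dict[str, bytes] = {}
        # Lineage record: output file -> (operation, names of its input files).
        self._lineage: Dict[str, Tuple[Callable[..., bytes], List[str]]] = {}

    def put_source(self, name: str, content: bytes) -> None:
        """Store an original input file durably."""
        self._durable[name] = content

    def derive(self, name: str, op: Callable[..., bytes], inputs: List[str]) -> None:
        """Record how `name` is produced, then compute it and keep it in memory."""
        self._lineage[name] = (op, list(inputs))
        self._memory[name] = op(*[self.read(i) for i in inputs])

    def evict(self, name: str) -> None:
        """Simulate losing the in-memory copy of a file."""
        self._memory.pop(name, None)

    def read(self, name: str) -> bytes:
        """Return the file, replaying its recorded operation if the copy was lost."""
        if name in self._memory:
            return self._memory[name]
        if name in self._durable:
            return self._durable[name]
        op, inputs = self._lineage[name]  # raises KeyError for unknown files
        result = op(*[self.read(i) for i in inputs])
        self._memory[name] = result
        return result


# Usage: lose both derived files, then recover the last one through its lineage.
store = LineageStore()
store.put_source("raw", b"b\na\nc")
store.derive("sorted", lambda d: b"\n".join(sorted(d.split(b"\n"))), ["raw"])
store.derive("upper", lambda d: d.upper(), ["sorted"])
store.evict("sorted")
store.evict("upper")
recovered = store.read("upper")
```

Reading `upper` after both evictions first recomputes `sorted` from `raw` and then reapplies the second operation, so no replica of either derived file is needed.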

This thesis compares the storage performance of Tachyon with that of the on-disk storage systems HDFS and Ceph.
After studying the architectures of well-known distributed storage systems, the work's major contribution is to integrate Tachyon with Ceph as an underlayer storage system, to understand how this affects Tachyon's performance, and to determine how to tune Tachyon for maximum performance.
ED:2015-09-27
INSSI record number: 52047