Author: Salem, Farouk
Title: Comparative Analysis of Big Data Stream Processing Systems
Publication type: Master's thesis
Publication year: 2016
Pages: (9) + 77      Language: eng
School/Department/Unit: School of Science
Major: Mobile Computing, Services and Security   (SCI3045)
Supervisor: Heljanko, Keijo
Advisor: Latif, Khalid
Electronic publication: http://urn.fi/URN:NBN:fi:aalto-201608263033
Location: P1 Ark Aalto  4361   | Archive
Keywords: big data
stream processing frameworks
Apache Spark
Apache Flink
Apache Beam
lambda architecture
Abstract (eng): In recent years, Big Data has become a prominent paradigm in the field of distributed systems.
These systems distribute data storage and processing power across a cluster of computers.
Such systems need methodologies to store and process Big Data in a distributed manner.
There are two models for Big Data processing: batch processing and stream processing.
The batch processing model produces accurate results, but with high latency.
Many systems, such as billing systems, require Big Data to be processed with low latency because of real-time constraints.
Therefore, the batch processing model is unable to fulfill the requirements of real-time systems.

The stream processing model tries to address the batch processing limitations by producing results with low latency.
Unlike the batch processing model, the stream processing model processes only recent data rather than all the data produced so far, in order to meet the time constraints of real-time systems.
The stream processing model divides a stream of records into data windows.
Each data window contains a group of records to be processed together.
Records can be collected based on the time of arrival, the time of creation, or the user sessions.
However, in some systems, processing the recent data depends on the already processed data.
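
As a rough illustration of the windowing idea described above, the following minimal sketch groups records into fixed-size time windows and into session windows. It is plain Python and does not use the API of Spark, Flink, or Beam; the function names, the (timestamp, value) record layout, and the `size` and `gap` parameters are assumptions made for this example only.

```python
from collections import defaultdict

def tumbling_windows(records, size):
    """Group (timestamp, value) records into fixed-size time windows.

    Hypothetical helper: each window is keyed by its start time.
    """
    windows = defaultdict(list)
    for ts, value in records:
        start = (ts // size) * size  # start of the window containing ts
        windows[start].append(value)
    return dict(windows)

def session_windows(records, gap):
    """Group records into sessions.

    Hypothetical helper: a new session starts whenever the time since
    the previous record exceeds `gap`.
    """
    sessions = []
    last_ts = None
    for ts, value in sorted(records):
        if last_ts is None or ts - last_ts > gap:
            sessions.append([])  # open a new session
        sessions[-1].append(value)
        last_ts = ts
    return sessions

# Illustrative records: (timestamp in seconds, payload)
events = [(1, "a"), (4, "b"), (11, "c"), (12, "d"), (30, "e")]
by_time = tumbling_windows(events, size=10)
by_session = session_windows(events, gap=5)
```

Whether the timestamp is the time of arrival or the time of creation only changes which value is stored in the first field of each record; the grouping logic stays the same.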

Many frameworks, such as Apache Spark, Apache Flink, and Apache Beam, aim to process Big Data in real time.
The main purpose of this research is to give a clear and fair comparison of these frameworks from several perspectives: latency, processing guarantees, accuracy of results, fault tolerance, and the functionality available in each framework.
ED:2016-09-04
INSSI record number: 54245