Author:Päällysaho, Antti
Title:Data warehouse and reporting tools based on an open source distributed file system
Avoimeen lähdekoodiin perustuvaan hajautettuun levyjärjestelmään perustuvat datatavaratalo- ja raportointityökalut
Publication type:Master's thesis
Publication year:2010
Pages:(10) + 66 pp. + appendices      Language:   eng
Department/School:Faculty of Information and Natural Sciences
Main subject:Software Technology   (T-106)
Supervisor:Saikkonen, Heikki
Instructor:Tikkala, Juho
OEVS:
Electronic archive copy is available via Aalto Thesis Database.
Location:P1 Ark Aalto  8657   | Archive
Keywords:thesis
Hadoop
MapReduce
distributed computing
reporting
distributed file system
master's thesis
Abstract (eng): With the growth of hard disk capacity and of the speed of the networks connecting computers, the amount of data that can be gathered has grown exponentially.
This has also made it more difficult to find the relevant information in this huge amount of data, and one computer might not be enough for the job.

The subject of this thesis is to find out what options exist for analyzing this data in a distributed manner and to try to find a solution that can distribute the data and computing power across a network of computers.
The solution should be based on open source tools so that the tools can be examined more closely if needed.
The system should also run on current commodity server hardware connected with standard 100 Mbit/s Ethernet.

In the end, the MapReduce programming model was chosen to solve the problem, and the Hadoop framework was chosen as the MapReduce implementation.
After the tools were selected, a prototype was designed and implemented.
Once the prototype was complete, it was tested both for performance and for how easily the system could be expanded.

The prototype performed well as long as the data was distributed evenly enough across the network.
Abstract (fin, translated): As fast connections and hard disks have grown and become cheaper, the amount of data collected into systems has grown explosively.
As a result, picking out and analyzing the important information from all this data has become more difficult, and a single computer is not necessarily enough for the task.

The subject of this master's thesis is to study what options exist for distributed data analysis and to try to find and implement a solution that can distribute computation and disk usage across several machines.
The tool should be based on open source so that its operation can be examined more closely if needed.
The system should also run on machines that represent the mid-range of small or medium-sized servers, without any special features.

In the end, it was decided to solve the problem with a system using the MapReduce programming model, and Hadoop was chosen as its implementation.
With Hadoop, a reporting system was built, and once it was complete, its efficiency and extensibility were tested.

The end result was a system that is easy to expand, as long as care is taken that the data is distributed evenly across the system.
ED:2010-08-31
INSSI record number: 40342