Author:Ansari, Aftab
Title:Evaluation of cloud based approaches to data quality management
Publication type:Master's thesis
Year of publication:2016
Pages:83      Language:   eng
School/Department:School of Science
Major subject:Service Design and Engineering   (IL3005)
Supervisor:Nurminen, Jukka
Instructor:Moloney, Seamus
Electronic publication: http://urn.fi/URN:NBN:fi:aalto-201602161378
Location:P1 Ark Aalto  5683   | Archive
Keywords:data quality management
ETL
data cleaning
hive
hadoop
azure microsoft
Abstract (eng):Data quality is critical for making data-driven business decisions.
Enhancing the quality of data enables companies to make better decisions and prevent business losses.
Systems similar to Extract, Transform, and Load (ETL) are often used to clean data and improve its quality.
Currently, businesses tend to collect a massive amount of customer data, store it in the cloud, and analyze the data to gain statistical inferences about their products, services, and customers.
Cheaper storage and the constantly improving approaches to data privacy and security offered by cloud vendors, such as Microsoft Azure and Amazon Web Services, seem to be the key driving forces behind this process.

This thesis implements an Azure Data Factory based ETL system for data quality management on the Microsoft Azure cloud platform.
In addition to Azure Data Factory, there are four other key components in the system: (1) Azure Storage for storing raw and semi-cleaned data; (2) HDInsight for processing raw and semi-cleaned data using Hadoop clusters and Hive queries; (3) Azure ML Studio for processing raw and semi-cleaned data using R scripts and machine learning algorithms; (4) Azure SQL Database for storing the cleaned data.
This thesis shows that using Azure Data Factory as the core component offers many benefits because it helps schedule jobs and monitor the whole data transformation process.
Thus, it makes the data intake process more timely, guarantees data reliability, and simplifies data auditing.
The developed system was tested and validated using sample raw data.
ED:2016-02-21
INSSI record number: 53170