Author: Perello Nieto, Miquel
Title: Merging chrominance and luminance in early, medium, and late fusion using Convolutional Neural Networks
Publication type: Master's thesis
Year: 2015
Pages: xxiv + 166   Language: English
School/Department: School of Science
Subject: Machine Learning and Data Mining (SCI3015)
Supervisor: Raiko, Tapani
Advisors: Koskela, Markus; Gavaldá Mestre, Ricard
Electronic publication: http://urn.fi/URN:NBN:fi:aalto-201506303564
Location: P1 Ark Aalto 2889 | Archive
Keywords: machine learning
computer vision
image classification
artificial neural network
convolutional neural network
connectionism
Abstract (eng): The field of Machine Learning has received extensive attention in recent years.
In particular, computer vision problems have received considerable attention as images and pictures play a growing role in our daily routines.

Image classification is one of the most important tasks for organizing, storing, retrieving, and explaining pictures.
To this end, researchers have been designing algorithms that automatically detect objects in images.
Over the last decades, the common approach has been to hand-design sets of features that image classification algorithms could exploit.
More recently, researchers have designed algorithms that learn these features automatically, surpassing state-of-the-art performance.

However, learning optimal sets of features is computationally expensive; this cost can be reduced by adding prior knowledge about the task, which improves and accelerates the learning phase.
Furthermore, in problems with a large feature space (e.g. the recognition of human actions in videos), the complexity of the models needs to be reduced to keep them computationally tractable.

Consequently, we propose using multimodal learning techniques to reduce the complexity of the learning phase in Artificial Neural Networks by incorporating prior knowledge about the connectivity of the network.
Furthermore, we analyze state-of-the-art models for image classification and propose new architectures that can learn a locally optimal set of features more easily and quickly.

In this thesis, we demonstrate that merging the luminance and chrominance components of images using multimodal learning techniques can improve the acquisition of a good set of visual features.
We compare the validation accuracy of several models and show that our approach outperforms the basic model, with statistically significant results.
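The three fusion points named in the title can be illustrated with a toy NumPy sketch. It assumes a YCbCr split using the ITU-R BT.601 coefficients (the record does not say which colour space the thesis uses). It also uses per-pixel linear maps (1x1 convolutions) with ReLU as stand-ins for real convolutional layers. All function names and the random weights are hypothetical and for illustration only; this is not the thesis's architecture.

```python
import numpy as np

# BT.601 RGB -> YCbCr matrix (assumption; offsets omitted so Cb, Cr are centred on 0)
_RGB2YCC = np.array([[0.299, 0.587, 0.114],
                     [-0.168736, -0.331264, 0.5],
                     [0.5, -0.418688, -0.081312]])

def rgb_to_luma_chroma(rgb):
    """Split an H x W x 3 RGB array in [0, 1] into luminance (H x W x 1) and chrominance (H x W x 2)."""
    ycc = rgb @ _RGB2YCC.T
    return ycc[..., :1], ycc[..., 1:]

def conv1x1_relu(x, w):
    """Hypothetical layer: 1x1 convolution (per-pixel linear map) followed by ReLU."""
    return np.maximum(x @ w, 0.0)

def branch(x, w):
    """One layer plus global average pooling, returning a feature vector."""
    return conv1x1_relu(x, w).mean(axis=(0, 1))

def early_fusion(y, c, w):
    # Merge at the input: stack Y, Cb, Cr and feed a single network (w: 3 x k).
    return branch(np.concatenate([y, c], axis=-1), w)

def medium_fusion(y, c, wy, wc, w_shared):
    # Separate first layers per modality, then merge feature maps before a shared layer.
    h = np.concatenate([conv1x1_relu(y, wy), conv1x1_relu(c, wc)], axis=-1)
    return branch(h, w_shared)

def late_fusion(y, c, wy, wc):
    # Fully separate streams; concatenate only the final feature vectors.
    return np.concatenate([branch(y, wy), branch(c, wc)])

rng = np.random.default_rng(0)
y, c = rgb_to_luma_chroma(rng.random((8, 8, 3)))
print(early_fusion(y, c, rng.standard_normal((3, 16))).shape)
print(medium_fusion(y, c, rng.standard_normal((1, 8)), rng.standard_normal((2, 8)),
                    rng.standard_normal((16, 10))).shape)
print(late_fusion(y, c, rng.standard_normal((1, 8)), rng.standard_normal((2, 8))).shape)
```

The design choice being illustrated is where the two modalities share parameters. Early fusion lets every first-layer filter see both luminance and chrominance. Late fusion keeps the streams fully separate, which restricts connectivity and reduces the number of cross-modal weights. Medium fusion sits between the two.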
ED:2015-08-16
INSSI record number: 51980