Author:Rangslang, Rijuban
Title:Segment phoneme classification from speech under noisy conditions: Using amplitude-frequency modulation based two-dimensional auto-regressive features with deep neural networks
Publication type:Master's thesis
Publication year:2016
Pages:(6) + 64
Language:eng
Department/School:School of Electrical Engineering
Main subject:Signal Processing   (S3013)
Supervisor:Alku, Paavo
Instructor:Gowda, Dhananjaya
Electronic version URL: http://urn.fi/URN:NBN:fi:aalto-201608263089
Location:P1 Ark Aalto  4434   | Archive
Keywords:robust speech recognition
AM-FM based features
segment phoneme classification
deep neural networks
Abstract (eng):This thesis investigates, at the acoustic-phonetic level, the noise robustness of features derived from AM-FM analysis of speech signals.
The noise robustness of these features is analysed with various neural network models, using segment classification of phonemes as the task.
The analysis is then extended to compare the robustness of the AM-FM based features with that of traditional features, such as Mel-frequency cepstral coefficients (MFCC), under similar noise conditions.

We begin with an important aspect of the segment phoneme classification experiments: the study of architectural and training strategies for the various neural network models used.
The results of these experiments showed that the neural network models differ in their training patterns.
Models that undergo pre-training train for many more epochs before over-fitting than their counterparts that do not undergo pre-training.
Taking this difference in training pattern into account, and based on phoneme classification rate, the Gaussian restricted Boltzmann machine and the single layer perceptron are selected as the best-performing models of the two groups, respectively.

Using the two best-performing models for classification, segment phoneme classification experiments are performed under different noise conditions for both the AM-FM based and the traditional features.
The experiments showed that AM-FM based frequency domain linear prediction features, with or without feature compensation, are more robust than the traditional features in the classification of 61 phonemes under white noise and 0 dB signal-to-noise ratio (SNR) conditions.
However, when the phonemes are folded to 39 phonemes, the results are ambiguous under all noise conditions, and no clear conclusion can be drawn as to which feature is most robust.
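
To make the 0 dB SNR white-noise condition mentioned in the abstract concrete, the following is a minimal sketch of how a clean signal might be corrupted with white Gaussian noise at a target SNR. It is not taken from the thesis: the function names, the 440 Hz sine used as a stand-in for speech, and the 16 kHz sampling rate are all assumptions made for illustration. It relies only on the standard definition SNR_dB = 10 * log10(P_signal / P_noise).

```python
import math
import random


def add_white_noise(signal, snr_db, seed=0):
    """Return the signal plus Gaussian white noise scaled to the requested SNR (dB).

    Hypothetical helper for illustration; not the thesis's procedure.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = sum(n * n for n in noise) / len(noise)
    # SNR_dB = 10 * log10(P_signal / P_noise), solved for the target noise power.
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / noise_power)
    return [x + gain * n for x, n in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """Compute the SNR in dB from a clean signal and its noisy version."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)


# Assumed test signal: one second of a 440 Hz sine sampled at 16 kHz.
clean = [math.sin(2.0 * math.pi * 440.0 * t / 16000.0) for t in range(16000)]
noisy = add_white_noise(clean, snr_db=0.0)
print(abs(measured_snr_db(clean, noisy)) < 1e-6)
```

At 0 dB the noise carries the same average power as the signal, which is why this condition is a demanding test of feature robustness.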
ED:2016-09-04
INSSI record number: 54301