Author: Mirzaei, Saeideh
Title: Improving Accuracy in Automatic Speech Recognition Systems by Model Adaptation Techniques
Publication type: Master's thesis
Year of publication: 2015
Pages: 51 p. + 4 app.      Language: eng
School/Department: School of Electrical Engineering
Subject: Signal Processing (S3013)
Supervisor: Kurimo, Mikko
Advisor: Milhorat, Pierrick ; Boudy, Jéróme
Electronic publication: http://urn.fi/URN:NBN:fi:aalto-201506303498
Location: P1 Ark Aalto  3149   | Archive
Keywords: speech recognition
speaker adaptation
language model adaptation
hidden Markov models
Abstract (eng): The accuracy of speech recognition systems in converting speech to text remains a problem in large-vocabulary continuous speech recognition tasks.
The major source of poor performance of such systems is the mismatch between the training conditions and the testing conditions.
ASR systems have been shown to perform better when trained for a specific user and application.
Because training the acoustic and language models from scratch requires a large amount of data, adaptation methods are used instead: they improve the recognition accuracy of the baseline system while needing much less data to adjust its parameters.
The acoustic and language models are adapted to make ASR systems more speaker dependent, noise robust and context dependent.
For speaker dependence, the goal is to reduce the mismatch between the user's vocal characteristics and the generic acoustic model.
This, along with adaptation to noise, concerns the acoustic model.
In addition, we use language model adaptation techniques to change the parameters (combination probabilities) of the grammar model, giving more weight to the word sequences that are most relevant to the task in progress.
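
The reweighting described above can be illustrated with a minimal sketch of linear interpolation between a generic and a task-specific model. The function name `interpolate`, the weight `lam`, and the toy unigram probabilities are assumptions made for illustration only; they are not taken from the thesis, which works with full language models rather than unigram tables.

```python
def interpolate(p_generic, p_specific, lam):
    """Mix two word distributions: P(w) = lam * P_specific(w) + (1 - lam) * P_generic(w).

    Words missing from one model are treated as having probability 0 in it.
    """
    vocab = set(p_generic) | set(p_specific)
    return {
        w: lam * p_specific.get(w, 0.0) + (1.0 - lam) * p_generic.get(w, 0.0)
        for w in vocab
    }


# Hypothetical toy distributions (not data from the thesis).
generic = {"the": 0.5, "weather": 0.3, "lamp": 0.2}
specific = {"the": 0.4, "weather": 0.1, "lamp": 0.5}

# lam is a hypothetical interpolation weight favouring the task-specific model.
adapted = interpolate(generic, specific, lam=0.7)
```

With these toy numbers, a word that is frequent in the task-specific data ("lamp") receives more probability mass in the adapted model than in the generic one, while the result remains a valid distribution.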

In this work, unsupervised acoustic model adaptation was implemented using linear VTLN and constrained MLLR (cMLLR).
VTLN shifts the speaker's formant positions, while cMLLR adapts the model through a transform applied in feature space.
We show that overall performance improves with either of these two methods.
The relative WER reduction obtained with cMLLR was 9.44%.
For language model adaptation, a linear interpolation of the generic and specific models was implemented.
The perplexity of the adapted language model improved by 14.47% relative to the generic model.
Perplexity is an approximate indicator of ASR performance, although the two are not directly proportional.
Both acoustic and language model adaptation were found to improve the performance of the ASR system.
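
The two figures quoted in the abstract are relative improvements. A minimal sketch of how such quantities are typically computed follows; the function names and all numeric inputs are hypothetical, since the abstract does not report absolute WER or perplexity values.

```python
import math


def relative_reduction(baseline, adapted):
    """Relative reduction in percent: 100 * (baseline - adapted) / baseline."""
    return 100.0 * (baseline - adapted) / baseline


def perplexity(word_probs):
    """Perplexity of a test sequence given the model probability of each word.

    PP = exp(-(1/N) * sum(log p_i)), i.e. the inverse geometric mean probability.
    """
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))


# Hypothetical WER values (percent), chosen only to show the arithmetic.
wer_gain = relative_reduction(baseline=20.0, adapted=18.0)

# Hypothetical per-word probabilities assigned by some language model.
pp = perplexity([0.25, 0.25, 0.25, 0.25])
```

Lower values are better for both WER and perplexity, which is why an improvement is expressed as a reduction relative to the baseline.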
ED:2015-08-16
INSSI record number: 51906