Thesis (Selection of subject) (version: 381)
Thesis details
Metody doménové adaptace pro rozpoznávání řeči
Thesis title in Czech: Metody doménové adaptace pro rozpoznávání řeči
Thesis title in English: Methods of Domain Adaptation for Speech Recognition
Academic year of topic announcement: 2019/2020
Thesis type: diploma thesis
Thesis language: Czech
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: doc. RNDr. Ondřej Bojar, Ph.D.
Author: hidden - assigned and confirmed by the Study Dept.
Date of registration: 20.02.2020
Date of assignment: 16.03.2020
Confirmed by Study dept. on: 01.06.2020
Date and time of defence: 08.07.2020 09:00
Date of electronic submission: 28.05.2020
Date of submission of printed version: 28.05.2020
Date of proceeded defence: 08.07.2020
Opponents: Mgr. et Mgr. Ondřej Dušek, Ph.D.
Guidelines
The quality of automatic speech recognition (ASR) depends critically on how well the training data match the test data. Domain adaptation techniques adjust a more general system so that it performs better in a particular situation.

The goal of the thesis is to explore methods of domain adaptation for speech recognition. The thesis should consider adaptation at various levels, from adaptation to a given subject area (e.g. economics vs. computational linguistics) to adaptation to individual talks given by a known speaker on a known topic.

An inherent part of the thesis is the empirical evaluation of the discussed or proposed methods. Specifically, the work should start with creating a baseline ASR system for spoken Czech and then carry out a series of domain adaptation experiments at various levels. The quality of the system will be evaluated automatically using the standard WER (word error rate) measure.
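As an illustration of the WER measure mentioned above, the following is a minimal sketch, not the evaluation tooling used in the thesis. It assumes the usual definition of WER as the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words, and it assumes whitespace tokenization. The function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no case folding or punctuation normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

For example, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion against four reference words. Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.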
References
Mohri, M., Pereira, F., & Riley, M. (2008). Speech recognition with weighted finite-state transducers. In *Springer Handbook of Speech Processing* (pp. 559-584). Springer, Berlin, Heidelberg.

Young, S. et al. (2006). The HTK book. *Cambridge University Engineering Department*, *3*, 75.

Goodman, J. (2001). A bit of progress in language modeling. *arXiv preprint cs/0108005*.

Peddinti, V., Povey, D., & Khudanpur, S. (2015). A time delay neural network architecture for efficient modeling of long temporal contexts. In *Sixteenth Annual Conference of the International Speech Communication Association*.

Goodfellow, I., Bengio, Y., & Courville, A. (2016). *Deep learning*. MIT Press.
 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html