Metody doménové adaptace pro rozpoznávání řeči
Thesis title in Czech: | Metody doménové adaptace pro rozpoznávání řeči |
---|---|
Thesis title in English: | Methods of Domain Adaptation for Speech Recognition |
Academic year of topic announcement: | 2019/2020 |
Thesis type: | diploma thesis |
Thesis language: | Czech |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | doc. RNDr. Ondřej Bojar, Ph.D. |
Author: | hidden - assigned and confirmed by the Study Dept. |
Date of registration: | 20.02.2020 |
Date of assignment: | 16.03.2020 |
Confirmed by Study dept. on: | 01.06.2020 |
Date and time of defence: | 08.07.2020 09:00 |
Date of electronic submission: | 28.05.2020 |
Date of submission of printed version: | 28.05.2020 |
Date of proceeded defence: | 08.07.2020 |
Opponents: | Mgr. et Mgr. Ondřej Dušek, Ph.D. |
Guidelines |
The quality of automatic speech recognition (ASR) depends critically on how well the training data match the test data. Domain adaptation techniques adjust a more general system to improve its performance in a particular situation.
The goal of the thesis is to explore methods of domain adaptation for speech recognition. The thesis should consider adaptation at various levels, ranging from adaptation to a given subject area (e.g. economics vs. computational linguistics) to adaptation to individual talks given by a known speaker on a known topic. An inherent part of the thesis is the empirical evaluation of the discussed or proposed methods. Specifically, the work should start by creating a baseline ASR system for spoken Czech and then carry out a series of domain adaptation experiments at various levels. The quality of the system will be evaluated automatically using the standard WER (word error rate) measure. |
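For orientation, WER is commonly defined as the word-level edit distance between the reference transcript and the ASR output (substitutions, deletions and insertions), divided by the number of reference words. The following is a minimal sketch of that definition, assuming whitespace tokenization and no text normalization; the function name `word_error_rate` is illustrative and is not taken from the thesis or from any particular toolkit.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref_words = reference.split()  # assumption: plain whitespace tokenization
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref_words)


# One substitution (b -> x) and one deletion (d) over four reference words.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference. Real evaluations usually also normalize case and punctuation before scoring, which this sketch leaves out.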
References |
Mohri, M., Pereira, F., & Riley, M. (2008). Speech recognition with weighted finite-state transducers. In *Springer Handbook of Speech Processing* (pp. 559-584). Springer, Berlin, Heidelberg.
Young, S. et al. (2006). The HTK book. *Cambridge University Engineering Department*, *3*, 75.
Goodman, J. (2001). A bit of progress in language modeling. *arXiv preprint cs/0108005*.
Peddinti, V., Povey, D., & Khudanpur, S. (2015). A time delay neural network architecture for efficient modeling of long temporal contexts. In *Sixteenth Annual Conference of the International Speech Communication Association*.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). *Deep learning*. MIT Press. |