Methods of Domain Adaptation for Speech Recognition
Thesis title in Czech: | Metody doménové adaptace pro rozpoznávání řeči |
---|---|
Thesis title in English: | Methods of Domain Adaptation for Speech Recognition |
Academic year of topic announcement: | 2019/2020 |
Thesis type: | diploma thesis |
Thesis language: | Czech |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | doc. RNDr. Ondřej Bojar, Ph.D. |
Author: | hidden |
Date of registration: | 20.02.2020 |
Date of assignment: | 16.03.2020 |
Date of confirmation by the Study Department: | 01.06.2020 |
Date and time of defence: | 08.07.2020 09:00 |
Date of electronic submission: | 28.05.2020 |
Date of printed submission: | 28.05.2020 |
Date of completed defence: | 08.07.2020 |
Opponents: | Mgr. et Mgr. Ondřej Dušek, Ph.D. |
Guidelines |
The quality of automatic speech recognition (ASR) critically depends on the match of the test and training data. Domain adaptation techniques are used to adjust a more general system to improve its performance for a particular situation.
The goal of the thesis is to explore methods of domain adaptation for speech recognition. The thesis should consider adaptation at various levels, from adaptation to a given subject area (e.g. economics vs. computational linguistics) up to adaptation to individual talks given by a known speaker on a known topic. An inherent part of the thesis is the empirical evaluation of the discussed or proposed methods. Specifically, the work should start by creating a baseline ASR system for spoken Czech and then carry out a series of domain adaptation experiments at various levels. The quality of the system will be evaluated automatically using the standard WER (word error rate) measure. |
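For orientation, WER is commonly defined as the word-level edit distance between the reference transcript and the ASR output, divided by the number of reference words. The following is a minimal sketch of that definition, not the evaluation setup used in the thesis. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; real evaluations usually normalize case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Illustrative sketch: tokens are obtained by whitespace splitting
    (an assumption; no text normalization is applied).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference. In a domain adaptation experiment, the same function would be applied to the baseline and the adapted system on the same test set so that the two scores are comparable.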
References |
Mohri, M., Pereira, F., & Riley, M. (2008). Speech recognition with weighted finite-state transducers. In *Springer Handbook of Speech Processing* (pp. 559-584). Springer, Berlin, Heidelberg.
Young, S. et al. (2006). The HTK book. *Cambridge University Engineering Department*, *3*, 75.
Goodman, J. (2001). A bit of progress in language modeling. *arXiv preprint cs/0108005*.
Peddinti, V., Povey, D., & Khudanpur, S. (2015). A time delay neural network architecture for efficient modeling of long temporal contexts. In *Sixteenth Annual Conference of the International Speech Communication Association*.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). *Deep learning*. MIT Press. |