Thesis (Selection of subject)

Metody doménové adaptace pro rozpoznávání řeči

Thesis title in Czech:	Metody doménové adaptace pro rozpoznávání řeči
Thesis title in English:	Methods of Domain Adaptation for Speech Recognition
Academic year of topic announcement:	2019/2020
Thesis type:	diploma thesis
Thesis language:	Czech
Department:	Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor:	doc. RNDr. Ondřej Bojar, Ph.D.
Author:	hidden - assigned and confirmed by the Study Dept.
Date of registration:	20.02.2020
Date of assignment:	16.03.2020
Confirmed by Study dept. on:	01.06.2020
Date and time of defence:	08.07.2020 09:00
Date of electronic submission:	28.05.2020
Date of submission of printed version:	28.05.2020
Date of defence held:	08.07.2020
Opponents:	Mgr. et Mgr. Ondřej Dušek, Ph.D.

Guidelines

The quality of automatic speech recognition (ASR) depends critically on how closely the training data match the test data. Domain adaptation techniques adjust a more general system to improve its performance in a particular setting.

The goal of the thesis is to explore methods of domain adaptation for speech recognition. The thesis should consider adaptation at various levels, ranging from adaptation to a given subject area (e.g. economics vs. computational linguistics) to adaptation to individual talks given by a known speaker on a known topic.

An inherent part of the thesis is the empirical evaluation of the discussed or proposed methods. Specifically, the work should start with creating a baseline ASR system for spoken Czech and then carry out a series of domain adaptation experiments at various levels. The quality of the system will be evaluated automatically using the standard WER (word error rate) measure.
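For orientation, WER is conventionally defined as (S + D + I) / N, where S, D and I are the numbers of substituted, deleted and inserted words in the best alignment of the hypothesis to the reference, and N is the number of reference words. The following is a minimal sketch of that computation, not the scoring setup prescribed for the thesis. The function name `word_error_rate` is illustrative, and whitespace tokenization without any text normalization is an assumption made only to keep the example short.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    # Assumption: words are separated by whitespace; no case or
    # punctuation normalization is applied.
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (b -> x) and one deletion (d) over four reference words.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions are counted against the reference length, WER can exceed 1.0 when the hypothesis is much longer than the reference.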

References

Mohri, M., Pereira, F., & Riley, M. (2008). Speech recognition with weighted finite-state transducers. In *Springer Handbook of Speech Processing* (pp. 559-584). Springer, Berlin, Heidelberg.

Young, S. et al. (2006). The HTK book. *Cambridge University Engineering Department*, *3*, 75.

Goodman, J. (2001). A bit of progress in language modeling. *arXiv preprint cs/0108005*.

Peddinti, V., Povey, D., & Khudanpur, S. (2015). A time delay neural network architecture for efficient modeling of long temporal contexts. In *Sixteenth Annual Conference of the International Speech Communication Association*.

Goodfellow, I., Bengio, Y., & Courville, A. (2016). *Deep learning*. MIT Press.