Thesis (Selection of subject) (version: 368)
Thesis details
Speech Reconstruction
Thesis title in Czech: Rekonstrukce mluvené řeči
Thesis title in English: Speech Reconstruction
Key words: Automatická editace a korekce textu, transkripce, rozpoznávání řeči, strojové učení
English key words: Automatic editing and text correction, transcription, speech recognition, machine learning
Academic year of topic announcement: 2022/2023
Thesis type: diploma thesis
Thesis language: English
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: prof. RNDr. Jan Hajič, Dr.
Author:
Guidelines
Speech reconstruction is an area of speech processing in which standard speech is extracted
from a spontaneous speech database. Spontaneous speech differs, both acoustically and
linguistically, from speech read from written text: it contains extraneous material in the
form of pauses, hesitations, repetitions, partial words, and other disfluencies.
Robust acoustic and language models are therefore required to handle these discrepancies.
From this perspective, speech reconstruction involves two distinct tasks:

i) Automatic Speech Recognition
ii) Creation of standard speech from the recognized speech delivered by (i).

The main focus of this thesis will be on task (ii), which requires flexible models that can
handle the discrepancies remaining after speech recognition. For this task, a spontaneous
speech database is available for English and Czech from different sources, together with its
manually annotated, standardized version. The data contain dialogues from everyday speech and
therefore include many of the discrepancies described above. The task is to develop a language
and translation model, using deep learning methods, that eliminates these discrepancies and
makes the speech available for further processing by standard language tools.

This task can be viewed as a machine translation task in which the ASR output is treated as
text in the source language and the goal is to convert it into the target language
(standard text).
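
To make the source/target framing concrete, the following minimal Python sketch pairs an
invented ASR-style utterance with its standardized form and applies a naive rule-based cleanup.
The filler list, the trailing "-" marker for partial words, and the example sentence are
assumptions made for illustration only; they are not taken from the thesis data.

```python
# Hypothetical filled pauses; a real inventory would come from the annotated corpus.
FILLED_PAUSES = {"uh", "um", "er", "hmm"}


def naive_reconstruct(asr_tokens):
    """Rule-based cleanup: drop filled pauses, partial words, and immediate repetitions."""
    out = []
    for tok in asr_tokens:
        if tok in FILLED_PAUSES:
            continue  # hesitation sound
        if tok.endswith("-"):
            continue  # assumed transcription marker for a partial word, e.g. "wan-"
        if out and out[-1] == tok:
            continue  # immediate one-word repetition
        out.append(tok)
    return out


# Invented (source, target) pair in the spirit of a parallel corpus:
# source = ASR output, target = manually standardized text.
source = "i i uh wan- want to um go go home".split()
target = "i want to go home".split()

print(" ".join(naive_reconstruct(source)))
```

Rules of this kind break quickly (a legitimate repetition such as "had had" would be deleted,
and rephrasings are not handled at all), which is one reason to learn the mapping from
source/target pairs instead.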

Technical Aspects:

Using a deep learning system of choice, develop a sequence-to-sequence "translation" system as
defined above. Experimentally test various DNN architectures and hyperparameter settings.
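
Comparing architectures and hyperparameter settings requires a score against the manually
standardized text. The guidelines do not name a metric; as an assumption, the sketch below uses
word error rate (word-level edit distance divided by reference length), a common choice for
this kind of comparison. The function name and the example sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (row i-1 of the DP table).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis: i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


reference = "i want to go home"            # invented standardized text
raw_asr = "i i uh want to go home"         # invented unprocessed ASR output
system_output = "i want go home"           # invented output of a hypothetical model

print(word_error_rate(reference, raw_asr))
print(word_error_rate(reference, system_output))
```

Scoring the unprocessed ASR output in the same way gives a simple reference point: a
reconstruction system is only useful if it scores better than leaving the text untouched.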
References
Manning, Schuetze: Foundations of Statistical NLP. MIT Press, 2000.
PIRE: Investigation of Meaning Representations in Language Understanding for Speech
Reconstruction and Machine Translation Systems: http://www.clsp.jhu.edu/research/pire/
DNN toolkits, e.g. TensorFlow, and their documentation.
Deep Learning course(s), such as Milan Straka's NPFL114 (http://ufal.mff.cuni.cz/courses/npfl114/1718-summer),
or online (from 2017/8) at https://slideslive.com/s/milan-straka-10654
Preliminary scope of work
Speech reconstruction is the problem of converting the output of an automatic speech recognizer into standard written form. The task admits many possible approaches; the goal of the diploma thesis is to find at least one approach, based on the machine translation paradigm, that improves on the current "baseline" accuracy.
Preliminary scope of work in English
Speech reconstruction is an area of speech processing in which standard speech is extracted from a spontaneous speech database. Spontaneous speech differs, both acoustically and linguistically, from speech read from written text: it contains extraneous material in the form of pauses, hesitations, repetitions, partial words, and other disfluencies. Robust acoustic and language models are therefore required to handle these discrepancies.
 