Speech Reconstruction
Thesis title in Czech: | Rekonstrukce mluvené řeči |
---|---|
Thesis title in English: | Speech Reconstruction |
Key words: | Automatická editace a korekce textu, transkripce, rozpoznávání řeči, strojové učení |
English key words: | Automatic editing and text correction, transcription, speech recognition, machine learning |
Academic year of topic announcement: | 2022/2023 |
Thesis type: | diploma thesis |
Thesis language: | English |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | prof. RNDr. Jan Hajič, Dr. |
Author: |
Guidelines |
Speech reconstruction is an area of speech processing in which standard speech is extracted from a database of spontaneous speech. Spontaneous speech differs considerably, both acoustically and linguistically, from speech produced from written text, in the sense that it contains material that carries no useful information: pauses, hesitations, repetitions, partial words and other disfluencies. Robust acoustic and language models are therefore required to handle these discrepancies. From this perspective, speech reconstruction involves two distinct tasks: (i) automatic speech recognition (ASR), and (ii) creation of standard speech from the recognized speech delivered by (i). The main focus of this thesis will be on task (ii), which requires flexible models that can handle the discrepancies remaining after speech recognition. For this task, spontaneous speech databases for English and Czech from different sources are available, together with their manually annotated standardized versions. The data contain dialogue from everyday speech and therefore include many of the discrepancies described above. The task is to develop a language and translation model, using deep learning methods, that eliminates these discrepancies and makes the speech available for further processing by standard language tools. The task can be viewed as machine translation, in which the ASR output is treated as text in the source language and the goal is to convert it into the target language (standard text). Technical aspects: using a deep learning system of choice, develop a sequence-to-sequence "translation" system as defined above, experimentally test various DNN architectures, and experiment with hyperparameter settings. |
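The "translation" framing of task (ii) can be illustrated with a toy source/target pair. The sketch below is not the deep-learning system the topic calls for; it is a hypothetical rule-based heuristic that assumes filled pauses come from a small hand-picked list (`FILLERS`, illustrative only) and that partial words are transcribed with a trailing hyphen. It maps one spontaneous, ASR-like source sentence to a standardized target and shows the kinds of discrepancies a learned sequence-to-sequence model would have to handle.

```python
# Toy illustration of speech reconstruction framed as "translation":
# a spontaneous (ASR-like) source string is mapped to a standard target string.
# Hypothetical heuristic only: the filler list and the convention that partial
# words end in "-" are assumptions, not the format of any particular corpus.

FILLERS = {"uh", "um", "erm", "hmm"}  # hypothetical filled-pause tokens


def reconstruct(source: str) -> str:
    """Drop filled pauses, partial words and immediate word repetitions."""
    output: list[str] = []
    for token in source.split():
        low = token.lower()
        if low in FILLERS:
            continue  # hesitation / filled pause
        if low.endswith("-"):
            continue  # partial (cut-off) word, assumed to be marked with "-"
        if output and output[-1].lower() == low:
            continue  # immediate repetition such as "I I" or "the ... the"
        output.append(token)
    return " ".join(output)


# One parallel "source -> target" pair, as a seq2seq model would see it.
source = "uh I I want to go to the sta- the station um tomorrow"
print(reconstruct(source))  # -> I want to go to the station tomorrow
```

Fixed rules like these break easily: the repetition rule would wrongly collapse a grammatical repetition such as "had had", and repairs without an explicit marker (e.g. "to the shop the station") are not handled at all, which is why flexible, trained models are needed.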
References |
Manning, Schütze: Foundations of Statistical Natural Language Processing. MIT Press, 2000.
PIRE: Investigation of Meaning Representations in Language Understanding for Speech Reconstruction and Machine Translation Systems. http://www.clsp.jhu.edu/research/pire/
DNN toolkits, e.g. TensorFlow, and their documentation.
Deep learning course(s), such as Milan Straka's NPFL114 (http://ufal.mff.cuni.cz/courses/npfl114/1718-summer, or online from 2017/18 at https://slideslive.com/s/milan-straka-10654). |
Preliminary scope of work |
Speech reconstruction is the problem of converting the output of an automatic speech recognizer into standard (written) form. There are many possible approaches to this task; the goal of the diploma thesis is to find at least one approach, using the machine translation paradigm, that improves on the current "baseline" accuracy. |
Preliminary scope of work in English |
Speech reconstruction is an area of speech processing in which standard speech is extracted from a database of spontaneous speech. Spontaneous speech differs considerably, both acoustically and linguistically, from speech produced from written text, in the sense that it contains material that carries no useful information: pauses, hesitations, repetitions, partial words and disfluencies. Robust acoustic and language models are therefore required to handle these discrepancies. |