Fundamentals of Speech Recognition and Generation - NPFL038

Title:	Základy rozpoznávání a generování mluvené řeči
Guaranteed by:	Institute of Formal and Applied Linguistics (32-UFAL)
Faculty:	Faculty of Mathematics and Physics
Valid:	from 2020
Semester:	winter
E-Credits:	5
Hours per week, examination:	winter s.:2/2, C+Ex [HT]
Capacity:	unlimited
Min. number of students:	unlimited
4EU+:	no
Virtual mobility / capacity:	no
State of the course:	taught
Language:	Czech, English
Teaching methods:	full-time
Additional information:	https://ufal.mff.cuni.cz/courses/npfl038

Guarantor:	Mgr. Nino Peterek, Ph.D.
Teacher(s):	Mgr. Nino Peterek, Ph.D.
Class:	DS, Mathematical Linguistics; Informatics Mgr. - Mathematical Linguistics
Classification:	Informatics > Computer and Formal Linguistics
Is incompatible with:	NPFX038
Is interchangeable with:	NPFX038

Annotation

This course covers speech recognition and generation tasks and the extraction of features that characterise voices and utterances. Particular attention is given to topics related to Hidden Markov Models as applied to speech (FFT, n-dimensional clustering, Gaussian mixtures, parameter value extraction from data, phonetic representation, prosodic analysis, etc.) and to DNN-HMM hybrid models. Students prepare and train their own speech recognition and generation models.

Last update: Mírovský Jiří, RNDr., Ph.D. (11.05.2022)

Course completion requirements

Oral examination and project presentation.

The practical part is assessed through the preparation and presentation of the student's own models for speech recognition and generation.

The presentation may be repeated.

Last update: Peterek Nino, Mgr., Ph.D. (10.06.2019)

Literature

Gernot A. Fink, Markov Models for Pattern Recognition, Springer, 2014

Steve Young, Dan Kershaw, Julian Odell, Dave Ollason, Valtcho Valtchev, Phil Woodland, The HTK Book, Cambridge, Entropic Ltd. http://htk.eng.cam.ac.uk, 1995-2007

Zdena Palková, Fonetika a fonologie češtiny, Karolinum, Praha, 1997

Dong Yu, Li Deng, Automatic Speech Recognition: A Deep Learning Approach, Springer, 2015

U. Kamath, J. Liu, J. Whitaker, Deep Learning for NLP and Speech Recognition, Springer, 2019

NPFL038 Details and News

Last update: Peterek Nino, Mgr., Ph.D. (23.05.2025)

Exam requirements

The exam covers the theoretical part of the course (syllabus) and is oral only.

The practical part does not need to be completed before the exam.

Last update: Peterek Nino, Mgr., Ph.D. (13.10.2017)

Syllabus

Audio signal processing.

Introduction to Speech Production and Perception.

Vector Quantisation and Gaussian mixtures.

HMM speech models - HMM evaluation, Viterbi decoding.

Viterbi training.

Transducers.

Text To Speech - HMM-based Speech Synthesis.

TTS - DNN-based models.

DNN-HMM hybrid system for speech recognition.

Invitation to the advanced speech recognition course NPFL079, which covers end-to-end DNN models.

Practical part:

HTK Tools description: Data Preparation Tools, Training Tools, Recognition Tools, Analysis Tool.

Data Preparation: the Task Grammar, the Language Model, the Dictionary, Recording the Data, Creating the Transcription Files, Coding the Data.

Creating Monophone HMMs: Creating Flat Start Monophones, Fixing the Silence Models, Realigning the Training Data.

Creating Triphone HMMs: Making Triphones from Monophones, Making Tied-State Triphones, Splitting States.

Recogniser Evaluation.

Speech Prosody Analysis.

Last update: Peterek Nino, Mgr., Ph.D. (24.05.2025)