Course, academic year 2022/2023
Fundamentals of Speech Recognition and Generation - NPFL038
Title: Základy rozpoznávání a generování mluvené řeči
Guaranteed by: Institute of Formal and Applied Linguistics (32-UFAL)
Faculty: Faculty of Mathematics and Physics
Valid: from 2020
Semester: winter
E-Credits: 5
Hours per week, examination: winter s.:2/2, C+Ex [HT]
Capacity: unlimited
Min. number of students: unlimited
Virtual mobility / capacity: no
State of the course: taught
Language: Czech, English
Teaching methods: full-time
Additional information:
Guarantor: Mgr. Nino Peterek, Ph.D.
Class: DS, Mathematical Linguistics
Informatics Mgr. - Mathematical Linguistics
Classification: Informatics > Computer and Formal Linguistics
Is incompatible with: NPFX038
Is interchangeable with: NPFX038
Annotation -
Last update: RNDr. Jiří Mírovský, Ph.D. (11.05.2022)
This course covers speech recognition and generation, together with the extraction of features that characterise a voice and an utterance. It focuses on Hidden Markov Models as applied to speech and the techniques they rely on (FFT, n-dimensional clustering, Gaussian mixtures, parameter value extraction from data, phonetic representation, prosodic analysis, etc.), and on hybrid DNN-HMM models. Students prepare and train their own speech recognition and generation models.
Course completion requirements -
Last update: Mgr. Nino Peterek, Ph.D. (10.06.2019)

Oral examination and project presentation.

The practical part is assessed through the preparation and presentation of the student's own models for speech recognition and generation.

The presentation may be repeated.

Literature -
Last update: Mgr. Nino Peterek, Ph.D. (11.05.2022)
Gernot A. Fink, Markov Models for Pattern Recognition, Springer, 2014

Steve Young, Dan Kershaw, Julian Odell, Dave Ollason, Valtcho Valtchev, Phil Woodland, The HTK Book, Cambridge, Entropic Ltd., 1995-2007

Zdena Palková, Fonetika a fonologie češtiny, Karolinum, Praha, 1997

Dong Yu, Li Deng, Automatic Speech Recognition: A Deep Learning Approach, Springer, 2015

NPFL038 Details and News

Requirements to the exam -
Last update: Mgr. Nino Peterek, Ph.D. (13.10.2017)

The exam covers the theoretical part of the course (see the syllabus) and is oral only.

The practical part does not have to be completed before the exam.

Syllabus -
Last update: Mgr. Nino Peterek, Ph.D. (13.10.2017)

Introduction to Speech Production and Perception.

General Principles of Automatic Speech Recognition (HMM)

  • Isolated Word Recognition,
  • Output Probability Specification,
  • Baum-Welch Re-Estimation,
  • Recognition and Viterbi Decoding,
  • Continuous Speech Recognition,
  • Speaker Adaptation.

HTK Tools description

  • Data Preparation Tools,
  • Training Tools,
  • Recognition Tools,
  • Analysis Tool.

Data Preparation

  • the Task Grammar,
  • the Language Model,
  • the Dictionary,
  • Recording the Data, Creating the Transcription Files, Coding the Data.

Creating Monophone HMMs

  • Creating Flat Start Monophones,
  • Fixing the Silence Models,
  • Realigning the Training Data.

Creating Triphone HMMs

  • Making Triphones from Monophones,
  • Making Tied-State Triphones,
  • Splitting States.

Recogniser Evaluation.

General Principles of Automatic Speech Generation.

Speech Prosody Analysis.
