Course, academic year 2018/2019
Fundamentals of Speech Recognition and Generation - NPFL038
Title in Czech: Základy rozpoznávání a generování mluvené řeči
Guaranteed by: Institute of Formal and Applied Linguistics (32-UFAL)
Faculty: Faculty of Mathematics and Physics
Valid: from 2015 to 2018
Semester: winter
E-Credits: 6
Hours per week, examination: winter semester: 2/2, C+Ex [hours/week]
Capacity: unlimited
Min. number of students: unlimited
State of the course: taught
Language: Czech, English
Teaching methods: full-time
Guarantor: Mgr. Nino Peterek, Ph.D.
Class: DS, Mathematical Linguistics
Informatics Mgr. - Mathematical Linguistics
Classification: Informatics > Computer and Formal Linguistics
Annotation -
Last update: T_UFAL (11.05.2012)
This course covers speech recognition and generation, together with the extraction of voice and utterance features. It focuses on Hidden Markov Models as applied to speech and on the related topics (FFT, n-dimensional clustering, Gaussian mixtures, parameter value extraction from data, phonetic representation, prosodic analysis, etc.). Students prepare and train their own speech recognition models.
Course completion requirements -
Last update: Mgr. Nino Peterek, Ph.D. (10.06.2019)

Oral examination and project presentation.

The practical part is assessed through the preparation and presentation of the student's own models for speech recognition and generation.

The presentation may be repeated.

Literature -
Last update: Mgr. Nino Peterek, Ph.D. (13.10.2017)
Gernot A. Fink, Markov Models for Pattern Recognition, Springer, 2014

Steve Young, Dan Kershaw, Julian Odell, Dave Ollason, Valtcho Valtchev, Phil Woodland, The HTK Book, Cambridge, Entropic Ltd. http://htk.eng.cam.ac.uk, 1995-2007

Zdena Palková, Fonetika a fonologie češtiny, Karolinum, Praha, 1997

NPFL038 Details and News

Requirements to the exam -
Last update: Mgr. Nino Peterek, Ph.D. (13.10.2017)

The exam covers the theoretical part of the course (the syllabus) and is oral only.

The practical part does not need to be completed before the exam.

Syllabus -
Last update: Mgr. Nino Peterek, Ph.D. (13.10.2017)

Introduction to Speech Production and Perception.

General Principles of Automatic Speech Recognition (HMM)

  • Isolated Word Recognition,
  • Output Probability Specification,
  • Baum-Welch Re-Estimation,
  • Recognition and Viterbi Decoding,
  • Continuous Speech Recognition,
  • Speaker Adaptation.

HTK Tools Description

  • Data Preparation Tools,
  • Training Tools,
  • Recognition Tools,
  • Analysis Tool.

Data Preparation

  • the Task Grammar,
  • the Language Model,
  • the Dictionary,
  • Recording the Data, Creating the Transcription Files, Coding the Data.

Creating Monophone HMMs

  • Creating Flat Start Monophones,
  • Fixing the Silence Models,
  • Realigning the Training Data.

Creating Triphone HMMs

  • Making Triphones from Monophones,
  • Making Tied-State Triphones,
  • Splitting States.

Recogniser Evaluation.

General Principles of Automatic Speech Generation.

Speech Prosody Analysis.

 