Course, academic year 2016/2017
Algorithms in Speech Recognition - NPFL079
Title: Algoritmy rozpoznávání mluvené řeči
Guaranteed by: Institute of Formal and Applied Linguistics (32-UFAL)
Faculty: Faculty of Mathematics and Physics
Actual: from 2014 to 2017
Semester: summer
E-Credits: 6
Hours per week, examination: summer s.:2/2, C+Ex [HT]
Capacity: unlimited
Min. number of students: unlimited
4EU+: no
Virtual mobility / capacity: no
State of the course: taught
Language: Czech, English
Teaching methods: full-time
Guarantor: Mgr. Nino Peterek, Ph.D.
Class: DS, mathematical linguistics
Informatics Mgr. - elective
Classification: Informatics > Computer and Formal Linguistics
Annotation -
Last update: T_UFAL (04.05.2017)
The course presents recent methodologies and software toolkits for speech recognition. Students learn how to develop systems for automatic speech recognition and transcription, computer dialogue systems, and speaker identification. The course covers the principles, preparation, and decoding algorithms of statistical acoustic and language models (HMMs, n-gram and structured language models, finite-state transducers, graphical models, Viterbi dynamic programming, heuristic hypothesis search strategies, stack decoders). This course can be preceded by PFL038 and combined with PFL067 and PFL068.
Literature -
Last update: T_UFAL (05.05.2017)
[JEL] F. Jelinek, Statistical Methods for Speech Recognition, MIT Press, 1998

[PSU] J. Psutka, L. Müller, J. Matoušek, V. Radová, Mluvíme s počítačem česky, Academia, 2006

[SPO] X. Huang, A. Acero, H. Hon, Spoken Language Processing, Prentice-Hall, 2001

PFL079 Details and News

Requirements to the exam -
Last update: Mgr. Nino Peterek, Ph.D. (10.06.2019)

To complete the course successfully, students must program three small projects (speech library functions and a small speech application).

Syllabus -
Last update: Mgr. Nino Peterek, Ph.D. (11.06.2019)

Overview of speech technologies

  • wonders of speech recognition,
  • main applications and their architectures,
  • theories and models overview,
  • software toolkits and libraries,
  • speech processing books and magazines.

Acoustic Modelling (SPO C8-C9 | JEL C2-C3 | PSU C5.3, partly a review of PFL038)

  • definition and parameters of the hidden Markov model (HMM),
  • evaluation of an HMM (Forward algorithm),
  • training of an HMM (Baum-Welch algorithm),
  • speech feature extraction and scoring of acoustic features (MFCC, Gaussian mixtures, parameter clustering),
  • adaptive techniques (MAP, MLLR),
  • confidence measures,
  • software toolkits for speech recognition (HTK Tools, EST).
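
As an illustration of the HMM evaluation topic above, the following is a minimal sketch of the Forward algorithm for a discrete-observation HMM. It is not taken from the course materials: the function name forward_likelihood and the toy two-state model (initial, transition, emission) are assumptions chosen only for this example.

```python
def forward_likelihood(initial, transition, emission, observations):
    """Return P(observations | HMM) computed with the Forward algorithm.

    initial[i]       -- probability of starting in state i
    transition[i][j] -- probability of moving from state i to state j
    emission[j][o]   -- probability of emitting symbol o in state j
    """
    n_states = len(initial)
    # alpha[j] = P(o_1..o_t, state at time t is j)
    alpha = [initial[j] * emission[j][observations[0]] for j in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * transition[i][j] for i in range(n_states))
            * emission[j][obs]
            for j in range(n_states)
        ]
    # Summing over the final state gives the total sequence likelihood.
    return sum(alpha)


# Hypothetical toy model: 2 states, 2 observation symbols (0 and 1).
initial = [0.6, 0.4]
transition = [[0.7, 0.3], [0.4, 0.6]]
emission = [[0.5, 0.5], [0.1, 0.9]]

likelihood = forward_likelihood(initial, transition, emission, [0, 1, 1])
print(likelihood)
```

Practical recognizers usually work with log probabilities or scaled values to avoid numerical underflow on long utterances; the sketch omits this for brevity.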

Language Modelling (PFL067 | JEL C4 | SPO C11 | PSU 5.4)

  • methods of language modelling,
  • n-gram models, smoothing (Good-Turing, Katz), adaptive language models,
  • structured language models (PCFG),
  • specifics of spoken and written language modelling,
  • transducers and software tools for language modelling (AT&T FSM Library, SRI LM Toolkit).

Basic decoding techniques (SPO C12 | JEL C5-C6 | PSU C6)

  • search algorithms (search space and heuristics, A*),
  • combining acoustic and language models (uni-, bi-, trigrams),
  • time-synchronous search (Viterbi, beam, tree lexicon),
  • state-synchronous search,
  • graphical models (GMTK: The Graphical Models Toolkit).
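
As an illustration of the time-synchronous Viterbi search listed above, the following is a minimal sketch of Viterbi decoding for a discrete-observation HMM, without beam pruning or a tree lexicon. It is not taken from the course materials: the function name viterbi and the toy two-state model are assumptions chosen only for this example.

```python
def viterbi(initial, transition, emission, observations):
    """Return (best_state_path, probability_of_that_path)."""
    n_states = len(initial)
    # delta[j] = probability of the best partial path ending in state j
    delta = [initial[j] * emission[j][observations[0]] for j in range(n_states)]
    backpointers = []
    for obs in observations[1:]:
        new_delta, pointers = [], []
        for j in range(n_states):
            # Best predecessor state for state j at this time step.
            best_i = max(range(n_states), key=lambda i: delta[i] * transition[i][j])
            pointers.append(best_i)
            new_delta.append(delta[best_i] * transition[best_i][j] * emission[j][obs])
        delta = new_delta
        backpointers.append(pointers)
    # Trace back from the most probable final state.
    last = max(range(n_states), key=lambda j: delta[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, delta[last]


# Hypothetical toy model: 2 states, 2 observation symbols (0 and 1).
initial = [0.6, 0.4]
transition = [[0.7, 0.3], [0.4, 0.6]]
emission = [[0.5, 0.5], [0.1, 0.9]]

best_path, best_prob = viterbi(initial, transition, emission, [0, 1, 1])
print(best_path, best_prob)
```

A beam search would extend this sketch by discarding, at each time step, states whose score falls too far below the current best.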

Large vocabulary search algorithms (SPO C13 | JEL C5-C6 | PSU 6.7.3, 6.7.5, 6.10)

  • efficient manipulation of tree lexicon,
  • N-best and multipass search strategies,
  • AT&T GRM Library, AT&T DCD Library.

Automatic dialogue systems (SPO C17 | PSU C11)

  • characteristics of spontaneous dialogues,
  • prosody and structure of dialogues,
  • semantic representation,
  • dialogue management, emotion detection,
  • VoiceXML.

Speaker identification (PSU C9)

  • identification systems overview,
  • selected speech features for speaker identification,
  • basic methods.

The software tools and libraries will be introduced and practiced in the practical part of the course.

 