Course, academic year 2023/2024
NLP Applications - NPFL093
Title: Aplikace NLP
Guaranteed by: Institute of Formal and Applied Linguistics (32-UFAL)
Faculty: Faculty of Mathematics and Physics
Actual: from 2020
Semester: summer
E-Credits: 4
Hours per week, examination: summer s.:2/1, MC [HT]
Capacity: unlimited
Min. number of students: unlimited
4EU+: no
Virtual mobility / capacity: no
State of the course: taught
Language: Czech, English
Teaching methods: full-time
Guarantor: doc. RNDr. Vladislav Kuboň, Ph.D.
Class: Informatika Mgr. - Matematická lingvistika
Classification: Informatics > Computer and Formal Linguistics
Is incompatible with: NPFX093
Is interchangeable with: NPFX093
Annotation -
Last update: T_UFAL (10.05.2010)
The main goal of the course is to introduce the basic types of natural language processing (NLP) applications and to give students a chance to work with some of them in seminars. The course covers machine translation, machine-aided human translation tools, localization tools, information retrieval and extraction, question answering, speech recognition, spelling and grammar checking, text generation, and related topics.
Course completion requirements
Last update: doc. RNDr. Vladislav Kuboň, Ph.D. (22.04.2020)

The course requires continuous work from students in the form of reports on the topics of the individual lectures. A report is required even if a student misses a lecture; in that case, (s)he submits a general report on the topic of the missed lecture. Attendance is strongly recommended. After submitting all reports, the student receives a grade based on their quality.

Literature -
Last update: T_UFAL (10.05.2010)

Handbook of Natural Language Processing, eds. N. Indurkhya and F. Damerau, CRC Press, 2010.

Foundations of Statistical Natural Language Processing, C. Manning and H. Schütze, MIT Press, 1999.

Syllabus -
Last update: T_UFAL (10.05.2010)

1. Introduction - an overview of basic application components.

2. Spelling checker

Dictionary based methods vs. checking of illegal combinations of characters, string similarity metrics, communication towards the user.
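
As an illustration of one string similarity metric of the kind listed above, here is a minimal sketch of Levenshtein edit distance used to rank dictionary suggestions for a misspelled word. The function names `edit_distance` and `suggest`, the sample word list, and the threshold of 2 are assumptions made for illustration, not course material.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def suggest(word, dictionary, max_distance=2):
    """Return dictionary words within max_distance of word, closest first.
    (Hypothetical helper; the threshold is an arbitrary illustrative choice.)"""
    scored = sorted((edit_distance(word, w), w) for w in dictionary)
    return [w for d, w in scored if d <= max_distance]


if __name__ == "__main__":
    words = ["spelling", "spewing", "selling", "grammar"]
    print(suggest("speling", words))
```

A purely dictionary-based checker would only report that "speling" is missing from the word list; a similarity metric is one possible way to decide which corrections to offer the user.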

3. Grammar checking

Error patterns vs. syntactic analysis, types of detectable errors, attitude towards the user, RFODG and LanGR.

4. Machine-assisted human translation

Translation memory and its variants in commercial products, controlled language, glossary hierarchies.

5. Machine translation

Google Translate vs. rule-based commercial systems (Systran, PC Translator), quality evaluation methods, evaluation of translation competitions, the EuroMatrix project.

6. Localization

Differences between translation and localization, commercial localization tools.

7. Generation

Text generation from the tectogrammatical layer.

8. Information retrieval and extraction

Basic models, evaluation metrics, text similarity metrics, lemmatization, stop words, the role of linguistic tools, Malach project.
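
As an illustration of the evaluation metrics mentioned above, here is a minimal sketch of set-based precision, recall, and F1 for a single retrieval query. The function name `precision_recall_f1` and the document identifiers are hypothetical; the formulas are the standard definitions, and the source does not say which metrics the course actually covers.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and F1 for one query.

    retrieved: document ids returned by the system
    relevant:  document ids judged relevant for the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical document ids, for illustration only.
    p, r, f = precision_recall_f1(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this example two of the four retrieved documents are relevant, and two of the three relevant documents were found, so precision and recall differ even though the number of hits is the same.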

9. Question answering

Dialog systems, multimodal communication.

10. Speech synthesis and recognition

Basic problems and algorithms.

11. Semantic web

Exploitation of linguistic methods for searching for information on the web, the role of the tectogrammatical layer.

 