Introduction to Computer Linguistics - NPFL012

Title:	Úvod do počítačové lingvistiky
Guaranteed by:	Institute of Formal and Applied Linguistics (32-UFAL)
Faculty:	Faculty of Mathematics and Physics
Actual:	from 2014
Semester:	winter
E-Credits:	3
Hours per week, examination:	winter s.:2/0, Ex [HT]
Capacity:	unlimited
Min. number of students:	unlimited
4EU+:	no
Virtual mobility / capacity:	no
State of the course:	taught
Language:	Czech, English
Teaching methods:	full-time

Guarantor:	doc. RNDr. Vladislav Kuboň, Ph.D.
Teacher(s):	doc. RNDr. Vladislav Kuboň, Ph.D.
Class:	Informatics Bc.; Informatics Mgr. - Mathematical Linguistics
Classification:	Informatics > Computer and Formal Linguistics
Is pre-requisite for:	NPFL030, NPFL028

Annotation -

This course provides an overview of the individual subfields of computational linguistics and the main problems each of them addresses. It emphasizes machine translation, syntactic parsing, morphology, and corpus linguistics.

Last update: T_UFAL (10.05.2001)

Course completion requirements -

The course concludes with a written exam of 8-10 questions on the topics covered in the lectures. One of the questions requires a detailed description of one of the algorithms presented during the semester.

Last update: Kuboň Vladislav, doc. RNDr., Ph.D. (15.10.2017)

Literature -

R. Grishman. Computational Linguistics: An Introduction. ACL Studies in Natural Language Processing. Cambridge University Press, 1986.

Kirschner, Z. (1983). MOSAIC - A Method of Automatic Extraction of Significant Terms from Texts. Praha: MFF UK. 124 pp.

Králíková, K., Panevová, J. (1990). "ASIMUT - A Method for Automatic Information Retrieval from Full Texts", Explizite Beschreibung der Sprache und automatische Textbearbeitung XVII, Faculty of Mathematics and Physics, Charles University, Prague.

Jan Hajič, Alena Böhmová, Eva Hajičová, Barbora Vidová Hladká: "The Prague Dependency Treebank: A Three-Level Annotation Scenario". In: A. Abeillé (ed.): Treebanks: Building and Using Parsed Corpora, Amsterdam: Kluwer, 2000, pp. 103-127.

Daniel Jurafsky, James H. Martin: Speech and Language Processing (draft of the 3rd edition available at https://web.stanford.edu/~jurafsky/slp3/), 2023.

Last update: Kuboň Vladislav, doc. RNDr., Ph.D. (15.05.2023)

Syllabus -

1. Introduction, overview of subfields of computational linguistics

2. Natural language, its functions and structure

3. Language processing at character level, morphological analysis and tagging, spellchecking methods

4. Basic principles of Hidden Markov Models in the morphology of natural languages

5. MOSAIC - document search system exploiting natural language properties

6. Language syntax and its formal representation in data structures - constituent and dependency trees

7. Non-projective constructions in Czech

8. Overview of formal theories of syntax - Transformational grammar, LFG, TAG, unification formalisms, Functional generative description

9. Tools for syntactic analysis of languages - Q-systems, Augmented Transition Networks

10. Grammar checking methods

11. Corpus linguistics

12. History and methods of machine translation

13. Introduction to semantics - lexical semantics, representation of the meaning of sentences

14. Anaphoric relationships in a sentence

15. Sentiment analysis

Last update: Kuboň Vladislav, doc. RNDr., Ph.D. (15.05.2023)