Course, academic year 2019/2020
Morphological and Syntactic Analysis - NPFL094
Title in Czech: Morfologická a syntaktická analýza
Guaranteed by: Institute of Formal and Applied Linguistics (32-UFAL)
Faculty: Faculty of Mathematics and Physics
Valid: from 2019
Semester: winter
E-Credits: 3
Hours per week, examination: winter s.: 2/0, MC [hours/week]
Capacity: unlimited
Min. number of students: unlimited
State of the course: taught
Language: English, Czech
Teaching methods: full-time
Additional information: http://ufal.mff.cuni.cz/course/npfl094
Guarantor: RNDr. Daniel Zeman, Ph.D.
Class: Informatics (Mgr.) - Mathematical Linguistics
Classification: Informatics > Computer and Formal Linguistics
Annotation -
Last update: T_UFAL (28.04.2015)
The course covers basic methods and algorithms for morphemic segmentation and for morphological and syntactic (constituency-based, dependency-based, tectogrammatical) analysis of natural languages. During the semester, students try out some of the approaches on an unfamiliar language in mini-projects. Credits are awarded for contributions to these mini-projects. A follow-up course, NPFL105 Morphological and Syntactic Analysis II, is offered in the summer semester and examines the analysis of one selected language in more detail.
Course completion requirements -
Last update: RNDr. Daniel Zeman, Ph.D. (07.10.2017)

Credits are awarded for homework assignments given during the semester. A typical assignment is a natural language processing task; the solution comprises both the processed data and the tools that the student created or configured to process it. Solutions are submitted by e-mail. Each assignment has its own maximum number of points and its own deadline. Solutions may be submitted after the deadline, but late submissions will not receive full points. In any case, solutions must be submitted before the end of the winter exam period unless an exception has been negotiated with the lecturer.

If full points were withheld for reasons other than, or in addition to, late submission, the student may submit an improved solution that addresses the lecturer's comments. The new submission will be evaluated as if it were a first submission made after the deadline.

The credit is graded, and the final grade corresponds to the total number of points awarded for the homework assignments. There will be at least three assignments, and the point system will make it possible to earn grade 1 (“outstanding”) with full points in two of them.

The student may negotiate with the lecturer an alternative way of completing the course, e.g. a larger semester project instead of several smaller assignments.

Literature -
Last update: Mgr. Barbora Vidová Hladká, Ph.D. (29.01.2019)
  • James Allen: Natural Language Understanding. The Benjamin/Cummings Publishing Company, Inc.; Redwood City, California, 1994. ISBN 0-8053-0334-0.
  • Adolf Erhart: Základy jazykovědy. Státní pedagogické nakladatelství; Praha, 1990
  • Kimmo Koskenniemi: Two-level Morphology: A General Computational Model for Word-form Recognition and Production. University of Helsinki, Department of General Linguistics, Publications No. 11; Helsinki, 1983
  • Kenneth R. Beesley, Lauri Karttunen: Finite State Morphology. CSLI Publications, 2003
  • Jan Hajič: Unification Morphology Grammar (doctoral thesis). Univerzita Karlova, Praha, 1994
  • Richard Sproat: Morphology and Computation. Massachusetts Institute of Technology, Cambridge, Massachusetts, 1992
  • Stuart Shieber: An Introduction to Unification-based Approaches to Grammar. CSLI Lecture Notes No. 4, Stanford, California, 1986
  • Daniel Zeman: The World of Tokens, Tags and Trees. Studies in Computational and Theoretical Linguistics, vol. 19. ÚFAL, Praha, 2018. ISBN 978-80-88132-09-7.

Syllabus -
Last update: T_UFAL (10.05.2010)

1. Sets of morphosyntactic tags, definition of problems, chunking, constituency and dependency trees.

2. Supervised and unsupervised morphemic segmentation.

3. Two-level morphology.

4. Context-free grammars and chart parser, usage for morphological analysis.

5. Unification grammars for morphological analysis.

6. Morphological disambiguation (tagging).

7. Syntactic analysis (parsing) and regular expressions.

8. Probabilistic context-free grammars and chart parser for syntactic analysis. Collins and Charniak parsers.

9. Dependency parsing, nonprojectivities, MST parser, Malt parser.

10. Parser combination.

 