Morphological and Syntactic Analysis II - NPFL105

Title:	Morfologická a syntaktická analýza II
Guaranteed by:	Institute of Formal and Applied Linguistics (32-UFAL)
Faculty:	Faculty of Mathematics and Physics
Actual:	from 2017
Semester:	summer
E-Credits:	6
Hours per week, examination:	summer s.:0/2, C [HT]
Capacity:	unlimited
Min. number of students:	unlimited
4EU+:	no
Virtual mobility / capacity:	no
State of the course:	cancelled
Language:	Czech, English
Teaching methods:	full-time
Additional information:	https://ufal.mff.cuni.cz/course/npfl105

Guarantor:	RNDr. Daniel Zeman, Ph.D.

Annotation -

Last update: T_UFAL (09.05.2012)

This course loosely extends NPFL094 “Morphological and Syntactic Analysis” (passing NPFL094 is not formally required). It is run as a team project. The goal is to acquire or create as many resources as possible for one selected natural language. Each participant is responsible for part of the work, ranging from downloading corpora from the web to designing grammatical rules and training parsers.

Literature -

Last update: T_UFAL (09.05.2012)

• Martin Popel, Zdeněk Žabokrtský: TectoMT: Modular NLP Framework. In Proceedings of IceTAL, 7th International Conference on Natural Language Processing, Reykjavík, Iceland, August 17, 2010, pp. 293-304.

• Antonio M. Corbí-Bellot, Mikel L. Forcada, Sergio Ortiz-Rojas, Juan Antonio Pérez-Ortiz, Gema Sánchez-Ramírez, Felipe Sánchez-Martínez, Iñaki Alegria, Aingeru Mayor, Kepa Sarasola (2005): "An open-source shallow-transfer machine translation engine for the Romance languages of Spain". In Proceedings of the European Association for Machine Translation, 10th Annual Conference (Budapest, Hungary, 30-31.05.2005), pp. 79-86.

• Philip Resnik, Noah A. Smith: The Web as a Parallel Corpus. Computational Linguistics, Volume 29, Issue 3 (September 2003), pp. 349-380.

• Rayid Ghani, Rosie Jones, Dunja Mladenic: "Building Minority Language Corpora by Learning to Generate Web Search Queries". KAIS Knowledge and Information Systems, volume 7, number 1, 2005.

Syllabus -

Last update: T_UFAL (09.05.2012)

Model scenario 1:

We want to build a machine translation system from or to a new language. First, we need a parallel corpus pairing that language with English (or Czech, or another language for which data is available). Beyond that, we will look at tools for morphological and syntactic analysis, named entity recognition, etc., in order to improve translation quality.

Model scenario 2:

We have little or no parallel data, so we will focus on rule-based systems. We will propose a set of morphological tags, create a simple morphological and syntactic analyzer and, if possible, a bilingual lexicon. We will then try to combine these pieces in a rule-based translation system (Treex, Apertium) and use it for rudimentary translation.
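
To give a feel for scenario 2, here is a minimal Python sketch of a suffix-stripping morphological analyzer combined with a bilingual lexicon for word-for-word translation. It is only an illustration: the tag names, the suffix rules, the English-Czech lexicon entries, and the function names `analyze` and `translate` are all invented for this example. It does not use Treex or Apertium, and it is not the course's actual method.

```python
# Toy illustration only: tags, suffix rules, and lexicon entries are
# invented for this sketch and do not come from the course materials.

# Each rule: (suffix to strip, morphological tag, string restored on the lemma)
SUFFIX_RULES = [
    ("ies", "NOUN.PL", "y"),
    ("s", "NOUN.PL", ""),
    ("ed", "VERB.PAST", ""),
]

# Hypothetical bilingual lexicon mapping source lemmas to target lemmas.
LEXICON = {"city": "město", "dog": "pes", "walk": "chodit"}


def analyze(word):
    """Return all (lemma, tag) analyses whose lemma is in the lexicon."""
    analyses = []
    if word in LEXICON:
        analyses.append((word, "BASE"))
    for suffix, tag, restore in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) > len(suffix):
            lemma = word[: -len(suffix)] + restore
            if lemma in LEXICON:
                analyses.append((lemma, tag))
    return analyses


def translate(sentence):
    """Word-for-word transfer: emit 'target_lemma/tag', or the word unchanged."""
    output = []
    for word in sentence.lower().split():
        analyses = analyze(word)
        if analyses:
            lemma, tag = analyses[0]  # naive choice: take the first analysis
            output.append(f"{LEXICON[lemma]}/{tag}")
        else:
            output.append(word)  # unknown words pass through untranslated
    return " ".join(output)


if __name__ == "__main__":
    print(analyze("cities"))
    print(translate("Dogs walked"))
```

A real system would also need target-side morphological generation, disambiguation between competing analyses, and syntactic transfer rules. The sketch only shows how a tag set, an analyzer, and a lexicon fit together.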