Unsupervised and Semi-Supervised Multilingual Learning for Resource-Poor Languages
Thesis title in Czech: | Unsupervised and Semi-Supervised Multilingual Learning for Resource-Poor Languages |
---|---|
Thesis title in English: | Unsupervised and Semi-Supervised Multilingual Learning for Resource-Poor Languages |
Keywords (Czech): | přirozený jazyk, strojové učení, morfologie, syntaxe |
Keywords (English): | natural language, machine learning, morphology, syntax |
Academic year of topic announcement: | 2011/2012 |
Thesis type: | diploma thesis |
Thesis language: | English |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | RNDr. Daniel Zeman, Ph.D. |
Author: | hidden - assigned and confirmed by the Study Department |
Date of registration: | 23.10.2011 |
Date of assignment: | 25.10.2011 |
Date of confirmation by the Study Department: | 11.11.2011 |
Date and time of defence: | 07.09.2012 09:00 |
Date of electronic submission: | 03.08.2012 |
Date of submission of the printed version: | 03.08.2012 |
Date of completed defence: | 07.09.2012 |
Opponents: | doc. Mgr. Barbora Vidová Hladká, Ph.D. |
Advisors: | doc. Ing. Zdeněk Žabokrtský, Ph.D. |
Guidelines |
The goal of the thesis is to explore how methods of natural language analysis (e.g. part-of-speech tagging, morphology and syntax) can be applied to languages for which few or no linguistically annotated resources are available. Possible approaches include, but are not limited to, the following:
1. Unsupervised monolingual methods: reimplement and test published algorithms for unsupervised learning of linguistic structure (POS tagging, parsing).
2. Multilingual learning: existing resources for resource-rich languages are reused for new languages by porting the structure across aligned parallel corpora.
The two approaches could also be combined. For instance, two languages would first be tagged in an unsupervised fashion to obtain a common set of coarse-grained part-of-speech tags; a parser would then be projected from a resource-rich language using parallel alignment and the common tagset (as in McDonald et al. 2011). The work should include objective evaluation on at least one language for which annotated resources are available for testing purposes. A sample application to one or more resource-poor languages, with subjective evaluation and discussion, would be a plus. |
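As an illustration of what porting structure across aligned parallel corpora can mean in the simplest case, the sketch below projects part-of-speech tags from a tagged source sentence onto an untagged target sentence through word alignment links. It is only a minimal sketch under stated assumptions, not the method prescribed by the assignment or used in any of the cited papers: the function name `project_pos_tags`, the alignment format (pairs of source and target token indices) and the fallback tag `X` are all hypothetical choices made for the example.

```python
from collections import Counter, defaultdict


def project_pos_tags(source_tags, alignment, target_length, unknown="X"):
    """Project POS tags from a source sentence onto a target sentence.

    source_tags   -- list of tags, one per source token
    alignment     -- iterable of (source_index, target_index) links
    target_length -- number of tokens in the target sentence
    unknown       -- tag given to target tokens with no alignment link
                     (hypothetical default, chosen for this example)
    """
    # Collect one vote per alignment link for each target position.
    votes = defaultdict(Counter)
    for src_i, tgt_i in alignment:
        votes[tgt_i][source_tags[src_i]] += 1

    projected = []
    for tgt_i in range(target_length):
        if votes[tgt_i]:
            # Take the tag that most aligned source tokens vote for.
            projected.append(votes[tgt_i].most_common(1)[0][0])
        else:
            # Unaligned target tokens receive the fallback tag.
            projected.append(unknown)
    return projected


# Toy example: English "the dog barks" aligned to a two-token target sentence.
english_tags = ["DET", "NOUN", "VERB"]
links = [(1, 0), (2, 1)]  # "dog" -> token 0, "barks" -> token 1
print(project_pos_tags(english_tags, links, target_length=2))
```

In practice the alignment links would come from an automatic word aligner and would be noisy, so projected tags are usually treated as weak supervision for training a tagger rather than as final output.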
References |
Benjamin Snyder and Regina Barzilay: Unsupervised Multilingual Learning for Morphological Segmentation, ACL 2008.
Benjamin Snyder, Tahira Naseem, Jacob Eisenstein, and Regina Barzilay: Unsupervised Multilingual Learning for POS Tagging, EMNLP 2008.
Tahira Naseem, Benjamin Snyder, Jacob Eisenstein, and Regina Barzilay: Multilingual Part-of-Speech Tagging: Two Unsupervised Approaches, JAIR 36 (2009).
Benjamin Snyder, Tahira Naseem, and Regina Barzilay: Unsupervised Multilingual Grammar Induction, ACL 2009.
Ryan McDonald, Slav Petrov, and Keith Hall: Multi-Source Transfer of Delexicalized Dependency Parsers, EMNLP 2011.
Daniel Zeman and Philip Resnik: Cross-Language Parser Adaptation Between Related Languages, NLPLPL / IJCNLP 2008.
Rebecca Hwa, Philip Resnik, Amy Weinberg, Clara Cabezas, and Okan Kolak: Bootstrapping Parsers via Syntactic Projection across Parallel Texts, Natural Language Engineering 11(3):311-325, 2005. |