Thesis details
Unsupervised and Semi-Supervised Multilingual Learning for Resource-Poor Languages
Title in Czech: Unsupervised and Semi-Supervised Multilingual Learning for Resource-Poor Languages
Title in English: Unsupervised and Semi-Supervised Multilingual Learning for Resource-Poor Languages
Keywords: natural language, machine learning, morphology, syntax
Academic year of announcement: 2011/2012
Thesis type: master's thesis
Thesis language: English
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: RNDr. Daniel Zeman, Ph.D.
Author: hidden (assigned and confirmed by the Study Department)
Date of registration: 23 October 2011
Date of assignment: 25 October 2011
Date of confirmation by the Study Department: 11 November 2011
Date and time of defence: 7 September 2012, 09:00
Date of electronic submission: 3 August 2012
Date of printed submission: 3 August 2012
Defence held on: 7 September 2012
Reviewer: doc. Mgr. Barbora Vidová Hladká, Ph.D.
Consultant: doc. Ing. Zdeněk Žabokrtský, Ph.D.
Guidelines for the thesis
The goal of the thesis is to explore how methods of natural language analysis (e.g. part-of-speech tagging, morphological analysis and syntactic parsing) can be applied to languages for which few or no linguistically annotated resources are available. Possible approaches include, but are not limited to, the following:

1. Unsupervised monolingual methods. Reimplement and test published algorithms for unsupervised learning of linguistic structure (POS tagging, parsing).
2. Multilingual learning. Reuse existing resources of resource-rich languages for new languages by porting the linguistic structure across aligned parallel corpora.
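A minimal sketch of item 2 for dependency syntax, in the spirit of the direct projection used by Hwa et al. (2005): a source arc is copied to the target sentence when both its dependent and its head are linked to exactly one target word. The function name project_heads, the 1-based index convention (0 = root) and the toy sentence pair are assumptions made for illustration only; a usable projector also needs rules for unaligned and many-to-many links, which this sketch simply leaves unprojected (None).

```python
from typing import Dict, List, Optional, Tuple


def project_heads(
    src_heads: List[int],
    alignment: List[Tuple[int, int]],
    tgt_len: int,
) -> List[Optional[int]]:
    """Copy dependency arcs from a source tree to a target sentence.

    Hypothetical helper for illustration.
    src_heads[i] is the 1-based head of source token i + 1 (0 = root).
    alignment holds 1-based (source, target) word-index pairs.
    Returns one head per target token (1-based, 0 = root), or None
    where no arc could be projected.
    """
    # Count how often each token takes part in a link.
    src_links: Dict[int, int] = {}
    tgt_links: Dict[int, int] = {}
    for s, t in alignment:
        src_links[s] = src_links.get(s, 0) + 1
        tgt_links[t] = tgt_links.get(t, 0) + 1

    # Keep only one-to-one links (a simplification of full projection).
    s2t = {s: t for s, t in alignment if src_links[s] == 1 and tgt_links[t] == 1}

    tgt_heads: List[Optional[int]] = [None] * tgt_len
    for s, t in s2t.items():
        h = src_heads[s - 1]
        if h == 0:
            tgt_heads[t - 1] = 0  # the source root becomes the target root
        elif h in s2t:
            tgt_heads[t - 1] = s2t[h]  # the head is aligned too: copy the arc
        # otherwise the source head has no one-to-one counterpart: stays None
    return tgt_heads
```

For example, with English "the dog barks" (heads [2, 3, 0]) aligned to Czech "pes štěká" through the links (2, 1) and (3, 2), project_heads([2, 3, 0], [(2, 1), (3, 2)], 2) returns [2, 0]: "pes" attaches to "štěká", which becomes the root, while the unaligned English determiner simply drops out.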
The two approaches can also be combined. For instance, two languages could first be tagged in an unsupervised fashion to obtain a common set of coarse-grained part-of-speech tags; a parser could then be projected from the resource-rich language using the parallel alignment and the common tagset (as in McDonald et al. 2011).
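The common tagset matters because it allows delexicalization: word forms are dropped and treebank-specific tags are replaced by coarse tags shared by both languages, so a parser trained on the resource-rich language only sees input that can also be produced for the resource-poor one. The sketch below covers only this preprocessing step and assumes a simple token record. The Token type, the function names and both tag mappings are hypothetical and deliberately incomplete: a few English Penn Treebank tags, plus the first (part-of-speech) position of Czech PDT positional tags, mapped to illustrative coarse labels.

```python
from typing import Callable, Dict, List, NamedTuple


class Token(NamedTuple):
    form: str  # word form ("_" once delexicalized)
    tag: str   # part-of-speech tag
    head: int  # 1-based index of the head token, 0 = root


# Hypothetical, incomplete mappings for illustration only; a real setup
# must map every tag of each treebank to the shared coarse tagset.
EN_TO_COARSE: Dict[str, str] = {
    "DT": "DET", "JJ": "ADJ", "NN": "NOUN", "NNS": "NOUN",
    "VBZ": "VERB", "IN": "ADP",
}
CS_POS_TO_COARSE: Dict[str, str] = {
    "N": "NOUN", "A": "ADJ", "V": "VERB", "R": "ADP", "D": "ADV",
}


def en_coarse(tag: str) -> str:
    return EN_TO_COARSE.get(tag, "X")  # "X" = unmapped / other


def cs_coarse(tag: str) -> str:
    # In Czech PDT positional tags the first character is the main POS.
    return CS_POS_TO_COARSE.get(tag[:1], "X")


def delexicalize(sentence: List[Token], to_coarse: Callable[[str], str]) -> List[Token]:
    """Drop word forms and map tags to the shared coarse tagset; keep heads."""
    return [Token(form="_", tag=to_coarse(tok.tag), head=tok.head) for tok in sentence]
```

After this step, "the dog barks" (DT NN VBZ) becomes DET NOUN VERB and a Czech sentence such as "pes štěká" (tags starting with N and V) becomes NOUN VERB, so training and test data share one vocabulary. For the resource-poor language, where no trees exist yet, the head field is only a placeholder.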

The work should include an objective evaluation on at least one language for which annotated resources are available for testing. A sample application to one or more resource-poor languages, with subjective evaluation and discussion, would be a plus.
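For the objective evaluation, two common measures are the unlabeled attachment score for dependency parsing and, for tags induced without supervision, many-to-one accuracy, where each induced cluster is mapped to the gold tag it most often coincides with. The sketch below assumes that gold and predicted annotations are already token-aligned lists; the function names are hypothetical. Many-to-one accuracy tends to reward larger numbers of clusters, and parsing evaluations often exclude punctuation, which this sketch does not do.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Optional


def attachment_score(gold_heads: List[int], pred_heads: List[Optional[int]]) -> float:
    """Unlabeled attachment score: fraction of tokens with the correct head."""
    if len(gold_heads) != len(pred_heads):
        raise ValueError("gold and predicted sequences differ in length")
    if not gold_heads:
        return 0.0
    # A missing prediction (None) never equals a gold head, so it counts as wrong.
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return correct / len(gold_heads)


def many_to_one_accuracy(gold_tags: List[str], clusters: List[Hashable]) -> float:
    """Map every induced cluster to its most frequent gold tag, then score."""
    if len(gold_tags) != len(clusters):
        raise ValueError("gold and predicted sequences differ in length")
    if not gold_tags:
        return 0.0
    per_cluster: Dict[Hashable, Counter] = defaultdict(Counter)
    for gold, cluster in zip(gold_tags, clusters):
        per_cluster[cluster][gold] += 1
    # A cluster is credited with the tokens carrying its majority gold tag.
    correct = sum(tag_counts.most_common(1)[0][1] for tag_counts in per_cluster.values())
    return correct / len(gold_tags)
```

For instance, clusters [0, 1, 1, 0, 1] against the gold tags DET NOUN VERB DET NOUN give a many-to-one accuracy of 0.8: cluster 1 is mapped to NOUN, so its single VERB token is the only error.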
Recommended literature
Benjamin Snyder and Regina Barzilay: Unsupervised Multilingual Learning for Morphological Segmentation, ACL 2008.
Benjamin Snyder, Tahira Naseem, Jacob Eisenstein, and Regina Barzilay: Unsupervised Multilingual Learning for POS Tagging, EMNLP 2008.
Tahira Naseem, Benjamin Snyder, Jacob Eisenstein, and Regina Barzilay: Multilingual Part-of-Speech Tagging: Two Unsupervised Approaches, JAIR 36 (2009).
Benjamin Snyder, Tahira Naseem, and Regina Barzilay: Unsupervised Multilingual Grammar Induction, ACL 2009.
Ryan McDonald, Slav Petrov, and Keith Hall: Multi-Source Transfer of Delexicalized Dependency Parsers, EMNLP 2011.
Daniel Zeman and Philip Resnik: Cross-Language Parser Adaptation Between Related Languages, NLPLPL / IJCNLP 2008.
Rebecca Hwa, Philip Resnik, Amy Weinberg, Clara Cabezas, and Okan Kolak: Bootstrapping Parsers via Syntactic Projection across Parallel Texts, Natural Language Engineering 11(3):311-325, 2005.
Charles University | Charles University Information System